> According to our sources and seconded by an ASUS statement to Der8auer, the problem stems from SoC voltages being altered to unsafe higher levels. This can be imposed from either the pre-programmed voltages used to support EXPO memory overclocking profiles or when a user manually adjusts the SoC voltages (a common practice to eke out a bit more memory overclocking headroom).
Does that mean, as long as I don't overclock, my Ryzen7 CPU will most likely be a-ok?
In general, you should be running the EXPO profile: this "upclocks" your RAM. I use the term "upclock" instead of overclock because you aren't pushing your RAM past its manufacturer's limit, you are pushing it to that limit. The JEDEC profile (the opposite of EXPO) is extremely conservative and can significantly affect performance.
This effect is compounded with Ryzen because of Infinity Fabric, which is the interconnect between the chiplets in the CPU. IF is clocked to the same speed as your RAM - so at the 6000MT/s "sweet spot" the article mentions your IF and RAM bus are running at 3000MHz. Matching IF to RAM bus is so important that increasing the RAM bus clock above the IF clock will decrease performance (unless the RAM bus clock is a multiple of the IF clock).
I haven't done the comparison on my 7950x, but on my 3950x running the IF at 1800MHz (RAM at 3600MT/s) resulted in a 10%-30% uplift in performance (depending on the workload).
So with Ryzen, you should consider purchasing RAM that is matched to the IF clock. For Ryzen 7 that is 6000MT/s. If you purchased RAM above that, you should definitely decrease the clock to match IF (you can tighten RAM timings to take advantage of the more expensive chips that you purchased, if you have the patience).
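For what it's worth, here is a minimal Python sketch of the ratio check described above. It only encodes the rule of thumb from this comment (1:1 is ideal, an integer multiple is tolerable, anything else costs performance). `classify_ratio` and its labels are names made up for illustration, not anything from AMD.

```python
from fractions import Fraction


def classify_ratio(transfer_rate_mts: int, fabric_clock_mhz: int) -> str:
    """Classify the RAM bus clock : IF clock ratio (rule of thumb only)."""
    # DDR does two transfers per clock, so the bus clock is half the MT/s figure.
    memory_clock_mhz = Fraction(transfer_rate_mts, 2)
    ratio = memory_clock_mhz / fabric_clock_mhz
    if ratio == 1:
        return "1:1"
    if ratio.denominator == 1:
        return f"{ratio.numerator}:1 (integer multiple)"
    return f"{ratio.numerator}:{ratio.denominator} (mismatched)"


print(classify_ratio(3600, 1800))  # the 3950X setup mentioned above
print(classify_ratio(6000, 2000))  # a hypothetical mismatched setup
```

The first call prints `1:1` and the second prints `3:2 (mismatched)`.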
What am I doing? I am going to disable EXPO, but I will definitely be re-enabling once this has been resolved.
Aside: What is MT/s vs MHz? DDR stands for "double data rate", meaning the MHz that everyone uses to describe the speed of RAM is a misnomer. "6000MHz" RAM actually runs at 3000MHz (but does two transfers per clock cycle). So: Mega-Transfers per Second.
>In general, you should be running the EXPO profile: this "upclocks" your RAM. I use the term "upclock" instead of overclock because you aren't pushing your RAM past its manufacturer's limit, you are pushing it to that limit. The JEDEC profile (the opposite of EXPO) is extremely conservative and can significantly affect performance.
Any source I could find says that EXPO does void warranty, as it is overclocking. https://www.amd.com/en/technologies/expo (in footnotes section at the end of page)
It might move RAM to its manufacturer spec as you said, but it does so by overclocking the CPU.
> So with Ryzen, you should consider purchasing RAM that is matched to the IF clock. For Ryzen 7 that is 6000MT/s. If you purchased RAM above that, you should definitely decrease the clock to match IF (you can tighten RAM timings to take advantage of the more expensive chips that you purchased, if you have the patience).
Do you know any good benchmark (preferably a Linux one) that would show those differences? I will be getting a 7800X3D in a few days, so I want to test it for a bit.
> What am I doing? I am going to disable EXPO, but I will definitely be re-enabling once this has been resolved.
The core of the issue seems to be "just" overvolting, so I'm guessing any manual overclock would still be fine. I have no need for it now (just a puny 1070 to pair it with for now), but yeah, waiting for new BIOS firmware seems to be the best option.
> Aside: What is MT/s vs MHz? DDR stands for "double data rate", meaning the MHz that everyone uses to describe the speed of RAM is a misnomer. "6000MHz" RAM actually runs at 3000MHz (but does two transfers per clock cycle). So: Mega-Transfers per Second.
Not exactly. The specified speed is only the speed of the bus between memory and CPU, not the speed of the memory chip itself. DDR4 3200 and DDR5 6400 have the exact same internal speed (400MHz), and this is why there isn't much of a latency improvement between generations (DDR5 does have some other tricks for latency).
Also, 6400MHz is the actual rate at which data is put on the bus, even if the clock is half of that.
It's pretty much entirely so the RAM chips don't have some insane number of pins but something manageable.
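A small Python sketch of that arithmetic, assuming the usual prefetch depths of 8 for DDR4 and 16 for DDR5 (that assumption is mine, it is not stated above). `clocks` is a hypothetical helper name.

```python
# Assumed prefetch depth per generation: how many transfers on the bus
# each internal array access feeds.
PREFETCH = {"DDR4": 8, "DDR5": 16}


def clocks(generation: str, transfer_rate_mts: int) -> tuple[float, float]:
    """Return (bus clock in MHz, internal array clock in MHz)."""
    bus_clock = transfer_rate_mts / 2  # two transfers per bus clock
    core_clock = transfer_rate_mts / PREFETCH[generation]
    return bus_clock, core_clock


print(clocks("DDR4", 3200))  # (1600.0, 400.0)
print(clocks("DDR5", 6400))  # (3200.0, 400.0)
```

Under that assumption both kits land on the same 400MHz internal clock, which matches the point about latency not improving much between generations.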
> Any source I could find says that EXPO does void warranty, as it is overclocking. https://www.amd.com/en/technologies/expo (in footnotes section at the end of page)
> Overclocking and/or undervolting AMD processors and memory, including without limitation, altering clock frequencies / multipliers or memory timing / voltage, to operate outside of AMD’s published specifications will void any applicable AMD product warranty, even when enabled via AMD hardware and/or software. This may also void warranties offered by the system manufacturer or retailer. Users assume all risks and liabilities that may arise out of overclocking and/or undervolting AMD processors, including, without limitation, failure of or damage to hardware, reduced system performance and/or data loss, corruption or vulnerability. GD-106
I am not a lawyer! BUT!
AMD voiding the warranty like that sounds extremely illegal. (Depends on your jurisdiction, obviously.) EXPO is very clearly advertised on the box, documentation, and media coverage. It even comes on by default in a number of cases. That would mean the AMD Processor/Memory would have the warranty voided when you first power it on!
The EU specifically has a law under which any advertised feature is covered by a legally mandated 2-year warranty.
Well, it is an advertised feature that explicitly states (although in a hidden footnote) that it voids the warranty, so yeah, it would be nice if someone took them to court over it. I'd imagine they would be raked over the coals in the EU.
> It even comes on by default in a number of cases. That would mean the AMD Processor/Memory would have the warranty voided when you first power it on!
I'd be doubtful about that, as AMD has no control over which options a random motherboard vendor enables by default.
I was still using Windows when I did everything with my 3950x. I used AIDA64 for the memory, specifically paying attention to latency (which is the strongest signal you will see for IF 1:1). Other than that it was a few tests that matched my anticipated workload: pi for stability, 3DMark for games (another decent signal), compilation speed for dev.
On Linux+7950x I just did some subjective tests (I have less time these days) with rust-analyzer responsiveness and there definitely was an improvement, but I can't quantify how much.
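If you want a rough, repeatable number on Linux without installing anything, a crude copy-throughput test is one option. To be clear, this is a swapped-in stand-in: it measures bulk copy speed, not latency (which is what AIDA64 showed me), so treat it as a sanity check for before/after comparisons rather than a real benchmark. `copy_bandwidth_gib_s` is just an illustrative name.

```python
import time


def copy_bandwidth_gib_s(size_mib: int = 128, repeats: int = 5) -> float:
    """Best-of-N throughput of copying one large buffer into another, in GiB/s."""
    n = size_mib * 1024 * 1024
    src = bytearray(b"\x01") * n  # written on creation, so its pages are resident
    dst = bytearray(n)            # preallocated; the first copy pays the page faults
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src              # same-length slice assignment: a plain memory copy
        best = min(best, time.perf_counter() - start)
    # Counts bytes copied; actual memory traffic (read + write) is about twice this.
    return (size_mib / 1024) / best


print(f"{copy_bandwidth_gib_s():.1f} GiB/s copied")
```

Run it a few times with EXPO on and off, with nothing else loading the machine, and compare the best results. Keep the buffer well above your L3 size, otherwise you are mostly measuring cache.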
It might be less of an issue for a 7800 because, as far as I remember, it doesn't use chiplets.
IF maxes out around 2GHz on current chips, so it is impossible to run it 1:1 on anything faster than DDR5-4000. But apparently something was improved along the way: the optimal RAM speed is the 6000 mentioned above, and not matching doesn't hurt as much on Zen 4.
It appears that even with a mismatched clock (2166 IF with 6000MHz memory) latency continues to drop, and problems start if you try to run the memory controller above that.
My reading of the statement (and understanding of other reporting on this subject) is that EXPO alone can trigger this issue. Calling EXPO overclocking feels somewhat disingenuous to me.
EXPO is akin to Intel's XMP. The RAM stick reports timings and other settings that it can handle and then the motherboard/user selects one to use. The profiles are needed because the official JEDEC standard for DDR doesn't provide a mechanism for running up to the frequencies that modern RAM uses; e.g. if you buy a DDR5 6000MHz kit, it'll only run at 4800MHz or thereabouts until you enable EXPO/XMP. RAM is really never expected to be run at the base frequency (without EXPO/XMP). If you buy a prebuilt, it'll have EXPO/XMP enabled, and if you build yourself you always should be enabling it. If you have a RAM kit that's somewhat recent and don't have it enabled, you've likely wasted money on the RAM.
Intel has attempted to claim that XMP was overclocking and violated their warranty but my understanding is that they quickly backed off.
> My reading of the statement (and understanding of other reporting on this subject) is that EXPO alone can trigger this issue. Calling EXPO overclocking feels somewhat disingenuous to me.
Totally. With DDR5-6000, it's even nastier: AMD repeatedly said that DDR5-6000 was the "sweet spot". I never overclocked any computer of mine but I did buy DDR5-6000 for my 7700X because AMD said it was the sweet spot. And I turned EXPO on so that it'd run at 6000, not 4800.
And now, after saying 6000 was the "sweet spot", they try to word things as if EXPO was somehow overclocking!?
I'm in the same boat. I paid a premium for DDR5-6000 64GB for my 7950X, because AMD said it's the sweet spot... I never considered this "overclocking". I have never been interested in overclocking, I just want a stable machine.
Now I'm considering disabling EXPO, but I don't think it would work as 6000 anymore after that. I have used this machine for 4 months now, so I'd say it's pretty safe, but who knows, I haven't taxed the CPU with gaming yet.
> Overclocking and/or undervolting AMD processors and memory, including without limitation, altering clock frequencies / multipliers or memory timing / voltage, to operate outside of AMD’s published specifications will void any applicable AMD product warranty, even when enabled via AMD hardware and/or software. This may also void warranties offered by the system manufacturer or retailer. Users assume all risks and liabilities that may arise out of overclocking and/or undervolting AMD processors, including, without limitation, failure of or damage to hardware, reduced system performance and/or data loss, corruption or vulnerability.
On their own page they consider it a warranty-voiding feature, one that they ADVERTISE and PROMOTE.
> Now I'm considering disabling EXPO, but I don't think it would work as 6000 anymore after that. I have used this machine for 4 months now, so I'd say it's pretty safe, but who knows, I haven't taxed the CPU with gaming yet.
I'd drop it, wait till it all blows over, upgrade the BIOS, and re-enable it. It does seem like just a firmware bug, or maybe some OC profile on the memory being overly optimistic with voltage.
That's not how it is presented in the BIOS. EXPO settings aren't under overclocking; to me they didn't look like a scary overclocking thing.
In BIOS there is a separate section "Overclocking", and to get there you have to "Accept" or "Decline" some big text about warranty. I never went there to enable EXPO, so I never needed to press "Accept" and void my warranty to toggle on EXPO...
To my eye, the EXPO settings just look like this:
EXPO II turned on, memory frequency 6000 MHz (what it says on the memory), and CPU frequency stays where it was (thus not overclocked).
> EXPO II turned on, memory frequency 6000 MHz (what it says on the memory), and CPU frequency stays where it was (thus not overclocked).
It's a technology to overclock the memory controller on the CPU, not the CPU cores themselves. The increase in memory voltage also needs to be applied to the controller, which is where the problem in the article comes from.
But yeah, I can see someone arguing that in court and winning, with AMD only warning about warranty loss in footnotes and mobo vendors not presenting it as such either.
This kind of corporate behavior is common. Subaru for a while offered free SCCA memberships with purchase of a WRX so you could autocross it, then you do an autocross and they try to get out of any warranty work because you autocrossed it.
Certainly tire/brake pad wear shouldn't be covered but they were more ridiculous about it than that.
Be sure you know what autocross is before you type up a big lecture about wear and tear from racing. Also I already know.
My Chevy came with instructions on proper track prep and tracking it (if prepped as instructed) won't void the warranty. Didn't include free track days though.
Autocrossing is more my thing and I'm shocked any car manufacturer would try to weasel out of warranty for it. I've seen modified cars break and even catch on fire, and can understand racing tires could put undue stress on stock suspension, but a stock car that can't handle autocross? Should we expect parts to be falling off if you need to do an emergency brake or evasive maneuver on the road?
>if you build yourself you always should be enabling it.
Absolutely not. This is such terrible advice I would think you were trolling if I didn't assume better of you.
An end-user should not engage in any overclocking, XMP/EXPO included, without thorough knowledge and understanding of what they're getting into. You are literally driving your hardware above and beyond official, published specifications.
And no, marketing is not "official, published specifications".
> You are literally driving your hardware above and beyond official, published specifications.
That is not the case if we talk RAM. It is sold as 6000MHz, it is tested as 6000MHz and it is validated to run at 6000MHz. And Intel and AMD have each developed a standard to allow RAM to run at its rated speed.
It isn't just marketing. The problem is that this standard entails going out of spec on the CPU side, where it does mean going beyond validated limits. But the consumer shouldn't have to dig to understand that these standards, designed to make your RAM run at its rated speed, technically mean pushing limits.
I think this is a bit of a stretch from tomshardware: if you actually read the statement https://i.imgur.com/pM358j6.jpg it doesn't actually say what the problem is, or even that there is a problem. Just that they released a new BIOS that has more limitations.
And the other "evidence" of a photo of a damaged pin pad on a ryzen processor shows the damaged pins aren't related to the SoC rail mentioned in the article.
The only thing I can see in the article that supports that is:
"Our sources also added further details about the nature of the chip failures — in some cases, excessive SoC voltages destroy the chips' thermal sensors and thermal protection mechanisms, completely disabling its only means of detecting and protecting itself from overheating"
And "Our Sources" aren't motherboard vendor statements, none of the statements from motherboard vendors or AMD themselves support this theory, so we're relying on Tom's unnamed "sources" rather than any public evidence or statement.
This is exactly what I dislike - the slow erosion of sourcing and of the quality of evidence and information. It's Tom's being bad too, as their statement
"seconded by an ASUS statement to Der8auer, the problem stems from SoC voltages"
does not seem a good representation of the actual statement - the image of which I linked.
ASUS pointing out that they are defining new rules for vCore and SoC voltage to address these issues is in fact pointing to that as the issue. They aren't just "throwing that in there" because it is unrelated. If you need it to be even more direct, here is the statement from ASUS that says it is SoC voltage.
You've admitted that you work for AMD, which would make a casual observer think that you are being biased in your comments. There is verifiable proof of the correlation with SoC voltage from the ODMs themselves.
Vendors are putting out updated BIOS that limit voltage going to the chips. Based on internet hearsay, even some of the non-OC chips were getting wrecked by stock BIOS overvolting enough. The real concern is with the X3D chips which are particularly sensitive to voltage and not particularly meant to be overclocked.
I think you're fine as long as you flash an updated BIOS put out by your MB manufacturer.
I guess that depends on the nature of the failure and the fragility of the part.
If you're staying inside the recommended voltage, it's likely the product will have a long life. For example, look at the life curve of capacitors (and I'm not talking about the diseased ones we had to deal with in the 2000s). The 'hotter' we run them, the shorter their lifespan. What can be difficult to figure out via testing is the MTBF when you have a hard cutoff point for failure that's not very much higher than standard operating specs. We could be failing at 100 hours at 1.4v, and 10,000 hours at 1.3v. Or it could last 100,000 hours at 1.3v.
What I'm not sure of is DDR5 - it had seemed DDR5 offered more flexibility in voltage/speeds, i.e. the "stock" DDR5 speeds at 1.2V seemed conservative compared to 1.35 or 1.5V for DDR3 or DDR4. But if going above 1.2V, or whatever voltage DDR5-6000 EXPO profiles are setting, also necessitates increasing SOC/CPU voltage... maybe it's better to hang back and just stick w/ DDR5-5200 or 5400 or something like that.
That’s the root of the problem seemingly - going past the official spec (5200 MT/s for 2 sticks, 3600 MT/s for 4 sticks) requires additional voltage to the memory controller and it’s not uncommon for motherboards to automatically punch up the voltages when you turn on XMP or Expo. You requested the board to overclock after all, they’re just helping you get it stable without tinkering! It’s completely legal from the mobo vendor’s perspective to enable more voltage in “auto-OC” scenarios and they are in fact incentivized to do so by the possibility of returns and negative reviews/bad word of mouth. Customers will remember when “it didn’t work on Asus but I just enabled XMP on MSI and it worked fine”, even if that’s because MSI is punching up voltages. They will see that MSI scores 3% better in whatever benchmark. And AMD themselves in fact advertises benchmarks with Expo enabled, despite the fact that it officially voids the warranty. AMD and Intel (both) have gotten away with this for a long time, having their cake on heavily implying/suggesting it be enabled but technically voiding the warranty if you do (although it's quite unenforced unless you openly tell a warranty agent that you did it).
I had a 9900K fail due to simply enabling XMP too, and I’ve seen many reports of early Zen1/Zen+/Zen2 with what seems likely to be IMC failures (they are getting to be "of the age" - 3-5 years with increased VSOC and the memory controllers are just clapped out). DDR4 controllers seem more delicate in general compared to DDR3, and DDR5 requires still higher voltages. It’s almost shocking to hear buildzoid recommend 1.3v VSOC as a “safe” daily driver. AM4 voltages were more like 0.8-1.1v.
It also doesn’t help that the early AM5 IO die seems to be terrible. AMD's memory controllers have always been worse than Intel's, even on AM4, but a lot of AM5 chips won’t even POST with 4 sticks at the pedestrian official spec of 3600 MT/s (Intel is 3600 MT/s too, but theirs mostly work reliably at it; it's a lottery with Zen4). Zen5 is supposed to feature an improved IO die and I wonder if the VSOC will be lower there too.
I'll leave my little anecdata with Intel Alder Lake memory controller shenanigans since it's relevant.
I built an i7 12700K system last year with 4x16GB DIMMs of DDR5-5600 RAM, for a total of 64GB RAM. Not being fully aware of the memory controller's limitations at that point (moral: read the datasheets!) I turned on XMP which obviously overclocked the RAM to 5600MHz. Note that at this point I was curious why the mobo had booted the system with RAM clocked to 4000MHz prior to my fiddling around with the BIOS.
Afterwards, I had instability which manifested in random program and system crashes. Eventually I checked the RAM and confirmed that was the issue with memtest86 returning errors. I read up on the CPU's datasheet and learned that when supplied with 2 DIMMs per channel with 1 rank per DIMM, the memory controller downclocks to 4000MHz as specified. Incidentally, this answered why the mobo booted the system up with RAM clocked to 4000MHz at the beginning.
I turned off XMP and backed off the overclock to 4800MHz with timings as applied by the mobo (they're just JEDEC profile timings), and that got me back to stability with subsequent tests in memtest86 returning no errors.
The important things to take away here are that:
* DDR5 memory controllers are still in their infancy compared to DDR4 and DDR3 controllers today, and can't yet deal with so much RAM at once.
* Going beyond 1 DIMM per channel, 1 rank per DIMM means the memory controller will run slower than advertised, and this is according to specification. It is on us the end-users to do our homework, we can't forget we are using precision electronics.
* With the memory controller running slower, overclocks are that much more taxing than XMP/EXPO and the marketing would imply. Simply turning on XMP/EXPO without thought is dangerous because those profiles do not account for a slower memory controller.
* Overclocking is by its very definition running hardware above and beyond what they were designed and rated to operate at. Warranties will be voided, and hardware can be damaged or destroyed because they are operated out of specification. You have been warned, ignorance is not an excuse.
* Between CPUs (and GPUs) today coming "over"clocked from the factory to perform at their peak and RAM speed having negligible real world performance implications for the vast majority of use cases, end-user overclocking is simply not worth the hassle unless you absolutely know what you're doing.
* EFI/BIOS firmware nearly always comes out of the box with safe defaults. If you don't know what you're doing, leaving them alone is perfectly fine. You will still have a very performant system. Overclocking is not for the faint of heart (nor the ignorant!).
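To put numbers on how far past spec those profiles land, here is a quick Python sketch. It only uses the figures from my own case above (4000 spec for 2 DIMMs per channel at 1 rank each, XMP at 5600, my fallback at 4800). `percent_over_spec` is a made-up helper name, and you should check your own CPU's datasheet for the spec value.

```python
def percent_over_spec(profile_mts: int, spec_mts: int) -> float:
    """How far a memory profile runs above the controller's specified rate, in percent."""
    return (profile_mts - spec_mts) * 100 / spec_mts


SPEC_2DPC_1R = 4000  # figure quoted above for my 4-DIMM Alder Lake setup

for label, profile in [("XMP profile", 5600), ("fallback setting", 4800)]:
    over = percent_over_spec(profile, SPEC_2DPC_1R)
    print(f"{label}: {profile} vs {SPEC_2DPC_1R} spec = {over:.0f}% over")
```

That works out to 40% over spec for the XMP profile and still 20% over for the 4800 setting I ended up with.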
"Probably", but not guaranteed. This isn't a normal thing to have happen, even with overclocked chips. The temperature sensors used for thermal protection aren't usually the first thing to break.
It might not be the temperature sensor itself but the circuitry measuring it. Yeah, the sensor is "just a diode", but that needs an amplifier and an ADC to get to the rest of the system.