
My understanding is that it's screwed up for multiple vendors and chipsets. The boards might say they support it, but there are some updates saying it isn't supported. It seemed extremely hard to find any that actually supported it. It was actually easier to find new Intel boards supporting ECC.


yeah wendell put out a video a few weeks ago exploring a bunch of problems with asrock rack-branded server-market B650 motherboards and basically the ECC situation was exactly what everyone warns about: the various BIOS versions wandered between "works, but doesn't forward the errors", "doesn't work, and doesn't forward the errors", and (excitingly) "doesn't work and doesn't even POST". We are a year and a half after zen4 launched and there are barely any server-branded boards to begin with, and even those boards don't work right.

https://youtu.be/RdYToqy05pI?t=503

I don't know how many times it has to be said but "doesn't explicitly disable" is not the same thing as "support". There are lots of other enablement steps that are required to get ECC to work properly, and they really need to be explicitly tested with each release (which if it is "not explicitly disabled", it's not getting tested). Support means you can complain to someone when it doesn't work right.

AMD churns AGESA really, really hard and it breaks all the time. Partners have to try and chase the upstream and sometimes it works and sometimes it doesn't. Elmor (Asus's BIOS guy) talked about this on Overclock.net back around 2017-2018, when AMD was launching X399, and went into some of the troubles there and with AM4.

That said, the current situation has seemingly lit a fire under the board partners. With Intel out of commission and all these customers desperate for an alternative to their W680/raptor lake systems (which do support ecc officially, btw) in these performance-sensitive niches or power-limited datacenter layouts, they are finally cleaning up the mess, like, within the last 3 weeks or so. They've very quickly gone from not caring about these boards to seeing a big market opportunity.

https://www.youtube.com/watch?v=n1tXJ8HZcj4

can't believe how many times I've explained in the last month that yes, people do actually run 13700Ks in the datacenter... with ECC... and actually it's probably some pretty big names in fact. A previous video dropped the tidbit that one of the major affected customers is Citadel Capital - and yeah, those are the guys who used to get special EVEREST and BLACK OPS skus from intel for the same thing. Client platform is better at that, the very best sapphire rapids or epyc -F or -X3D sku is going to be like 75% of the performance at best. It's also the fastest thing available for serving NVMe flash storage (and Intel specifically targeted this, the Xeon E-2400 series with the C266 chipset can talk NVMe SAS natively on its chipset with up to 4 slimsas ports...)

it's somewhere in this one I think: https://www.youtube.com/watch?v=5KHCLBqRrnY


The new EPYC processors for AM5 look like they'll be ok for ECC ram though, at least from the coming months onwards.


Yeah I think that’s the bright spot: now that there’s a branded offering for server-flavored Ryzen, maybe there is a permanent justification for doing proper validation.

I just feel vindicated lol, it always comes up that “well works fine for me!” and the reality is it’s a total crapshoot with even server-branded boards often not working. There is zero chance your gigabyte UD3 or whatever is going to be consistently supported across bios and often it will not be.

And AMD is really really tied to AGESA releases, so it’s fairly important on that side. Although I guess maybe we’re seeing now what happens if you let too much be abstracted away… but on the other hand partners were blowing up AMD chips last year too.

If you’re comfortable always testing, and always having the possibility of there being some big AGESA problem and ecc being broken on the new versions… ok I guess.
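For anyone who does go the "always testing" route, here is a rough sketch of the kind of check you could rerun after every BIOS/AGESA update on Linux. It reads the kernel's EDAC counters from sysfs (/sys/devices/system/edac/mc/mc*/ce_count and ue_count). The function name and the idea of passing in the root path are just my own illustration. Big caveat: an empty result only means the OS isn't seeing an EDAC memory controller (driver not loaded, or the firmware isn't exposing it), and a controller showing zero errors doesn't prove that errors would actually get forwarded. For that you'd still need to see a real or injected error show up.

```python
from pathlib import Path

# Standard Linux EDAC sysfs location; only present when an EDAC driver is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller_name: (corrected, uncorrected)} for each mc* directory.

    `root` is a parameter only so the function can be pointed at a test
    directory; on a real machine the default is what you want.
    """
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Skip controllers whose counters are missing or unreadable.
            continue
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("no EDAC memory controllers visible to the OS")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

If the output changes from "mc0: ..." to "no EDAC memory controllers visible" after a BIOS flash, that's your sign the new version regressed something.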

There is a reason the i3 chips were perennial favorites for edge servers and NASs. And I think it's really, really hard to overstate the long-term damage from reputation loss here. Intel, meltdown aside, was always no-drama in terms of reliability. Other than C2000/C3000, I guess.


...and puma and i-225V chipsets.

or at least... maybe on the CPU side they were no-drama. Other than C2000/C3000. Granted the powervr graphics on the atoms way back did suck... and meltdown... and avx-512 being rolled back... /phillip j fry counting on his fingers

maybe "blue-chip coded" is a better way to express it ig

but like, there is a notable decline in the quality of execution of intel overall, pretty much across the board, and cpu was always their core vertical, right? That was their business redoubt. intel is blue chip chips, especially CPUs. And now it's falling - really it's been falling for a while. Meltdown I can generally excuse (yes, shush), nobody appreciated sidechannels back then even if they were theoretically known. C2000/C3000 is another fuckup. yeah it's the super-io/serial bus controller... technically not their IP but it happens to be in a critical path, on their node, killing their processor. They fucked up the validation there, evidently.

I-225V had three steppings and I-226V is still not fully fixed (windows/linux have just turned off the EEE/802.3az feature instead). Puma was a god damned mess.

Sapphire rapids was late, still a huge mess, and actually the -W platform had not only insane power draw, but also insaner transients. 750W average, spiking up to 1500W under load, with pretty steep holdup requirements. And actually that was locked behind a "water cooled" bios option, the processor just "refused to all-core turbo" otherwise. And Intel didn't wanna actually say that the "water cooled" behavior was the spec or intentional turbo limits etc. In hindsight hmmm, that all took a bit of a different tone, didn't it?

Supposedly there is going to be a SPR-W refresh with a new stepping to fix this... emerald rapids is also very power-hungry and there were some unconfirmed murmurs suggesting it might have the same crash problems.

(yes, yes, please just listen to the guest here.) https://www.youtube.com/watch?v=_HJu5xt43iQ&t=3603s

https://wccftech.com/intel-xeon-w-3500-w-2500-sapphire-rapid...

Intel's in some real danger especially with AMD ascendant like this. Like it doesn't take very long of this real damage to customers etc and that "we're blue-chip!" thing will cease to be, and that is the last prop keeping intel's finances above the water here. Sure, it will take a while to fully wind down but... this is a great example of how intel's fuckups are driving their clients literally into the arms of the competition. A month or two ago, Asrock Rack didn't give a shit about the B650-2L2T or whatever. Guess what? Now Epyc Mini exists and oems are going to be paying attention to that. Oops.


> I-226V is still not fully fixed

Damn, didn't realise that was still being problematic too. :(

And yeah, Intel's current stumble with 13th/14th gen cpus seems like the worst possible timing for such an extreme fuck up. That's not going to go well for future planning/purchase decisions by business customers.



