Support this, reimplement that, support upstream efforts, don't really care. Any of those would cost a couple of million and be worth a trillion dollars to AMD shareholders.
Is it weird how the comments here are blaming AMD and not Nvidia? Sure, the obvious argument is that Nvidia has no practical motivation to build an open platform. But there are counterexamples that suggest otherwise (Android). And there is a compelling argument that, long term, their proprietary firmware layer will become an insufficient moat around their hardware dominance.
Who’s the root cause? The company with the dominant platform that refuses to open it up, or the competitor who can’t catch up because they’re running so far behind? Even if AMD made their own version of CUDA that was better in every way, it still wouldn’t gain adoption because CUDA has become the standard. No matter what they do, they’ll need to have a compatibility layer. And in that case maybe it makes sense for them to invest in the best one that emerges from the community.
> Is it weird how the comments here are blaming AMD and not Nvidia?
Nvidia has put in the legwork and is reaping the rewards. They've worked closely with the people who are actually using their stuff, funding development and giving loads of support to researchers, teachers and so on, for probably a decade now. Why should they give all that away?
> But there are counterexamples that suggest otherwise (Android).
How is Android a counterexample? Google makes no money off of it, nor does anyone else. Google keeps Android open so that Apple can't move everyone onto their ad platform, so it's worth it for them as a strategic move, but Nvidia has no such motive.
> Even if AMD made their own version of CUDA that was better in every way, it still wouldn’t gain adoption because CUDA has become the standard.
Maybe. But again, that's because Nvidia has been putting in the work to make something better for a decade or more. The best time for AMD to start actually trying was 10 years ago; the second-best time is today.
> Google makes no money off of it, nor does anyone else
Google makes no money off of Android? That seems like a really weird claim to make. Do you really think Google would be anywhere near as valuable a company if iOS had all of the market share that the data vacuum that is Android has? I can't imagine that being the case.
Google makes a boatload off of Android, just like AMD would if they supported open GPGPU efforts aggressively.
Android is a complement to Google's business, which is the situation where open source works. What complement, worth $1 trillion to NVIDIA, would justify building a truly open platform? There isn't one. That was his point.
There’s an entire derivative industry of GPUs, namely GenAI and LLM providers, that could be the “complement” to an open GPU platform. The exact design and interface between such a complement and platform are as yet undefined, but I’m sure there are creative approaches to this problem.
And NVIDIA is playing in that game too. Why would they not play in higher level services as well? They already publish the source to their entire software stack. A comparison to Android is completely useless. Google is a multi-sided platform that does lots of things for free for some people (web users, Android users) so it can charge other people for their data (ad buyers). That isn't the chip business whatsoever. The original comment only makes sense if you know nothing about their respective business models.
Yes, so when the ground inevitably shifts below their feet (it might happen years from now, but it will happen – open platforms always emerge and eventually proliferate), wouldn’t it be better for them to own that platform?
On the other hand, they could always wait for the most viable threat to emerge and then pay a few billion dollars to acquire it and own its direction. Google didn’t invent Android, after all…
> Google is a multi-sided platform that does lots of things for free for some people… That isn't the chip business whatsoever.
This is a reductionist differentiation that overlooks the similarities between the platforms of “mobile” and “GPU” (and also mischaracterizes the business model of Google, who does in fact make money directly from Android sales, and even moved all the way down the stack to selling hardware). In fact there is even a potentially direct analogy between the two platforms: LLM is the top of the stack with GPU on the bottom, just like Advertising is the top of the stack with Mobile on the bottom.
Yes, Google’s top level money printer is advertising, and everything they do (including Android) is about controlling the maximum number of layers below that money printer. But that doesn’t mean there is no benefit to Nvidia doing the same. They might approach it differently, since they currently own the bottom layer whereas Google started from the top layer. But the end result of controlling the whole stack will lead to the same benefits.
And you even admit in your comment that Nvidia is investing in these higher levels. My argument is that they are jeopardizing the longevity of these high-level investments due to their reluctance to invest in an open platform at the bottom layer (not even the bottom, but one level above their hardware). This will leave them vulnerable to encroachment by a player that comes from a higher level, like OpenAI for example, who gets to define the open platform before Nvidia ever has a chance to own it.
> it might happen years from now, but it will happen – open platforms always emerge and eventually proliferate
30 years ago people were making the same argument that MS should have kept DirectX open or else they were going to lose to OpenGL. Look how that's worked out for them.
> Google, who does in fact make money directly from Android sales
They don't though. They have some amount of revenue from it, but it's a loss-making operation.
> In fact there is even a potentially direct analogy between the two platforms: LLM is the top of the stack with GPU on the bottom, just like Advertising is the top of the stack with Mobile on the bottom.
But which layer is the differentiator, and which layer is just commodity? Google gives away Android because it isn't better than iOS and isn't trying to be; "good enough" is fine for their business (if anything, being open is a way to stay relevant where they would otherwise fall behind). They don't give away the ad-tech, nor would they open up e.g. Maps data where they have a competitive advantage.
Nvidia has no reason to open up CUDA; they have nothing to gain and a lot to lose by doing so. They make a lot of their money from hardware sales, which they would open up to cannibalisation, and CUDA is already the industry standard that everyone builds on and stays compatible with. If there were ever a real competitive threat then that might change, but AMD has a long way to go to get there.
"Open up CUDA" - guys, its all open source. What do you want them to do? Do tech support to help their competitors compete against them? AMD is to blame for not building this project 10 years ago.
Google gave away the software platform - Android - to hardware vendors for free, vendors compete making the hardware into cheap, low-margin commodity items, and Google makes boatloads of money from ads, tracking and the app store.
Nvidia could give away the software platform - CUDA - to hardware vendors for free, making the hardware into cheap, low-margin commodity items. But how would they make boatloads of money when there's nowhere to put ads, tracking or an app store?
> Is it weird how the comments here are blaming AMD and not Nvidia?
It's not. Even as it is, I do not trust HIP or ROCm to be a viable alternative to CUDA. George Hotz did plenty of work trying to port various ML architectures to AMD and was met with countless driver bugs. The problem isn't that Nvidia won't build an open platform - the problem is that AMD won't invest in a competitive platform. 99% of ML engineers do not write CUDA. For the vast majority of workloads, there are probably 20 engineers at Meta who write the CUDA backend for PyTorch that every other engineer uses. Meta could hire another 20 engineers to support whatever AMD has (they did, and it's not as robust as CUDA).
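To make that concrete, here's the kind of code those other engineers actually write: backend-agnostic PyTorch, where the GPU only shows up as a device string. A minimal sketch (ROCm builds of PyTorch reuse the torch.cuda namespace, so the same lines run on AMD hardware):

    import torch

    # Typical modeling code: no CUDA in sight. The same lines run on an
    # NVIDIA card, or on an AMD card under a ROCm build of PyTorch, which
    # exposes HIP through the torch.cuda namespace.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(32, 1024, device=device)
    y = model(x)  # dispatches to cuBLAS or rocBLAS under the hood
    print(y.shape, y.device)

All the porting pain lives below that device string, which is exactly why a small backend team absorbs the cost for everyone else.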
Even if CUDA were open, do you expect Nvidia to also write drivers for AMD? I don't believe 3rd parties will get anywhere writing "compatibility layers", because AMD's own GPUs aren't optimized or tested for CUDA-like workloads.
Khronos, AMD and Intel have had 15 years to make something out of OpenCL that could rival CUDA.
Instead they delivered 15 years of disappointment: a standard stuck in C99 that adopted C++ and a polyglot bytecode (SPIR-V) too late to matter, and that never produced an ecosystem of IDE tooling and GPU libraries.
Naturally CUDA became the standard when NVIDIA provided what the GPU community cared about.
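For anyone who never used it, this is roughly what the OpenCL workflow looked like for most of those 15 years: kernels shipped as raw C99 strings and compiled at runtime, with no templates and no vendor library ecosystem. A minimal sketch using pyopencl (buffer names and sizes are illustrative):

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    # The kernel is a C99 string, compiled at runtime: no C++ templates,
    # no host/device code sharing, no cuBLAS-style vendor libraries.
    src = """
    __kernel void add(__global const float *a,
                      __global const float *b,
                      __global float *out) {
        int i = get_global_id(0);
        out[i] = a[i] + b[i];
    }
    """
    prog = cl.Program(ctx, src).build()

    a = np.random.rand(1 << 20).astype(np.float32)
    b = np.random.rand(1 << 20).astype(np.float32)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    prog.add(queue, a.shape, None, a_buf, b_buf, out_buf)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)

Every buffer is managed by hand, and there's nothing comparable to the cuDNN/cuBLAS/Thrust stack sitting on top.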
> Is it weird how the comments here are blaming AMD and not Nvidia?
Not even a little bit. It simply isn't Nvidia's job to provide competitive alternatives to Nvidia. Competing is something AMD must take responsibility for.
The only reason CUDA is such a big talking point is because AMD tripped over their own feet supporting accelerated BLAS on AMD GPUs. Realistically it probably is hard to implement (AMD have a lot of competent people on staff) but Nvidia hasn't done anything unfair apart from execute so well that they make all the alternatives look bad.
Huh? Why the sarcasm? You think it's a good thing that someone besides the person who owns the hardware has the final say on what the hardware is allowed to be used for?
That's not actually a thing? I specifically moved away from Nvidia because:
1) they choose (chose?) not to support standard display protocols that Wayland compositors target with their drivers (annoying, but not the end of the world)
2) they cryptographically lock users out of writing their own drivers for their own graphics cards (which should be illegal, and which directly contradicts "that's not actually a thing").
Again: look into why the Nouveau driver performance is limited.
This seems to be more about certain devices (consumer-grade GPUs) in certain settings (data centers), though I do question how enforceable it actually is. My guess is that it can only be enforced when you try to get discounts by bulk-ordering GPUs.
Also, was there any followup to this story? It seems a bit unnecessary, because Nvidia has already neutered consumer cards for many/most data center purposes by not using ECC and by providing so few FP64 units that double-precision throughput is barely better than CPU SIMD.
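Back-of-envelope, assuming the commonly cited 1/64 FP64:FP32 ratio on recent consumer NVIDIA cards (the specific figures are illustrative, not from a spec sheet I've verified):

    # Consumer GPU: roughly 83 TFLOPS FP32 on a top Ada-class card,
    # with FP64 at an assumed 1/64 rate:
    gpu_fp64_tflops = 83.0 / 64  # ~1.3 TFLOPS FP64

    # 16-core desktop CPU, two 512-bit FMA pipes per core, 4 GHz:
    # 16 cores * 2 FMAs * 8 doubles * 2 flops * 4e9 Hz
    cpu_fp64_tflops = 16 * 2 * 8 * 2 * 4e9 / 1e12  # ~2.0 TFLOPS

    print(gpu_fp64_tflops, cpu_fp64_tflops)

On those rough numbers, the CPU actually comes out ahead for double precision, which is the point: the consumer cards are already segmented out of that market.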
It’s also not really a thing anymore because of the open kernel driver… at that point it’s just MIT-licensed.
Of course, people continued to melt down about that for some reason too, in the customary “nothing is ever libre enough!” circular firing squad. Just like Streamline, etc.
There’s a really shitty strain of fanboy thought that wants libre software to be actively worsened (even stonewalled by the kernel team if necessary) so that they can continue to argue that Nvidia is a bad actor that doesn’t play nicely with open source. You saw it with all these things, but especially with the open kernel driver: people were really happy it didn’t get upstreamed. Shitty behavior all around.
You see it every time someone quotes Linus Torvalds on the issue. Some slight from 2006 is more important than users having good, open drivers upstreamed. Some petty brand preferences are legitimately far more important, to a large number of people, than working with that vendor and bringing them into the fold long-term. Most of them don’t even consider themselves fanboys! They just say all the things a fanboy would say, and act all the ways a fanboy would act…
> Is it weird how the comments here are blaming AMD and not Nvidia?
Because it IS AMD/Apple/etc.'s fault for the position they're in right now. CUDA showed where the world was heading and where the gains in compute would be made well over a decade ago now.
They even had OpenCL, didn't put the right amount of effort into it, and all the talent found CUDA easier to work with, so they built there. Then what did AMD and Apple do? Double down and try to make something better and compete? Nah, they fragmented and went their own ways, AMD with what feels like a fraction of the effort even Apple put in.
From the actions of the other teams in the game, it's not hard to imagine a world without CUDA being a world where this tech runs at a fraction of its potential.
It's always been on the straggler to catch up by cheating. That's just how the world works - even in open source. If AMD supported CUDA, it would have a bigger market share. That's a fact. Nvidia doesn't want that. That's a fact. But when Reddit started, it just scraped feeds from Digg, and when Facebook started, it let you link your MySpace credentials and scraped your MySpace account. Adversarial interoperability is nothing new.
Funnily enough, the one I blame most for there being no real competition to CUDA is Apple. Lately, Apple has been really pushing vendor lock-in APIs rather than adopting open standards. The end result is that you can get AMD and Intel on board with some standard, which is then ultimately torpedoed by Apple. (See Apple departing from and rejecting everything that comes from the Khronos group.)
With the number of devs that use Apple silicon nowadays, I have to think that their support for Khronos initiatives like SYCL and OpenCL would have significantly accelerated progress and adoption in both.
We need an open standard that isn't just AMD-specific to have any hope of toppling CUDA.
Pretty much any modern NVIDIA GPU supports CUDA. You don't have to buy a datacenter-class unit to get your feet wet with CUDA programming. ROCm will count as "something" when the same is true for AMD GPUs.
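For example, a toy kernel like this will run on just about any GeForce from the past decade (a sketch using Numba's CUDA bindings; names and sizes are arbitrary):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def add(a, b, out):
        i = cuda.grid(1)  # global thread index
        if i < out.size:
            out[i] = a[i] + b[i]

    n = 1 << 20
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(a)

    threads = 256
    blocks = (n + threads - 1) // threads
    add[blocks, threads](a, b, out)  # Numba handles the host/device copies
    print(np.allclose(out, a + b))

That's the whole on-ramp: a gaming card, a pip install, and you're writing kernels.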
ROCm supports current-gen consumer GPUs officially and a decent chunk of recent-gen consumer GPUs unofficially. Not all of them, of course, but a decent chunk.
It's not ideal, but I'm pretty sure CUDA didn't support everything from day 1 either. And ROCm is AMD's vendor piece of the Windows AI stack, so from the upcoming generation onward, basically anything that outputs video should support ROCm.
No, but CUDA at least supported the 8800 GT on release [1]. ROCm didn't support any consumer cards on release; it looks like they didn't support any till last year? [2]
Personally, I don't think AMD needs to support 5+ year old GPUs. And all the recent generations are already supported in practice.
AMD only claims support for a select few GPUs, but in my testing all the GPUs work fine if the architecture is supported. I've tested the RX 6600 and RX 6700 XT, for example, and even though they aren't officially supported, they work fine on ROCm.
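One caveat: for unlisted RDNA2 parts, "works fine" usually relies on the well-known environment override that makes the card report the officially supported gfx1030 ISA. Roughly like this (a sketch; the right value depends on your card's architecture, and it's an unsupported workaround, not an AMD guarantee):

    import os
    # Must be set before anything loads the ROCm runtime. gfx1031/gfx1032
    # parts (RX 6700 XT / RX 6600) masquerade as the officially supported
    # gfx1030 ISA.
    os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

    import torch
    print(torch.cuda.is_available())      # ROCm builds reuse torch.cuda
    print(torch.cuda.get_device_name(0))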
AMD had a big architecture switchover exactly 5 years ago, and the full launch wasn't over until 4.5 years ago. I think that generation should have full support. Especially because it's not like they're cutting support now. They didn't support it at launch, and they didn't support it after 1, 2, 3, 4 years either.
The other way to look at it: I'd say that for a mid- to high-tier GPU to be obsolete on performance grounds, the replacement model needs to be over twice as fast. The 7700 XT is just over 50% faster than the 5700 XT.
I'm on a 5+ year old GPU, because I don't trust AMD to offer a compelling GPU that actually works. An RX 570 is good enough for the little gaming I do. It mostly acts as an oversized iGPU with good Linux drivers, but since AMD doesn't support ROCm on this GPU, there's no hurry to upgrade to a better GPU or to get my feet wet running things locally on the GPU like Stable Diffusion, LLMs, etc.
AMD's definition of "support", I think, is different from what people expect, and pretty misleading - ROCm itself will run on almost anything, back as far as the RX 400/500 series.
There are out-of-bounds writes in the BLAS libraries for gfx803 GPUs (such as the RX 570). That hardware might work fine for your use case, but there are a lot of failures in the test suites.
I agree that the official support list is very conservative, but I wouldn't recommend pre-Vega GPUs for use with ROCm. Stick to gfx900 and newer, if you can.
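If you'd rather test your own card than trust the list, a smoke test along these lines (a sketch, no substitute for the real test suites) will catch the worst of the BLAS breakage:

    import torch

    # Compare a GPU matmul (rocBLAS on a ROCm build) against the CPU result.
    a = torch.randn(1024, 1024)
    b = torch.randn(1024, 1024)
    ref = a @ b
    out = (a.cuda() @ b.cuda()).cpu()  # "cuda" is the HIP device on ROCm

    # Loose tolerance: FP32 accumulation order differs between backends.
    print(torch.allclose(ref, out, atol=1e-2, rtol=1e-3))

A mismatch (or a hang, or garbage values) on a matmul this simple is a good sign your card falls into the "runs, but isn't supported" bucket.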
The last time I checked, I was stuck on a pretty old kernel if I wanted the latest version of ROCm available for my RX 470. It was compatible at some point in time, but compatibility isn't maintained with recent kernels.
AMD should focus their efforts on competitive hardware offerings, because that is where the need and the money is. Sorry, I don't think the hobbyist should be a priority.