Support this, reimplement that, support upstream efforts, don't really care. Any of those would cost a couple of million and be worth a trillion dollars to AMD shareholders.
Is it weird how the comments here are blaming AMD and not Nvidia? Sure, the obvious argument is that Nvidia has no practical motivation to build an open platform. But there are counterexamples that suggest otherwise (Android). And there is a compelling argument that, long term, their proprietary firmware layer will become an insufficient moat around their hardware dominance.
Who’s the root cause? The company with the dominant platform that refuses to open it up, or the competitor who can’t catch up because they’re running so far behind? Even if AMD made their own version of CUDA that was better in every way, it still wouldn’t gain adoption because CUDA has become the standard. No matter what they do, they’ll need to have a compatibility layer. And in that case maybe it makes sense for them to invest in the best one that emerges from the community.
> Is it weird how the comments here are blaming AMD and not Nvidia?
Nvidia has put in the legwork and is reaping the rewards. They've worked closely with the people who are actually using their stuff, funding development and giving loads of support to researchers, teachers and so on, for probably a decade now. Why should they give all that away?
> But there are counterexamples that suggest otherwise (Android).
How is Android a counterexample? Google makes no money off of it, nor does anyone else. Google keeps Android open so that Apple can't move everyone onto their ad platform, so it's worth it for them as a strategic move, but Nvidia has no such motive.
> Even if AMD made their own version of CUDA that was better in every way, it still wouldn’t gain adoption because CUDA has become the standard.
Maybe. But again, that's because Nvidia has been putting in the work to make something better for a decade or more. The best time for AMD to start actually trying was 10 years ago; the second-best time is today.
> Google makes no money off of it, nor does anyone else
Google makes no money off of Android? That seems like a really weird claim to make. Do you really think Google would be anywhere near as valuable a company if iOS had all of the market share that the data vacuum that is Android has? I can't imagine that being the case.
Google makes a boatload off of Android, just like AMD would if they supported open GPGPU efforts aggressively.
Android is a complement to Google's business, which is the situation where open source works. What complement, worth $1 trillion to NVIDIA, would justify building a truly open platform? There isn't one. That was his point.
There’s an entire derivative industry of GPUs, namely GenAI and LLM providers, that could be the “complement” to an open GPU platform. The exact design and interface between such a complement and platform are as yet undefined, but I’m sure there are creative approaches to this problem.
And NVIDIA is playing in that game too. Why would they not play in higher level services as well? They already publish the source to their entire software stack. A comparison to Android is completely useless. Google is a multi-sided platform that does lots of things for free for some people (web users, Android users) so it can charge other people for their data (ad buyers). That isn't the chip business whatsoever. The original comment only makes sense if you know nothing about their respective business models.
Yes, so when the ground inevitably shifts below their feet (it might happen years from now, but it will happen – open platforms always emerge and eventually proliferate), wouldn’t it be better for them to own that platform?
On the other hand, they could always wait for the most viable threat to emerge and then pay a few billion dollars to acquire it and own its direction. Google didn’t invent Android, after all…
> Google is a multi-sided platform that does lots of things for free for some people… That isn't the chip business whatsoever.
This is a reductionist differentiation that overlooks the similarities between the platforms of “mobile” and “GPU” (and also mischaracterizes the business model of Google, who does in fact make money directly from Android sales, and even moved all the way down the stack to selling hardware). In fact there is even a potentially direct analogy between the two platforms: LLM is the top of the stack with GPU on the bottom, just like Advertising is the top of the stack with Mobile on the bottom.
Yes, Google’s top level money printer is advertising, and everything they do (including Android) is about controlling the maximum number of layers below that money printer. But that doesn’t mean there is no benefit to Nvidia doing the same. They might approach it differently, since they currently own the bottom layer whereas Google started from the top layer. But the end result of controlling the whole stack will lead to the same benefits.
And you even admit in your comment that Nvidia is investing in these higher levels. My argument is that they are jeopardizing the longevity of these high-level investments due to their reluctance to invest in an open platform at the bottom layer (not even the bottom, but one level above their hardware). This will leave them vulnerable to encroachment by a player that comes from a higher level, like OpenAI for example, who gets to define the open platform before Nvidia ever has a chance to own it.
> it might happen years from now, but it will happen – open platforms always emerge and eventually proliferate
30 years ago people were making the same argument that MS should have kept DirectX open or else they were going to lose to OpenGL. Look how that's worked out for them.
> Google, who does in fact make money directly from Android sales
They don't though. They have some amount of revenue from it, but it's a loss-making operation.
> In fact there is even a potentially direct analogy between the two platforms: LLM is the top of the stack with GPU on the bottom, just like Advertising is the top of the stack with Mobile on the bottom.
But which layer is the differentiator, and which layer is just commodity? Google gives away Android because it isn't better than iOS and isn't trying to be; "good enough" is fine for their business (if anything, being open is a way to stay relevant where they would otherwise fall behind). They don't give away the ad-tech, nor would they open up e.g. Maps data where they have a competitive advantage.
Nvidia has no reason to open up CUDA; they have nothing to gain and a lot to lose by doing so. They make a lot of their money from hardware sales, which they would open up to cannibalisation, and CUDA is already the industry standard that everyone builds on and stays compatible with. If there were ever a real competitive threat then that might change, but AMD has a long way to go to get there.
"Open up CUDA" - guys, its all open source. What do you want them to do? Do tech support to help their competitors compete against them? AMD is to blame for not building this project 10 years ago.
Google gave away the software platform - Android - to hardware vendors for free, vendors compete making the hardware into cheap, low-margin commodity items, and Google makes boatloads of money from ads, tracking and the app store.
Nvidia could give away the software platform - CUDA - to hardware vendors for free, making the hardware into cheap, low-margin commodity items. But how would they make boatloads of money when there's nowhere to put ads, tracking or an app store?
> Is it weird how the comments here are blaming AMD and not Nvidia?
It's not. Even as it is, I do not trust HIP or ROCm to be a viable alternative to CUDA. George Hotz did plenty of work trying to port various ML architectures to AMD and was met with countless driver bugs. The problem isn't that Nvidia won't build an open platform - the problem is that AMD won't invest in a competitive platform. 99% of ML engineers do not write CUDA. For the vast majority of workloads, there are probably 20 engineers at Meta who write the CUDA backend for PyTorch that every other engineer uses. Meta could hire another 20 engineers to support whatever AMD has (they did, and it's not as robust as CUDA).
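To make that concrete, here's the kind of code those other engineers actually write: backend-agnostic PyTorch, where the GPU only shows up as a device string. A minimal sketch (ROCm builds of PyTorch reuse the torch.cuda namespace, so the same lines run on AMD hardware):

    import torch

    # Typical modeling code: no CUDA in sight. The same lines run on an
    # NVIDIA card, or on an AMD card under a ROCm build of PyTorch, which
    # exposes HIP through the torch.cuda namespace.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(32, 1024, device=device)
    y = model(x)  # dispatches to cuBLAS or rocBLAS under the hood
    print(y.shape, y.device)

All the porting pain lives below that device string, which is exactly why a small backend team absorbs the cost for everyone else.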
Even if CUDA were open, do you expect Nvidia to also write drivers for AMD? I don't believe 3rd parties will get anywhere writing "compatibility layers", because AMD's own GPUs aren't optimized or tested for CUDA-like workloads.
Khronos, AMD and Intel have had 15 years to make something out of OpenCL that could rival CUDA.
Instead they delivered 15 years of disappointment: a standard stuck in C99 that adopted C++ and a polyglot bytecode (SPIR-V) too late to matter, and that never produced an ecosystem of IDE tooling and GPU libraries.
Naturally CUDA became the standard when NVIDIA provided what the GPU community cared about.
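For anyone who never used it, this is roughly what the OpenCL workflow looked like for most of those 15 years: kernels shipped as raw C99 strings and compiled at runtime, with no templates and no vendor library ecosystem. A minimal sketch using pyopencl (buffer names and sizes are illustrative):

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    # The kernel is a C99 string, compiled at runtime: no C++ templates,
    # no host/device code sharing, no cuBLAS-style vendor libraries.
    src = """
    __kernel void add(__global const float *a,
                      __global const float *b,
                      __global float *out) {
        int i = get_global_id(0);
        out[i] = a[i] + b[i];
    }
    """
    prog = cl.Program(ctx, src).build()

    a = np.random.rand(1 << 20).astype(np.float32)
    b = np.random.rand(1 << 20).astype(np.float32)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    prog.add(queue, a.shape, None, a_buf, b_buf, out_buf)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)

Every buffer is managed by hand, and there's nothing comparable to the cuDNN/cuBLAS/Thrust stack sitting on top.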
> Is it weird how the comments here are blaming AMD and not Nvidia?
Not even a little bit. It simply isn't Nvidia's job to provide competitive alternatives to Nvidia. Competing is something AMD must take responsibility for.
The only reason CUDA is such a big talking point is because AMD tripped over their own feet supporting accelerated BLAS on AMD GPUs. Realistically it probably is hard to implement (AMD have a lot of competent people on staff) but Nvidia hasn't done anything unfair apart from execute so well that they make all the alternatives look bad.
Huh? Why the sarcasm? You think it's a good thing that someone besides the person who owns the hardware has the final say on what the hardware is allowed to be used for?
That's not actually a thing? I specifically moved away from Nvidia because:
1) they choose (chose?) not to support standard display protocols that Wayland compositors target with their drivers (annoying, but not the end of the world)
2) they cryptographically lock users out of writing their own drivers for their own graphics cards (which should be illegal, and which directly contradicts "that's not actually a thing").
Again: look into why the Nouveau driver performance is limited.
This seems to be more about certain devices (consumer-grade GPUs) in certain settings (data centers), though I do question how enforceable it actually is. My guess is that it can only be enforced when you try to get discounts by bulk-ordering GPUs.
Also, was there any followup to this story? It seems a bit unnecessary, because Nvidia has already neutered consumer cards for many/most data center purposes by not using ECC and by providing so few FP64 units that double-precision throughput is barely better than CPU SIMD.
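Back-of-envelope, assuming the commonly cited 1/64 FP64:FP32 ratio on recent consumer NVIDIA cards (the specific figures are illustrative, not from a spec sheet I've verified):

    # Consumer GPU: roughly 83 TFLOPS FP32 on a top Ada-class card,
    # with FP64 at an assumed 1/64 rate:
    gpu_fp64_tflops = 83.0 / 64  # ~1.3 TFLOPS FP64

    # 16-core desktop CPU, two 512-bit FMA pipes per core, 4 GHz:
    # 16 cores * 2 FMAs * 8 doubles * 2 flops * 4e9 Hz
    cpu_fp64_tflops = 16 * 2 * 8 * 2 * 4e9 / 1e12  # ~2.0 TFLOPS

    print(gpu_fp64_tflops, cpu_fp64_tflops)

On those rough numbers, the CPU actually comes out ahead for double precision, which is the point: the consumer cards are already segmented out of that market.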
It’s also not really a thing anymore because of the open kernel driver… at that point it’s just MIT-licensed.
Of course, people continued to melt down about that for some reason too, in the customary “nothing is ever libre enough!” circular firing squad. Just like Streamline, etc.
There’s a really shitty strain of fanboy thought that wants libre software to be actively worsened (even stonewalled by the kernel team if necessary) so that they can continue to argue that Nvidia is a bad actor that doesn’t play nicely with open source. You saw it with all these things, but especially with the open kernel driver: people were really happy it didn’t get upstreamed. Shitty behavior all around.
You see it every time someone quotes Linus Torvalds on the issue. Some slight from 2006 is more important than users having good, open drivers upstreamed. Some petty brand preferences are legitimately far more important, to a large number of people, than working with that vendor and bringing them into the fold long-term. Most of them don’t even consider themselves fanboys! They just say all the things a fanboy would say, and act all the ways a fanboy would act…
> Is it weird how the comments here are blaming AMD and not Nvidia?
Because it IS AMD/Apple/etc.'s fault for the position they're in right now. CUDA showed where the world was heading and where the gains in compute would be made well over a decade ago now.
They even had OpenCL, didn't put the right amount of effort into it, and all the talent found CUDA easier to work with, so they built there. Then what did AMD and Apple do? Double down and try to make something better and compete? Nah, they fragmented and went their own ways, AMD with what feels like a fraction of the effort even Apple put in.
From the actions of the other teams in the game, it's not hard to imagine a world without CUDA being a world where this tech runs at a fraction of its potential.
It's always been on the straggler to catch up by cheating. That's just how the world works - even in open source. If AMD supported CUDA, it would have a bigger market share. That's a fact. Nvidia doesn't want that. That's a fact. But when Reddit started, it just scraped feeds from Digg, and when Facebook started, it let you link your MySpace credentials and scraped your MySpace account. Adversarial interoperability is nothing new.
Funnily enough, the one I blame most for there being no real competition to CUDA is Apple. Lately, Apple has been really pushing vendor lock-in APIs rather than adopting open standards. The end result is that you can get AMD and Intel on board with some standard, which is then ultimately torpedoed by Apple. (See Apple departing from and rejecting everything that comes from the Khronos group.)
With the number of devs that use Apple silicon nowadays, I have to think that their support for Khronos initiatives like SYCL and OpenCL would have significantly accelerated progress and adoption in both.
We need an open standard that isn't just AMD-specific to have any hope of toppling CUDA.
Pretty much any modern NVIDIA GPU supports CUDA. You don't have to buy a datacenter-class unit to get your feet wet with CUDA programming. ROCm will count as "something" when the same is true for AMD GPUs.
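For example, a toy kernel like this will run on just about any GeForce from the past decade (a sketch using Numba's CUDA bindings; names and sizes are arbitrary):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def add(a, b, out):
        i = cuda.grid(1)  # global thread index
        if i < out.size:
            out[i] = a[i] + b[i]

    n = 1 << 20
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(a)

    threads = 256
    blocks = (n + threads - 1) // threads
    add[blocks, threads](a, b, out)  # Numba handles the host/device copies
    print(np.allclose(out, a + b))

That's the whole on-ramp: a gaming card, a pip install, and you're writing kernels.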
ROCm supports current-gen consumer GPUs officially and a decent chunk of recent-gen consumer GPUs unofficially. Not all of them, of course, but a decent chunk.
It's not ideal, but I'm pretty sure CUDA didn't support everything from day 1 either. And ROCm is AMD's vendor piece of the Windows AI stack, so from the upcoming generation onward, basically anything that outputs video should support ROCm.
No, but CUDA at least supported the 8800 GT on release [1]. ROCm didn't support any consumer cards on release; it looks like they didn't support any till last year? [2]
Personally, I don't think AMD needs to support 5+ year old GPUs. And all the recent generations are already supported in practice.
AMD only claims support for a select few GPUs, but in my testing all the GPUs work fine if the architecture is supported. I've tested the RX 6600 and RX 6700 XT, for example, and even though they aren't officially supported, they work fine on ROCm.
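One caveat: for unlisted RDNA2 parts, "works fine" usually relies on the well-known environment override that makes the card report the officially supported gfx1030 ISA. Roughly like this (a sketch; the right value depends on your card's architecture, and it's an unsupported workaround, not an AMD guarantee):

    import os
    # Must be set before anything loads the ROCm runtime. gfx1031/gfx1032
    # parts (RX 6700 XT / RX 6600) masquerade as the officially supported
    # gfx1030 ISA.
    os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

    import torch
    print(torch.cuda.is_available())      # ROCm builds reuse torch.cuda
    print(torch.cuda.get_device_name(0))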
AMD had a big architecture switchover exactly 5 years ago, and the full launch wasn't over until 4.5 years ago. I think that generation should have full support. Especially because it's not like they're cutting support now. They didn't support it at launch, and they didn't support it after 1, 2, 3, 4 years either.
The other way to look at it: I'd say that for a mid- to high-tier GPU to be obsolete on performance grounds, the replacement model needs to be over twice as fast. The 7700 XT is just over 50% faster than the 5700 XT.
I'm on a 5+ year old GPU, because I don't trust AMD to offer a compelling GPU that actually works. An RX 570 is good enough for the little gaming I do. It mostly acts as an oversized iGPU with good Linux drivers, but since AMD doesn't support ROCm on this GPU, there's no hurry to upgrade to a better GPU or to get my feet wet running things locally on the GPU like Stable Diffusion, LLMs, etc.
AMD's definition of "support", I think, is different from what people expect, and pretty misleading - ROCm itself will run on almost anything, back as far as the RX 400/500 series.
There are out-of-bounds writes in the BLAS libraries for gfx803 GPUs (such as the RX 570). That hardware might work fine for your use case, but there are a lot of failures in the test suites.
I agree that the official support list is very conservative, but I wouldn't recommend pre-Vega GPUs for use with ROCm. Stick to gfx900 and newer, if you can.
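If you'd rather test your own card than trust the list, a smoke test along these lines (a sketch, no substitute for the real test suites) will catch the worst of the BLAS breakage:

    import torch

    # Compare a GPU matmul (rocBLAS on a ROCm build) against the CPU result.
    a = torch.randn(1024, 1024)
    b = torch.randn(1024, 1024)
    ref = a @ b
    out = (a.cuda() @ b.cuda()).cpu()  # "cuda" is the HIP device on ROCm

    # Loose tolerance: FP32 accumulation order differs between backends.
    print(torch.allclose(ref, out, atol=1e-2, rtol=1e-3))

A mismatch (or a hang, or garbage values) on a matmul this simple is a good sign your card falls into the "runs, but isn't supported" bucket.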
The last time I checked, I was stuck on a pretty old kernel if I wanted the latest version of ROCm available for my RX 470. It was compatible at some point in time, but compatibility isn't maintained with recent kernels.
AMD should focus their efforts on competitive hardware offerings, because that is where the need and the money is. Sorry, I don't think the hobbyist should be a priority.