ROCm, PyTorch?

roenxi · 2025-07-21T10:50:15

People go a bit crazy about CUDA, ROCm and PyTorch, but I've been watching for a few years and have seen no evidence whatsoever that they are serious blockers. PyTorch does work on AMD cards, and whatever ROCm can't do doesn't seem to be important because no-one has articulated why they need it in my line of sight. By far, AMD's biggest problem is that their Linux kernel drivers historically don't seem to be able to handle GEMM workloads without kernel panics.

Having some senior engineers take a public interest in putting up this sort of article is rather exciting. I'm not going to give AMD the benefit of the doubt after their horrific performance in the 2010s and early 2020s, but, observing from a safe distance, they do look like they're on the right track and possibly even a fair way down the path to getting into the game.

fancyfredbot · 2025-07-21T13:27:17

You seem to be saying AMD GPUs can run PyTorch but can't run GEMM? Can you explain? I thought PyTorch used GEMM extensively.

I also don't understand the comment on "whatever ROCm can't do doesn't seem to be important because no-one has articulated why they need it in my line of sight". Isn't the problem with ROCm the lack of support? It's only officially supported on a tiny proportion of AMD's product line?

benreesman · 2025-07-21T14:23:53

Parent seems to be saying that stability issues on consumer RDNA cards are the issue as opposed to ROCm support in PyTorch.

imtringued · 2025-07-21T15:12:11

My pet theory is that the scheduler can't handle desktop graphics + ML workloads simultaneously, which leads to a deadlock in the firmware.

DiabloD3 · 2025-07-21T16:58:16

Your pet theory needs work.

1) I have no issues throwing ROCm (which still uses its own path in the driver) at my Radeon while I'm hammering it with "normal"-path APIs (legacy D3D, D3D12, OpenGL, Vulkan, etc.); they are scheduled normally and compete for GPU resources normally.

2) ROCm memory allocation is weird in the driver. I have gotten my GPU to hard-lock the entire system by allocating about 2x my VRAM; I suspect it's misusing or overusing mprotect().

skirmish · 2025-07-22T05:12:59

That matches my experience: local LLMs or diffusion models would lock up the GPU when the allocated VRAM was close to the maximum available (as monitored by nvtop). After decreasing the batch size or reducing the number of layers offloaded to the GPU, the same workload would run stably for hours.
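One rough way to apply that workaround up front is to size the batch against a VRAM budget that deliberately leaves some headroom, instead of running right at the limit. The sketch below is a hypothetical helper, not anything from ROCm or PyTorch; the 10% headroom, the fixed footprint, and the per-sample cost are assumptions you would have to measure yourself (e.g. by watching nvtop).

```python
# Hypothetical helper: pick the largest batch size that keeps estimated VRAM use
# below (1 - headroom) of the card's total, rather than running right at the limit.
GiB = 1024 ** 3
MiB = 1024 ** 2


def max_batch_size(total_vram: int, fixed_bytes: int, per_sample_bytes: int,
                   headroom: float = 0.10) -> int:
    """Return the largest batch that fits in the VRAM budget, or 0 if none fits.

    total_vram: total device memory in bytes.
    fixed_bytes: memory already committed (weights, runtime overhead), measured by you.
    per_sample_bytes: estimated extra memory per batch element, measured by you.
    headroom: fraction of total VRAM deliberately left unused (assumed 10% here).
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    if per_sample_bytes <= 0:
        raise ValueError("per_sample_bytes must be positive")
    budget = total_vram * (1.0 - headroom) - fixed_bytes
    if budget < per_sample_bytes:
        return 0
    return int(budget // per_sample_bytes)


if __name__ == "__main__":
    # Example numbers only: a 16 GiB card, 8 GiB of weights, ~512 MiB per sample.
    print(max_batch_size(16 * GiB, 8 * GiB, 512 * MiB))  # 12
```

The same reasoning could be applied to layer offloading by treating the per-layer memory cost as the per-sample cost, though how accurate that estimate is depends entirely on the model and runtime.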

compsciphd · 2025-07-21T16:38:31

For those who are using it for ML workloads, what's the point of also using it for desktop graphics (at least at the same time)?

I'd argue that unless one is buying server chips (which negates the desktop graphics point), the vast majority of modern CPUs come with iGPUs that are sufficient for running a desktop environment. Unless one is planning to game on the same machine (and again, probably not at the same time), if the above is the problem, why not use the iGPU for the desktop and the dGPU for your ML workloads?

DiabloD3 · 2025-07-21T16:47:28

What about them?

RDNA4 is officially supported on ROCm (the release for that came out shortly after the drivers shipped), PyTorch officially supports ROCm, and AMD officially supports PyTorch's ROCm target.
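For anyone wanting to confirm which backend their own PyTorch install actually targets, a minimal check is sketched below. It relies on two things PyTorch exposes: `torch.version.hip` is a version string on ROCm builds and `None` otherwise, and ROCm builds reuse the `torch.cuda` namespace, so `torch.cuda.is_available()` reports whether an AMD GPU is usable. The function name and the dict layout are illustrative choices, not a PyTorch API.

```python
def gpu_backend_report() -> dict:
    """Summarize which GPU backend the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        # PyTorch not installed: nothing to report.
        return {"torch": None, "hip": None, "cuda": None, "gpu_available": False}
    return {
        "torch": str(torch.__version__),
        # Version string on ROCm builds, None on CUDA or CPU-only builds.
        "hip": getattr(torch.version, "hip", None),
        # Version string on CUDA builds, None otherwise.
        "cuda": getattr(torch.version, "cuda", None),
        # On ROCm builds, AMD GPUs are exposed through the torch.cuda namespace.
        "gpu_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    for key, value in gpu_backend_report().items():
        print(f"{key}: {value}")
```

On a working ROCm setup you would expect `hip` to be set and `gpu_available` to be true; whether a given card is on the official support list is a separate question that the ROCm compatibility matrix answers.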

fancyfredbot · 2025-07-21T18:08:44

Wow, they support the whole RDNA4 product line now! Pleased to see AMD finally seems to be getting somewhere with ROCm on consumer cards. It's been a long time coming. Looks like the Ryzen AI Max 395 has (Linux-only) ROCm support now too.

I'd missed all of this when it arrived but I'm happy to see it. Articles like this one should be appearing a lot more often now and that's a good thing.