People go a bit crazy about CUDA, ROCm and PyTorch, but I've been watching for a few years and have seen no evidence whatsoever that they are serious blockers. PyTorch does work on AMD cards, and whatever ROCm can't do doesn't seem to be important, because no one in my line of sight has articulated why they need it. By far AMD's biggest problem is that their Linux kernel drivers historically don't seem to be able to handle GEMM workloads without kernel panics.
Having some senior engineers take a public interest in putting up this sort of article is rather exciting. I'm not going to give AMD the benefit of the doubt after their horrific performance in the 2010s and early 2020s, but observing from a safe distance, they do look like they're on the right track and possibly even a fair way down the path to getting into the game.
You seem to be saying AMD GPUs can run PyTorch but can't run GEMM? Can you explain? I thought PyTorch used GEMM extensively.
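For anyone following along who hasn't met the term: GEMM is the BLAS "general matrix multiply", C <- alpha*A*B + beta*C, which is the operation underneath a dense matrix product such as a linear layer. A minimal pure-Python sketch, only to pin down what the operation computes (the function name gemm is made up for illustration; real workloads call a tuned BLAS library rather than loops like these):

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must be m x n")
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out
```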
I also don't understand the comment on "whatever ROCm can't do doesn't seem to be important because no-one has articulated why they need it in my line of sight". Isn't the problem with ROCm the lack of support? It's only officially supported on a tiny proportion of AMD's product line?
1) I have no issues throwing ROCm (which still uses its own path in the driver) at my Radeon at the same time I'm hammering it with "normal" path APIs (Legacy D3D, D3D12, OpenGL, Vulkan, etc), they are scheduled normally and compete for GPU resources normally.
2) ROCm memory allocation is weird in the driver. I have gotten my GPU to hard-lock the entire system by allocating about 2x my VRAM; I suspect it's misusing/overusing mprotect().
That matches my experience: local LLMs or diffusion models would lock up the GPU when the VRAM allocated was close to the maximum available (as monitored by nvtop). After decreasing the batch size or reducing the number of layers offloaded to the GPU, the same workload would run stably for hours.
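The workaround described here, shrinking the batch until the allocation stays clear of the VRAM ceiling, could be sketched roughly like this. Everything named is hypothetical: estimate_bytes stands in for whatever memory estimate you have for your own model, and the 0.8 headroom factor is an arbitrary guess, not a figure from the driver or from ROCm.

```python
def pick_batch_size(start, vram_bytes, estimate_bytes, headroom=0.8):
    """Halve the batch size until the estimated footprint fits under
    headroom * vram_bytes. Returns 0 if even a batch of 1 does not fit."""
    budget = headroom * vram_bytes
    batch = start
    while batch >= 1:
        if estimate_bytes(batch) <= budget:
            return batch
        batch //= 2  # back off and try again
    return 0


GIB = 1024 ** 3

# Made-up cost model: 2 GiB of weights plus 0.5 GiB per sample.
def toy_estimate(batch):
    return 2 * GIB + batch * GIB // 2

print(pick_batch_size(32, 16 * GIB, toy_estimate))
```

The same idea applies to the number of layers offloaded to the GPU: treat it as the knob, and back off until the estimate leaves some margin.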
For those who are using it for ML loads, what's the point of using it for desktop graphics (at least at the same time)?
That is, I'd argue that unless one is getting server chips (which negates the desktop graphics comment), the vast majority of modern CPUs come with iGPUs that are sufficient for running a desktop environment. Unless one is planning to game on the same machine (but again, probably not at the same time), if the above is the problem, why not use the iGPU for the desktop and the dGPU for your ML workloads?
RDNA4 is officially supported on ROCm (the release for it came out shortly after the drivers shipped). PyTorch officially supports ROCm, and AMD officially supports PyTorch's ROCm target.
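If you want to check which backend your own PyTorch install was built for, something like the following should do it. It is a small sketch rather than an official recipe: describe_torch_backend is a made-up helper name, and it falls back to a message if PyTorch isn't installed at all.

```python
def describe_torch_backend():
    """Report which GPU backend the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip = getattr(torch.version, "hip", None)
    if hip:
        backend = f"ROCm/HIP {hip}"
    elif torch.version.cuda:
        backend = f"CUDA {torch.version.cuda}"
    else:
        backend = "CPU only"
    # ROCm builds reuse the torch.cuda namespace, so this covers AMD GPUs too.
    usable = torch.cuda.is_available()
    return f"{backend}, GPU usable: {usable}"


print(describe_torch_backend())
```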
Wow, they support the whole RDNA4 product line now! Pleased to see AMD finally seems to be getting somewhere with ROCm on consumer cards. It's been a long time coming. It looks like the Ryzen AI Max 395 has (Linux-only) ROCm support now too.
I'd missed all of this when it arrived but I'm happy to see it. Articles like this one should be appearing a lot more often now and that's a good thing.