I’m curious as to how they pulled this off. OpenCL isn’t that common in the wild relative to CUDA. Hopefully it can become robust and widespread soon enough. I personally succumbed to the pressure and spent a relative fortune on a 4090, but I wish I had some choice in the matter.
Another giveaway that it's ROCm is that it doesn't support the 5700 series...
I'm really salty because I "upgraded" to a 5700XT from a Nvidia GTX 1070 and can't do AI on the GPU anymore, purely because the software is unsupported.
But, as a dev, I suppose I should feel some empathy: there's probably some really difficult problem causing the 5700XT to be unsupported by ROCm.
I tried it recently and couldn't figure out why it exists. It's just a very feature-limited app that doesn't require you to know anything, or be able to read a model card, to "do AI".
It's just convenient. I wrote a rich text editor front end for llama.cpp, and I originally wrote a quick Go web server with streaming using the Go bindings, but now I just use ollama because it's simpler: the workflow for pulling down models from its registry and packaging new ones in containers takes less effort. Also, most people who want to play around with local models aren't developers at all.
I'm not sure why you're assuming that ollama users are developers when at least 30 different applications have direct API integration with ollama.
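That integration surface is just JSON over HTTP, which is part of why so many apps can hook into it. A minimal sketch of the request body ollama's `/api/generate` endpoint expects (the model name here is just a placeholder; swap in whatever you've pulled):

```python
import json

# ollama listens on http://localhost:11434 by default; you'd POST this
# body to /api/generate. With "stream": False the server returns a single
# JSON object instead of a stream of newline-delimited chunks.
def build_generate_request(model: str, prompt: str, stream: bool = False) -> str:
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

body = build_generate_request("llama3", "Why is the sky blue?")
print(body)
```

Any HTTP client in any language can send that, which is why the integrating apps don't need to be written by (or for) developers.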
Eh, I've been building native code for decades and hit quite a few roadblocks trying to get llama.cpp building with CUDA support on my Ubuntu box: library version issues and such. I ended up down a rabbit hole related to codenames for the various Nvidia architectures... It's a project on hold for now.
Weirdly, the Python bindings built without issue with pip.
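For anyone stuck at the same spot, a sketch of the build flags that usually end the architecture-codename rabbit hole (flag names have churned across llama.cpp versions, and the architecture number here is an assumption you'd adjust for your own card):

```shell
# Recent llama.cpp uses GGML_CUDA (older releases used LLAMA_CUBLAS).
# CMAKE_CUDA_ARCHITECTURES pins the compute capability explicitly so
# CMake doesn't have to guess from codenames (86 = consumer Ampere,
# e.g. RTX 30xx -- change this for your GPU).
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86
cmake --build build --config Release -j

# The pip route passes the same CMake flags through CMAKE_ARGS, which
# may be why the bindings built cleanly -- pip compiles from source:
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```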
Edited it out of my original comment because I didn't want to seem ranty/angry/like I have some personal vendetta, as opposed to just being extremely puzzled, but it legit took me months to realize it wasn't a GUI because of how it's discussed on HN, i.e. as key to democratizing, as a large, unique entity, etc.
Hadn't thought about it recently. After seeing it again here, and being gobsmacked by the number of genuine, earnest comments assuming there's extensive independent development of large pieces going on in it, I'm going with:
- "The puzzled feeling you have is simply because llama.cpp is a challenge on the best of days, you need to know a lot to get to fully accelerated on ye average MacBook, and technical users don't want a GUI for an LLM, they want a way to call an API, so that's why there isn't content extolling the virtues of GPT4All*. So TL;DR you're old and have been on computer too much :P"
but I legit don't know and still can't figure it out.
* I picked them because they're the most recent example of a genuinely democratizing tool that goes far beyond llama.cpp and also makes large contributions back to it, e.g. GPT4All landed one of the two Vulkan backends
OpenCL is as dead as OpenGL, and the inference implementations that exist perform poorly. The only real options are CUDA, ROCm, Vulkan, and CPU. And Vulkan is a proper pain too: it takes forever to build compute shaders, and it has to do so for each model. It only makes sense on Intel Arc, since there's nothing else there.
You can use OpenCL just fine on Nvidia, but CUDA is just a superior compute programming model overall (both in features and design). Pretty much every vendor offers something superior to OpenCL (HIP, oneAPI, etc.), because it simply isn't very nice to use.
I suppose that's about right. The implementors are busy building a path to profit and are much less concerned about any sort of lock-in or open standards; that comes much later in the cycle.
OpenCL is fine on Nvidia Hardware. Of course it's a second class citizen next to CUDA, but then again everything is a second class citizen on AMD hardware.
Apple killed off OpenCL on their platforms when they created Metal, which was disappointing. It sounds like ROCm will keep it alive, but the fragmentation sucks. Gotta support CUDA, OpenCL, and Metal now to be cross-platform.
OpenCL is a Khronos open spec for GPU compute, and what you’d use on Apple platforms before Metal compute shaders and CoreML were released. If you wanted to run early ML models on Apple hardware, it was an option. There was an OpenCL backend for torch, for example.
No idea. My best guess is their background is in graphics and games rather than machine learning. When CUDA is all you've ever known, you try just a little harder to find a way to keep using it elsewhere.
What's not reliable about it? On Linux, hipcc is about as easy to use as gcc. On Windows it's a little janky because hipcc is a Perl script and there's no Perl interpreter, I'll admit. I'm otherwise happy with it, though. It'd be nice if they had a shell-script installer like NVIDIA's, so I could use an OS that isn't a two-year-old Ubuntu. I own two XTX cards, but I'm actually switching back to NVIDIA on my main workstation for that reason alone; GPUs shouldn't be choosing winners in the OS world. The lack of a profiler is also a source of frustration.

I think the smart thing to do is to develop on NVIDIA and then distribute to AMD. I hope things change, though, and I plan to keep doing everything I can to support AMD, since I badly want to see more balance in this space.
The last time I used AMD GPUs for GPGPU, all it took was running hashcat to make desktop rendering unstable. I'm sure letting it run overnight would've gotten me a system crash.
That's always happened with NVIDIA on Linux too, because Linux is an operating system that actually gives you the resources you ask for. Consider using a separate video card dedicated to your display needs. Otherwise you should use macOS or Windows. macOS is 10x slower at building code, but I can fork-bomb it while training a model and Netflix won't skip a frame. Yes, I've actually done this.