
I’m curious as to how they pulled this off. OpenCL isn’t that common in the wild relative to CUDA. Hopefully it can become robust and widespread soon enough. I personally succumbed to the pressure and spent a relative fortune on a 4090, but wish I had some choice in the matter.


I'm surprised they didn't speak about the implementation at all. Anyone got more intel?



Another giveaway that it's ROCm is that it doesn't support the 5700 series...

I'm really salty because I "upgraded" to a 5700XT from a Nvidia GTX 1070 and can't do AI on the GPU anymore, purely because the software is unsupported.

But, as a dev, I suppose I should feel some empathy that there's probably some really difficult problem causing the 5700XT to be unsupported by ROCm.


I wrote a bunch of OpenMP code on a 5700XT a couple of years ago; if you're building from source it'll probably run fine.


They're open source and based on llama.cpp, so nothing's secret.

My money, looking at nothing, would be on one of the two Vulkan backends added in Jan/Feb.

I continue to be flummoxed by a mostly-programmer-forum treating ollama like a magical new commercial entity breaking new ground.

It's a CLI wrapper around llama.cpp, so you don't have to figure out how to compile it yourself.


I tried it recently and couldn't figure out why it existed. It's just a very feature-limited app that doesn't require you to know anything or be able to read a model card to "do AI".

And that more or less answered it.


It’s because most devs nowadays are new devs and probably aren’t very familiar with native compilation.

So compiling the correct version of llama.cpp for their hardware is confusing.

Compound that with everyone’s relative inexperience with configuring any given model and you have prime grounds for a simple tool to exist.

That’s what ollama and their Modelfiles accomplish.


It's just because it's convenient. I wrote a rich text editor front end for llama.cpp, and I originally wrote a quick Go web server with streaming using the Go bindings, but now I just use ollama because it's simpler, and the workflow for pulling down models with their registry and packaging new ones in containers is easier. Also, most people who want to play around with local models aren't developers at all.


I'm not sure why you are assuming that ollama users are developers when there are at least 30 different applications that have direct API integration with ollama.


Eh, I've been building native code for decades and hit quite a few roadblocks trying to get llama.cpp building with CUDA support on my Ubuntu box. Library version issues and such. Ended up down a rabbit hole related to codenames for the various Nvidia architectures... It's a project on hold for now.

Weirdly, the Python bindings built without issue with pip.


Edited it out of my original comment because I didn't want to seem ranty/angry/like I have some personal vendetta, as opposed to just being extremely puzzled, but it legit took me months to realize it wasn't a GUI because of how it's discussed on HN, i.e. as key to democratizing, as a large, unique entity, etc.

Hadn't thought about it recently. After seeing it again here, and being gobsmacked by the # of genuine, earnest comments assuming there's extensive independent development of large pieces going on in it, I'm going with:

- "The puzzled feeling you have is simply because llama.cpp is a challenge on the best of days, you need to know a lot to get to fully accelerated on ye average MacBook. and technical users don't want a GUI for an LLM, they want a way to call an API, so that's why there isn't content extalling the virtues of GPT4All*. So TL;DR you're old and have been on computer too much :P"

but I legit don't know and still can't figure it out.

* picked them because they're the most recent example of a genuinely democratizing tool that goes far beyond llama.cpp and also makes large contributions back to llama.cpp, e.g. GPT4All landed 1 of the 2 Vulkan backends


Ahhhh, I see what you did there.


OpenCL is as dead as OpenGL, and the inference implementations that exist are very unperformant. The only real options are CUDA, ROCm, Vulkan and CPU. And Vulkan is a proper pain too; it takes forever to build compute shaders and has to do so for each model. It only makes sense on Intel Arc since there's nothing else there.


SYCL is a fairly direct successor to the OpenCL model and is not quite dead; Intel seems to be betting on it more than the others.


ROCm includes OpenCL. And it's a very performant OpenCL implementation.


Why though? Except for Apple, most vendors still actively support it, and newer versions of OpenCL are still being released…


It would serve Nvidia right if their insistence on only running CUDA workloads on their hardware results in adoption of ROCm/OpenCL.


You can use OpenCL just fine on Nvidia, but CUDA is just a superior compute programming model overall (both in features and design). Pretty much every vendor offers something superior to OpenCL (HIP, OneAPI, etc.), because it simply isn't very nice to use.


I suppose that's about right. The implementors are busy building on a path to profit and much less concerned about any sort of lock-in or open standards; that comes much later in the cycle.


OpenCL is fine on Nvidia hardware. Of course it's a second-class citizen next to CUDA, but then again, everything is a second-class citizen on AMD hardware.


Maybe Vulkan compute? But yeah, it's interesting how they did it.


Apple killed off OpenCL for their platforms when they created Metal, which was disappointing. Sounds like ROCm will keep it alive, but the fragmentation sucks. Gotta support CUDA, OpenCL, and Metal now to be cross-platform.


What is OpenCL? AMD GPUs support CUDA. It's called HIP. You just need a bunch of #define statements like this:

    #ifndef __HIP__
    #include <cuda_fp16.h>
    #include <cuda_runtime.h>
    #else
    #include <hip/hip_fp16.h>
    #include <hip/hip_runtime.h>
    #define cudaSuccess hipSuccess
    #define cudaStream_t hipStream_t
    #define cudaGetLastError hipGetLastError
    #endif
Then your CUDA code works on AMD.
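
To make that concrete, here's a minimal sketch of my own (not from the comment above; it adds a few more aliases like cudaMalloc → hipMalloc beyond the three shown) of a single source file that should build with either nvcc or hipcc:

    #include <stdio.h>

    #ifndef __HIP__
    #include <cuda_runtime.h>          // plain CUDA build (nvcc)
    #else
    #include <hip/hip_runtime.h>       // HIP build (hipcc defines __HIP__)
    #define cudaMalloc hipMalloc
    #define cudaMemcpy hipMemcpy
    #define cudaFree hipFree
    #define cudaMemcpyHostToDevice hipMemcpyHostToDevice
    #define cudaMemcpyDeviceToHost hipMemcpyDeviceToHost
    #define cudaDeviceSynchronize hipDeviceSynchronize
    #endif

    // y = a*x + y, one element per thread
    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main(void) {
        enum { N = 1024 };
        static float hx[N], hy[N];
        for (int i = 0; i < N; i++) { hx[i] = 1.0f; hy[i] = 2.0f; }
        float *dx, *dy;
        cudaMalloc(&dx, N * sizeof(float));
        cudaMalloc(&dy, N * sizeof(float));
        cudaMemcpy(dx, hx, N * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, N * sizeof(float), cudaMemcpyHostToDevice);
        saxpy<<<(N + 255) / 256, 256>>>(N, 2.0f, dx, dy);
        cudaDeviceSynchronize();
        cudaMemcpy(hy, dy, N * sizeof(float), cudaMemcpyDeviceToHost);
        printf("y[0] = %f\n", hy[0]);  // expect 4.0
        cudaFree(dx);
        cudaFree(dy);
        return 0;
    }

Compile the same file with nvcc on an NVIDIA box or hipcc on an AMD box; since hipcc defines __HIP__, the aliases kick in there and the CUDA spellings map straight onto the HIP runtime.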


OpenCL is a Khronos open spec for GPU compute, and what you’d use on Apple platforms before Metal compute shaders and CoreML were released. If you wanted to run early ML models on Apple hardware, it was an option. There was an OpenCL backend for torch, for example.
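
For a sense of what the API looks like, here's a tiny sketch (my own illustration, not from the thread) that just enumerates the OpenCL platforms on a machine:

    #include <stdio.h>
    #ifdef __APPLE__
    #include <OpenCL/opencl.h>   // Apple's (now deprecated) framework header
    #else
    #include <CL/cl.h>
    #endif

    int main(void) {
        cl_uint count = 0;
        clGetPlatformIDs(0, NULL, &count);          // how many platforms?
        cl_platform_id platforms[16];
        if (count > 16) count = 16;
        clGetPlatformIDs(count, platforms, NULL);
        for (cl_uint i = 0; i < count; i++) {
            char name[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                              sizeof(name), name, NULL);
            printf("platform %u: %s\n", i, name);   // vendor platform name
        }
        return 0;
    }

Link with -lOpenCL (or -framework OpenCL on macOS). Each vendor's driver registers its own platform, which is how one binary can target NVIDIA, AMD, and Intel devices.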


Can you explain why nobody knows this trick, for some values of “nobody”?


No idea. My best guess is their background is in graphics and games rather than machine learning. When CUDA is all you've ever known, you try just a little harder to find a way to keep using it elsewhere.


People know; it just hasn't been reliable.


What's not reliable about it? On Linux hipcc is about as easy to use as gcc. On Windows it's a little janky because hipcc is a Perl script and there's no Perl interpreter, I'll admit. I'm otherwise happy with it though. It'd be nice if they had a shell script installer like NVIDIA, so I could use an OS that isn't a 2-year-old Ubuntu. I own 2 XTX cards, but I'm actually switching back to NVIDIA on my main workstation for that reason alone. GPUs shouldn't be choosing winners in the OS world. The lack of a profiler is also a source of frustration. I think the smart thing to do is to develop on NVIDIA and then distribute to AMD. I hope things change though, and I plan to continue doing everything I can to support AMD, since I badly want to see more balance in this space.


The compilation toolchain may be reliable but then you get kernel panics at runtime.


I've heard geohot is upset about that. I haven't tortured any of my AMD cards enough to run into that issue yet. Do you know how to make it happen?


Last time I used AMD GPUs for GPGPU, all it took was running hashcat to make the desktop rendering unstable. I'm sure letting it run overnight would've gotten me a system crash.


That's always happened with NVIDIA on Linux too, because Linux is an operating system that actually gives you the resources you ask for. Consider using a separate video card that's dedicated to your video needs. Otherwise you should use macOS or Windows. It's 10x slower at building code, but I can fork bomb it while training a model and Netflix won't skip a frame. Yes, I've actually done this.



