
It’s because most devs nowadays are relatively new and probably aren’t very familiar with native compilation.

So compiling the correct version of llama.cpp for their hardware is confusing.

Compound that with everyone’s relative inexperience with configuring any given model and you have prime grounds for a simple tool to exist.

That’s what ollama and their Modelfiles accomplish.
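For context, a Modelfile is a short, Dockerfile-like config that names a base model and its settings. A minimal hypothetical example (the model name, parameter, and system prompt are placeholders):

```
FROM llama3
PARAMETER temperature 0.7
SYSTEM "You are a concise assistant."
```

You then build and run it with `ollama create my-model -f Modelfile` and `ollama run my-model` — no compiler toolchain involved.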



It's just because it's convenient. I wrote a rich text editor front end for llama.cpp, and I originally wrote a quick Go web server with streaming using the Go bindings, but now I just use ollama: it's simpler, and the workflow for pulling down models from their registry and packaging new ones in containers is easier. Also, most people who want to play around with local models aren't developers at all.


I'm not sure why you are assuming that ollama users are developers when there are at least 30 different applications that have direct API integration with ollama.
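Those integrations mostly talk to ollama's local HTTP API. A sketch of a generate request (assumes an ollama server running on its default port 11434; the model name is a placeholder):

```shell
# Ask a locally served model a question; "stream": false returns one JSON object
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

Any application that can make an HTTP request can integrate this way, which is why the user base isn't limited to developers.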


Eh, I've been building native code for decades and hit quite a few roadblocks trying to get llama.cpp building with CUDA support on my Ubuntu box: library version issues and such. I ended up down a rabbit hole related to the codenames for the various Nvidia architectures... It's a project on hold for now.
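For what it's worth, the rough build incantation looks like this — treat it as a sketch, since the flag name has changed across llama.cpp versions (older releases used `-DLLAMA_CUBLAS=ON`) and the compute-capability value depends on your GPU:

```shell
# Configure with CUDA enabled; 86 is the compute capability for Ampere
# consumer cards (e.g. RTX 30xx) -- adjust for your hardware
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86
cmake --build build --config Release -j
```

The architecture rabbit hole is exactly that `CMAKE_CUDA_ARCHITECTURES` value: you have to map your card's marketing name to a compute-capability number, or the build targets the wrong (or no) GPU.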

Weirdly, the Python bindings built without issue with pip.
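That's likely because the llama-cpp-python package compiles the library itself at install time and picks sane defaults; you can pass the same CMake flags through an environment variable (a sketch, assuming a working CUDA toolchain):

```shell
# Build the bundled llama.cpp with CUDA support during pip install
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```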



