
I can easily run LLaMA 13B on my 6GB VRAM/16GB RAM laptop using llama.cpp (specifically Kobold.cpp as the frontend).

I can barely run 33B, but with anything more than ~800 tokens of context I OOM. It would run much more comfortably on a bigger GPU or a laptop with 24GB+ of RAM.
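For reference, here's a minimal sketch of how this kind of partial GPU offloading looks with the llama-cpp-python bindings (a different frontend to the same llama.cpp backend). The model filename and layer count are placeholders; you tune n_gpu_layers down until the model fits your VRAM, and n_ctx down if you still OOM:

    # Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
    # n_gpu_layers controls how many transformer layers get offloaded to VRAM;
    # the rest run on CPU out of system RAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./llama-13b.q4_0.gguf",  # placeholder: any quantized 13B model
        n_ctx=2048,        # context window; lower this if you hit OOM
        n_gpu_layers=20,   # offload ~20 layers to a 6GB GPU, keep the rest in RAM
    )

    out = llm("Q: What is llama.cpp? A:", max_tokens=64)
    print(out["choices"][0]["text"])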

Theoretically, some phones can comfortably handle 13B via mlc-llm, though in practice it's not really implemented yet.


