I can barely run a 33B model, and with anything more than ~800 tokens of context I hit an OOM. It would run very comfortably on a bigger GPU, though, or a laptop with 24GB+ of memory.
Theoretically some phones could handle 13B comfortably via mlc-llm, but in practice that isn't really implemented yet.