
I'm currently making dynamic GGUF quants for them! Dynamic 2-bit should use about 24GB of VRAM + 128GB of RAM - they should be up in an hour or so: https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruc.... I also have docs on running them locally: https://docs.unsloth.ai/basics/qwen3-coder
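As a rough sanity check on why that hardware combination works (my own back-of-envelope arithmetic, and the 2.5 bits/param average is an assumption, since dynamic quants mix in some higher-precision layers):

```python
# Assumed: a "dynamic 2-bit" quant averages ~2.5 bits/param overall,
# because some important layers are kept at higher precision.
params = 480e9   # Qwen3-Coder-480B total parameter count
avg_bits = 2.5   # assumed average bits per parameter

size_gb = params * avg_bits / 8 / 1e9
print(f"~{size_gb:.0f} GB")  # ~150 GB, fits in 24 GB VRAM + 128 GB RAM
```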


Any significant benefits at 3 or 4 bit? I have access to twice that much VRAM and system RAM but of course that could potentially be better used for KV cache.


Dynamic quants like the ones I upload aren't actually uniform 4-bit! They're a mixture of 4-bit to 8-bit, with important layers kept in higher precision. I wrote about our method here: https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs
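A toy sketch of the idea (not unsloth's actual algorithm - the importance scores, threshold, and layer names here are all made up for illustration): score each layer's sensitivity, then keep sensitive layers at higher precision.

```python
# Hypothetical sketch: map a per-layer importance score to a bit width,
# keeping important layers at 8-bit and quantizing the rest to 4-bit.
def assign_bits(layer_importance, threshold=0.5):
    """layer_importance: dict of layer name -> score in [0, 1]."""
    return {name: (8 if score >= threshold else 4)
            for name, score in layer_importance.items()}

bits = assign_bits({"attn.0": 0.9, "mlp.0": 0.2, "embed": 0.7})
# attn.0 and embed stay at 8-bit; mlp.0 drops to 4-bit
```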


For coding you want more precision, so the higher the quant the better. There is ongoing debate about whether a smaller model at a higher quant beats a larger model at a lower quant - you'll need to test with your own use cases, I'm afraid.

Edit: they did announce that smaller variants will be released.


Yes, the higher the quant, the better! The other approach is to dynamically upcast some layers.


I can say that this really works great - I'm a heavy user of the unsloth dynamic quants. I run DeepSeek V3/R1 at Q3, and ERNIE-300B and Kimi K2 at Q3 too, with amazing performance. I run Qwen3-235B at both Q4 and Q8 and can barely tell the difference, so much so that I just keep Q4 since it's twice as fast.


What hardware do you use, out of curiosity?


In the current era of MoE models, system RAM bandwidth determines your speed more than the GPU does.
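To see why: at decode time every generated token has to read the active parameters' weights from memory, so throughput is roughly bandwidth divided by bytes per token. A quick estimate with assumed numbers (the 90 GB/s figure is a typical dual-channel DDR5 value, not measured):

```python
# Rough bandwidth-bound decode estimate. All inputs are assumptions.
active_params = 35e9    # A35B: ~35B active parameters per token
bits_per_param = 4      # e.g. a Q4 quant
ram_bw_gbs = 90         # assumed system RAM bandwidth in GB/s

bytes_per_token = active_params * bits_per_param / 8
tok_per_s = ram_bw_gbs * 1e9 / bytes_per_token
print(f"~{tok_per_s:.1f} tok/s")  # roughly 5 tok/s under these assumptions
```

This also shows why halving the quant size (Q8 to Q4) roughly doubles tokens per second when you're memory-bound.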


Thanks for using them! :)


You definitely want to use 4bit quants at minimum.

https://arxiv.org/abs/2505.24832

LLMs usually store about 3.6 bits of information per parameter, so you lose a lot if you quantize down to 2 bits. 4-bit quants are the sweet spot where there's not much quality loss.
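For context on what each bit width costs in raw weight-file size for a 480B-parameter model (my arithmetic, ignoring metadata and KV-cache overhead):

```python
# Weight size at various uniform bit widths for a 480B-param model.
params = 480e9
sizes_gb = {bits: params * bits / 8 / 1e9 for bits in (2, 4, 8, 16)}
for bits, gb in sizes_gb.items():
    print(f"{bits}-bit: ~{gb:.0f} GB")
# 2-bit: ~120 GB, 4-bit: ~240 GB, 8-bit: ~480 GB, 16-bit: ~960 GB
```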


I would say that 3-bit or 4-bit is likely to be significantly better, but that's just from my previous experience with quants. Personally, I try not to use anything smaller than Q4.


Legend


:)



