I'm currently trying to make dynamic GGUF quants for them! It should use 24GB of...

zettabomb · 2025-07-22T22:29:29 1753223369

Any significant benefits at 3 or 4 bit? I have access to twice that much VRAM and system RAM but of course that could potentially be better used for KV cache.

danielhanchen · 2025-07-22T22:42:07 1753224127

So dynamic quants like the ones I upload are not actually 4-bit! They're a mixture of 4-bit to 8-bit, with important layers kept in higher precision! I wrote about our method here: https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs
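To make the "not actually 4-bit" point concrete, here is a rough sketch of how the effective bits per weight of a mixed-precision quant could be estimated. The layer split below is entirely hypothetical (it is not Unsloth's actual allocation), and the calculation ignores the scale and metadata overhead that real GGUF quant types carry.

```python
def average_bits_per_weight(layers):
    """Weighted average bit width over (num_params, bits) groups."""
    total_params = sum(n for n, _ in layers)
    total_bits = sum(n * bits for n, bits in layers)
    return total_bits / total_params


# Hypothetical split for a 100B-parameter model (illustrative numbers only).
hypothetical_layers = [
    (90_000_000_000, 4),  # bulk of the weights kept at 4-bit
    (8_000_000_000, 6),   # some layers upcast to 6-bit
    (2_000_000_000, 8),   # the most sensitive layers at 8-bit
]

avg_bits = average_bits_per_weight(hypothetical_layers)
print(f"effective bits per weight: {avg_bits:.2f}")
```

Even with only 10% of the weights upcast, the average lands above 4 bits, which is why a "4-bit" dynamic quant is somewhat larger than a uniform 4-bit one.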

sourcecodeplz · 2025-07-22T22:34:27 1753223667

For coding you want more precision, so the higher the quant the better. But there is an ongoing debate about whether a smaller model at a higher quant beats a larger one at a lower quant. You need to test for yourself with your own use cases, I'm afraid.

Edit: They did announce that smaller variants will be released.

danielhanchen · 2025-07-22T22:43:39 1753224219

Yes the higher the quant, the better! The other approach is dynamically choosing to upcast some layers!

segmondy · 2025-07-23T00:18:46 1753229926

I can say that this really works great. I'm a heavy user of the unsloth dynamic quants. I run DeepSeek v3/r1 in Q3, and ernie-300b and Kimi K2 in Q3 too. Amazing performance. I run Qwen3-235b in both Q4 and Q8 and can barely tell the difference, so much so that I just keep Q4 since it's twice as fast.

someone13 · 2025-07-23T04:21:44 1753244504

What hardware do you use, out of curiosity?

jychang · 2025-07-23T10:33:19 1753266799

In the current era of MoE models, system RAM bandwidth determines your speed more than the GPU does.
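A back-of-the-envelope way to see why: if decoding is memory-bound, each generated token has to read every active weight once, so bandwidth divided by bytes read per token gives a ceiling on tokens per second. The sketch below assumes exactly that simplification (no KV cache traffic, no compute limits, all active weights streamed from one memory pool), and the bandwidth and active-parameter numbers are hypothetical.

```python
def max_tokens_per_second(bandwidth_gb_s, active_params, bits_per_weight):
    """Upper bound on decode speed when weight reads are the bottleneck."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical setup: 100 GB/s of memory bandwidth, 20B active parameters.
for bits in (4, 8):
    ceiling = max_tokens_per_second(100, 20_000_000_000, bits)
    print(f"{bits}-bit: at most ~{ceiling:.1f} tokens/s")
```

Under this model, halving the bits per weight doubles the ceiling, and only the active parameters of an MoE model count toward the bytes read per token.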

danielhanchen · 2025-07-23T02:16:33 1753236993

Thanks for using them! :)

jychang · 2025-07-23T10:31:42 1753266702

You definitely want to use 4bit quants at minimum.

https://arxiv.org/abs/2505.24832

The paper estimates that LLMs store about 3.6 bits of information per parameter, so you're losing a lot of information if you quantize to 2 bits. 4-bit quants are the sweet spot where there's not much quality loss.
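As a rough illustration of that argument, the sketch below compares a few bit widths against the 3.6 bits per parameter figure and shows the matching weight footprint. Treating the capacity estimate as a floor for quantization is the heuristic from this comment, not a proven rule, and the 100B parameter count is hypothetical.

```python
CAPACITY_BITS_PER_PARAM = 3.6  # figure quoted in the comment above


def weight_size_gb(num_params, bits_per_weight):
    """Storage for the weights alone, ignoring quantization metadata."""
    return num_params * bits_per_weight / 8 / 1e9


def capacity_shortfall(bits_per_weight, capacity=CAPACITY_BITS_PER_PARAM):
    """Bits per parameter by which the quant falls below the capacity estimate."""
    return max(0.0, capacity - bits_per_weight)


num_params = 100_000_000_000  # hypothetical 100B-parameter model
for bits in (2, 3, 4, 8):
    size = weight_size_gb(num_params, bits)
    short = capacity_shortfall(bits)
    print(f"{bits}-bit: {size:.1f} GB, shortfall {short:.1f} bits/param")
```

By this measure, 2-bit and 3-bit quants sit below the estimate while 4-bit is the first width that clears it.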

fzzzy · 2025-07-22T22:34:49 1753223689

I would say that three or four bit are likely to be significantly better. But that’s just from my previous experience with quants. Personally, I try not to use anything smaller than a Q4.

gardnr · 2025-07-22T22:25:27 1753223127

Legend

danielhanchen · 2025-07-22T22:26:32 1753223192