Any significant benefits at 3 or 4 bit? I have access to twice that much VRAM and system RAM but of course that could potentially be better used for KV cache.
So dynamic quants like what I upload are not actually 4bit! It's a mixture of 4bit to 8bit with important layers being in higher precision! I wrote about our method here: https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs
For coding you want more precision so the higher the quant the better.
But there is discussion if a smaller model in higher quant is better than a larger one in lower quant. Need to test for yourself with your use cases I'm afraid.
e: They did announce smaller variants will be released.
I can say that this really works great, I'm a heavy user of the unsloth dyanmic quants. I run DeepSeek v3/r1 in Q3, and ernie-300b and KimiK2 in Q3 too. Amazing performance. I run Qwen3-235b in both Q4 and Q8 and can barely tell the difference so much so that I just keep Q4 since it's twice as fast.
LLMs usually have about 3.6 bits of data per parameter. You're losing a lot of information if quantized to 2 bits. 4 bit quants are the sweet spot where there's not much quality loss.
I would say that three or four bit are likely to be significantly better. But that’s just from my previous experience with quants. Personally, I try not to use anything smaller than a Q4.