
Super interesting that they chose 671B and 7B, and nothing like 32B, which feels like a "sweet spot".


Likely because they haven't got their own suitable SoTA base models of any other size to build on. DeepSeek V3 is 671B, and DeepSeek-Prover-V1.5 [1] is only 7B, built on DeepSeekMath (7B), which in turn is based on DeepSeekCoder-Base-7B-v1.5. Maybe DeepSeek-Coder-V2 (16B and 236B) would be a good starting point, but it was merged into DeepSeek V2.5, and V2.5 is inferior to V3. Or some version of Qwen.

[1] https://github.com/deepseek-ai/DeepSeek-Prover-V1.5


Also notable: the earliest planning for a well-received release of a new model might include market segmentation both by parameter count and by skill type.

--> "In an increasingly crowded field of LLMs, how will our (costly to produce) model stand out?"


I feel like this is a very logical way to do things: test the hypothesis on a small model, play around, get it working, then apply the findings to the big model.


Or they did and it wasn't sweet? (No idea, but it seems they would try that before writing up a publication.)



