
No. The experts are not separately trained, and while they may end up storing different concepts, they are not designed to be experts in specific human-legible domains. That said, there are separate techniques for routing requests to different domain-expert LLMs, or even to fine-tuned adapters, such as RouteLLM.
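For a concrete picture of what request-level routing means, here is a minimal sketch where a classifier picks a whole model per prompt. The model names and keyword rules are made up for illustration; RouteLLM itself uses a trained router, not keyword matching.

    # Sketch of request-level LLM routing: pick one whole model per prompt.
    # Model names and keyword heuristics are hypothetical.
    DOMAIN_MODELS = {
        "code": "qwen-coder",
        "math": "deepseek-r1",
        "general": "llama-70b",
    }

    def route(prompt: str) -> str:
        """Guess the prompt's domain and return a model name for it."""
        p = prompt.lower()
        if any(k in p for k in ("def ", "class ", "bug", "compile")):
            return DOMAIN_MODELS["code"]
        if any(k in p for k in ("prove", "integral", "equation")):
            return DOMAIN_MODELS["math"]
        return DOMAIN_MODELS["general"]

    print(route("Why does this compile error happen?"))  # -> qwen-coder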


Why do you think hand-configured routing between "different domains" would be better than the learned, training-based routing in an MoE?


First off, they are fundamentally different technologies, so it would be disingenuous to treat it as an apples-to-apples comparison.

But a simple way to see it is this: when you pick between multiple large models with different strengths, you have a larger total pool of parameters to work with (e.g. DeepSeek R1 + V3 + Qwen + LLaMA adds up to roughly 2 trillion parameters to pick from), whereas "picking" the experts inside a single MoE gives you a smaller total pool (e.g. R1 alone is 671 billion parameters, Qwen is 235 billion).
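For contrast with the router above, here is a toy sketch of expert selection inside a single MoE layer: the "experts" are just feed-forward blocks within one network, and a learned gate, trained end to end with everything else, picks the top-k per token. This is a simplified illustration, not how R1 or Qwen actually implement their MoE layers.

    import torch
    import torch.nn as nn

    class TinyMoE(nn.Module):
        """Toy top-2 mixture-of-experts layer."""
        def __init__(self, d_model=64, n_experts=8, top_k=2):
            super().__init__()
            # Each "expert" is a plain FFN; they share the same model.
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts))
            # The gate is learned during training, not hand-configured.
            self.gate = nn.Linear(d_model, n_experts)
            self.top_k = top_k

        def forward(self, x):                       # x: (tokens, d_model)
            scores = self.gate(x)                   # (tokens, n_experts)
            weights, idx = scores.topk(self.top_k, dim=-1)
            weights = weights.softmax(dim=-1)
            out = torch.zeros_like(x)
            # Run only the selected experts on each token's hidden state.
            for k in range(self.top_k):
                for e in range(len(self.experts)):
                    mask = idx[:, k] == e
                    if mask.any():
                        out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
            return out

    x = torch.randn(5, 64)
    print(TinyMoE()(x).shape)  # torch.Size([5, 64])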


That might already be happening behind the scenes as part of what they call test-time compute.


Many models that use test-time compute are MoEs, but "test-time compute" generally refers to reasoning about the prompt or problem the model is given, not to deciding which model to pick, and I don't think anyone has released an LLM router under that name.


We don't know what OpenAI does to find the best answer when reasoning, but I am pretty sure that having variations of the same model is part of it.
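For illustration, one common way to spend extra inference compute with variations of the same model is best-of-n sampling: draw several candidates at different temperatures and keep the one a scorer likes best. The functions below are stand-ins for a model and a verifier; this is a generic sketch, not what OpenAI actually does.

    import random

    def generate(prompt: str, temperature: float) -> str:
        """Stand-in for sampling one completion from a model (hypothetical)."""
        return f"answer@T={temperature:.1f}:{random.random():.3f}"

    def score(prompt: str, answer: str) -> float:
        """Stand-in for a verifier/reward model scoring an answer (hypothetical)."""
        return random.random()

    def best_of_n(prompt: str, n: int = 8) -> str:
        # Sample n candidates from the same model at varied temperatures,
        # then keep the highest-scoring one.
        candidates = [generate(prompt, 0.3 + 0.1 * i) for i in range(n)]
        return max(candidates, key=lambda a: score(prompt, a))

    print(best_of_n("Prove that sqrt(2) is irrational."))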



