Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think you can do most of this already with llm-consortium (maybe needs the llm-openrouter plugin with my pr merging)

A consortium sends the same prompt to multiple models in parallel and the responses are all sent to one arbiter model which judges the model responses. The arbiter decides if more iterations are required. It can also be forced to iterate more until confidence-threshold or min-iterations.

Now, using the pr i made to llm-openrouter, you can save an alias to a model that includes lots of model options. For examples, you can do llm openrouter save -m qwen3 -o online -o temperature 0, system "research prompt" --name qwen-researcher

And now, you can build a consortium where one member is an online research specialist. You could make another uses JSON mode for entity extraction, and a third which writes a blind draft. The arbiter would then make use of all that and synthesize a good answer.



Any links or names of example implementations of this?


https://github.com/irthomasthomas/llm-consortium

also, you aren't limited to cli. When you save a consortium it creates a model. You can then interact with a consortium as if it where a normal model (albeit slower and higher quality). You can then serve your custom models on an openai endpoint and use them with any chat client that supports custom openai endpoints.

The default behaviour is to output just the final synthesis, and this should conform to your user prompt. I recently added the ability to continue conversations with a consortium. In this case it only includes your user prompt and final synthesis in the conversation, so it mimics a normal chat, unlike running multiple iterations in the consortium, where full iteration history and arbiter responses are included.

UV tool install llm

llm install llm-consortium

llm install llm-model-gateway

llm consortium save qwen-gem-sonnet -m qwen3-32b -n 2 -m sonnet-3.7 -m gemini-2.5-pro --arbiter gemini-2.5-flash --confidence-threshold 95 --max-iterations 3

llm serve qwen-gem-sonnet

In this example I used -n 2 on the qwen model since it's so cheap we can include multiple instances of it in a consortium

Gemini flash works well as the arbiter for most prompts. However if your prompt has complex formatting requirements, then embedding that within an already complex consortium prompt often confuses it. In that case use gemini-2.5-pro for the arbiter. .




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: