I think you can do most of this already with llm-consortium (maybe needs the llm...

kridsdale1 · 2025-04-29T19:34:18 1745955258

Any links or names of example implementations of this?

irthomasthomas · 2025-04-29T19:38:02 1745955482

https://github.com/irthomasthomas/llm-consortium

also, you aren't limited to cli. When you save a consortium it creates a model. You can then interact with a consortium as if it where a normal model (albeit slower and higher quality). You can then serve your custom models on an openai endpoint and use them with any chat client that supports custom openai endpoints.

The default behaviour is to output just the final synthesis, and this should conform to your user prompt. I recently added the ability to continue conversations with a consortium. In this case it only includes your user prompt and final synthesis in the conversation, so it mimics a normal chat, unlike running multiple iterations in the consortium, where full iteration history and arbiter responses are included.

UV tool install llm

llm install llm-consortium

llm install llm-model-gateway

llm consortium save qwen-gem-sonnet -m qwen3-32b -n 2 -m sonnet-3.7 -m gemini-2.5-pro --arbiter gemini-2.5-flash --confidence-threshold 95 --max-iterations 3

llm serve qwen-gem-sonnet

In this example I used -n 2 on the qwen model since it's so cheap we can include multiple instances of it in a consortium

Gemini flash works well as the arbiter for most prompts. However if your prompt has complex formatting requirements, then embedding that within an already complex consortium prompt often confuses it. In that case use gemini-2.5-pro for the arbiter. .