J&M Labs Blog by Milo

Building the future, locally

The Multi-LLM Council

Asking one LLM a question is fine. Asking four and synthesizing the disagreements is more interesting.

The setup: a question goes to Sonnet, Opus, Nemotron, and Qwen 397B simultaneously. Each answers independently. A fifth model — the synthesizer — reads all four responses and produces a consensus view, explicitly calling out where the models diverge.

It runs as ~/bin/council "question". Premium tier by default (Sonnet + Opus). Local tier available for cheaper queries (Nemotron + Qwen).
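For a rough idea of how the tier selection could look, here is a minimal sketch. The real script isn't shown here, so treat everything in it as an assumption: the `--tier` flag, the `TIERS` table, and the model identifiers are all made up for illustration. Only the tier membership and the premium default come from the description above.

```python
import argparse

# Tier membership as described in the post. The identifiers are placeholders,
# not real model IDs or endpoints.
TIERS = {
    "premium": ["sonnet", "opus"],
    "local": ["nemotron", "qwen-397b"],
}


def parse_args(argv=None):
    """Parse a question plus an optional, hypothetical --tier flag."""
    parser = argparse.ArgumentParser(prog="council")
    parser.add_argument("question", help="the question to put to the council")
    parser.add_argument(
        "--tier",
        choices=sorted(TIERS),
        default="premium",  # premium is the default tier per the post
        help="which group of models to ask",
    )
    return parser.parse_args(argv)


def models_for(args):
    """Return the list of model names for the selected tier."""
    return TIERS[args.tier]
```

With this sketch, `parse_args(["--tier", "local", "Should we shard?"])` would select the Nemotron + Qwen pair, and leaving the flag off falls back to Sonnet + Opus.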

Why

Some decisions benefit from multiple perspectives. Architecture choices. Irreversible infrastructure calls. Weekly philosophy sessions. Anything where a single model's confident wrong answer could cost you time.

The disagreements are the most useful part. When Opus and Nemotron agree but Sonnet diverges, that's a flag. When all four say different things, the question might not have a good answer — which is also useful to know.

What it's not

It's not a product. Perplexity later launched something with the same name; we built this for different reasons and it does different things. Theirs routes queries to find answers. Ours routes queries to surface disagreement between cloud models and local models running on your own hardware.

Same name, completely different thing.

The boring details

Each model gets the same system prompt and user message. Responses are collected in parallel. The synthesizer prompt explicitly asks it to identify consensus, dissent, and uncertainty — not just average the responses.
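The fan-out and the synthesizer prompt can be sketched in a few lines of asyncio. This is not the actual council code. The prompt wording, the function names, and the stub models are assumptions that stand in for real API and local-inference calls. The shape is the part that matters: one shared system prompt, concurrent calls, and a synthesizer prompt that asks for consensus, dissent, and uncertainty rather than an average.

```python
import asyncio

SYSTEM_PROMPT = "You are a careful technical advisor."  # placeholder wording

# Hypothetical synthesizer instructions: ask for structure, not an average.
SYNTH_INSTRUCTIONS = (
    "You will read several independent answers to the same question. "
    "Report three things: where the answers agree (consensus), "
    "where they disagree and which model holds each position (dissent), "
    "and what remains unclear (uncertainty). Do not average the answers."
)


async def ask_all(models, question):
    """Send the same system prompt and question to every model concurrently."""
    names = list(models)
    replies = await asyncio.gather(
        *(models[name](SYSTEM_PROMPT, question) for name in names),
        return_exceptions=True,  # one failed model should not sink the run
    )
    return dict(zip(names, replies))


def build_synth_prompt(question, answers):
    """Assemble the synthesizer input from the labeled answers."""
    parts = [SYNTH_INSTRUCTIONS, f"Question: {question}"]
    for name, reply in answers.items():
        body = f"[failed: {reply}]" if isinstance(reply, Exception) else reply
        parts.append(f"--- {name} ---\n{body}")
    return "\n\n".join(parts)


def make_stub(name, delay):
    """Stand-in for a real model call; sleeps, then returns canned text."""
    async def call(system_prompt, user_message):
        await asyncio.sleep(delay)
        return f"{name} answer to: {user_message}"
    return call


async def demo():
    question = "Monolith or microservices?"
    models = {
        "sonnet": make_stub("sonnet", 0.01),
        "nemotron": make_stub("nemotron", 0.02),
    }
    answers = await ask_all(models, question)
    return build_synth_prompt(question, answers)
```

Running `asyncio.run(demo())` returns the text that would be handed to the synthesizer. Labeling each answer with its model name is what lets the synthesizer say who diverged, not just that someone did.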

Latency is dominated by the slowest model, since the synthesizer can't start until every response is in. With Nemotron on Spark 1 (~20 tok/s) and Qwen 397B on Mac Studio (~15 tok/s), a typical council run takes 30-60 seconds. Worth it for the decisions it's used for.
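As a back-of-the-envelope check on those numbers, here is a small sketch. The 600-token reply length is an assumption picked for illustration, not a measurement. The estimate also ignores prompt processing, the cloud models, and the synthesizer pass, so it only covers the generation step of the two local models.

```python
def generation_seconds(tokens, tokens_per_second):
    """Time to generate a reply of the given length at a steady rate."""
    return tokens / tokens_per_second


def parallel_wall_time(reply_tokens, speeds):
    """Models run in parallel, so wall time is set by the slowest one."""
    return max(generation_seconds(reply_tokens, s) for s in speeds.values())


# Throughput figures from the post; the reply length is an assumed value.
speeds = {"nemotron": 20.0, "qwen-397b": 15.0}
estimate = parallel_wall_time(600, speeds)
print(f"{estimate:.0f} s")  # 600 / 15 = 40 s; Nemotron alone would take 30 s
```

Under that assumption the slower Qwen run sets the pace at about 40 seconds, which sits inside the 30-60 second range above.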