Contents
The direct answer
Weights run on a continuous scale from 0.0 to 3.0. Zero removes the provider from the panel — no API call, no findings, no cost. 1.0 is the default: one vote, full inclusion. 3.0 triples that provider's vote in votingThreshold mode.
Weights do two things. They control panel inclusion (zero means the provider is not called) and they control voting math in votingThreshold mode. They do not change the moderator's synthesis in moderatorDecides, and they do not apply to strictMajority or bestOfAll. If you only ever run moderatorDecides, weights are a binary in/out switch. That's it.
Default to 1.0 across the board. Move a weight when you have a specific reason — a provider whose style consistently misfits your codebase, a review topic that matches a provider's strength, a rate limit you want to dodge. Don't tune weights out of habit.
What the weights actually do
Every provider in Joint Chiefs carries a weight value. The orchestrator reads those weights at two distinct points.
First, at panel assembly. A weight of 0.0 drops the provider from the active panel before any API call goes out. No request, no cost, no findings collected. Non-zero weights — anywhere from 0.1 through 3.0 — include the provider in the spoke fan-out identically. At this stage, 0.5 and 2.5 behave the same. The model is called, streams its findings, and participates in every debate round on equal footing.
Second, at vote resolution in votingThreshold mode. After the debate rounds complete and the panel produces a final set of findings, each provider's endorsement of a finding gets multiplied by its weight. A finding backed by three providers at weights 1.0, 2.0, and 0.5 scores 3.5, not 3. The threshold you set — say, 2.5 to accept a finding — compares against that weighted total.
That's a narrow mechanism. It changes exactly one thing: how close-call votes resolve when you're running a mode that cares about vote math. In every other mode, weights above zero are informational. They indicate priority in your config file but don't change the output.
When to use weight 0
Weight zero is the right move in three situations.
The provider's style doesn't fit your codebase
Every model has a tone. Gemini tends toward long prose with lots of context. Grok tends toward terse, security-adjacent phrasing. OpenAI tends toward conventional pattern-matching. Claude tends toward narrative step-through. If you've worked with a provider for a few weeks and its findings consistently read as off-topic or mis-prioritized for the code you write, zero it out. A model that produces well-reasoned but irrelevant findings still consumes moderator attention and drags debate rounds.
You want to dodge a provider's rate limits
Rate limits are a real constraint on batch reviews. If you're running Joint Chiefs against a large diff, or running many reviews in a short window, one provider may hit a ceiling that slows the whole orchestration. Dropping the rate-limited provider to 0 for the duration of a batch is cleaner than retrying around a throttle. Turn it back on when you're done.
The provider throws persistent false positives for your stack
Sometimes a provider's training data skews it toward flagging patterns that are correct in your framework. SwiftUI idioms, Tailwind utility classes, and async-heavy server code are common triggers. If you've seen the same provider flag the same non-issue across three or four unrelated reviews, zero it for that kind of work. Not a permanent judgment about the model — just an acknowledgment that it isn't earning its place on this panel for this stack.
When to use weights above 1
The case for pushing a provider above 1.0 is narrower than the case for zeroing one out. It only applies when you're running votingThreshold mode, and only when the review topic matches a provider's documented strength.
Security-heavy reviews: boost Grok
Grok catches security-adjacent issues that other models hedge on. The mechanism is unclear — its training corpus is less well-documented than peers — but the pattern holds often enough to be a useful default. For a review of authentication code, input validation, or anything handling user-controlled data, setting Grok to 1.5 or 2.0 means a security finding it raises alone can still clear a 2.5 threshold, instead of getting buried by two peers who missed it.
Generalist conventional code: boost OpenAI
OpenAI's GPT family is strong on API misuse, conventional patterns, and "this is the standard way" critiques. For a review of plumbing code — data-access layers, configuration, boilerplate around a well-known library — a modest OpenAI boost (1.5) reflects that its prior on "what the normal shape looks like" is the one you trust most for that kind of work.
When you should not boost
Don't boost a provider because you like it. Don't boost to compensate for a provider that underperforms — zero the underperformer instead. Don't go above 2.0 without a specific reason. Weight 3.0 exists for the case where you effectively want one provider's finding to survive every disagreement, and you should only reach for it with a clear argument for why.
How weights interact with consensus modes
Joint Chiefs ships four consensus modes. Weights hit each one differently.
| Mode | How weights apply |
|---|---|
moderatorDecides | Inclusion only. Weight 0 excludes. All non-zero weights participate equally. The moderator reads arguments and writes the synthesis. |
strictMajority | Inclusion only. A finding must be backed by a majority of active providers, unweighted. |
bestOfAll | Inclusion only. Every finding from every active provider appears in the output. No voting. |
votingThreshold | Full weighted vote math. Each provider's endorsement is multiplied by its weight before comparing against the configured threshold. |
The practical consequence: if you're not running votingThreshold, the only weight that changes output is 0. Everything above 0 is equivalent. That's a feature, not a bug — most reviews benefit from the moderator reading arguments rather than counting weighted votes. Weighting is the tool for the case where you have a strong, quantifiable prior about how to break ties.
Three practical strategies
Routine commits
Default weights (1.0 across OpenAI, Gemini, Grok, Anthropic). moderatorDecides. 5 rounds with adaptive early break. No tuning. The panel is balanced, the moderator handles disagreement, and you don't burn cognitive budget second-guessing weights. If Ollama is enabled, include it at 1.0. This is the config you reach for 80% of the time.
Security reviews
Switch to votingThreshold with a threshold around 2.0 (out of a weighted max of roughly 6-7). Boost Grok to 2.0. Leave OpenAI and Gemini at 1.0. Keep Claude at 1.0 as the moderator. The effect: a security finding that Grok raises and Claude agrees with clears the threshold even if the other two dismiss it. A security finding that only Grok raises still has a shot at surviving the vote, instead of getting dropped because two peers missed it.
Large refactors
Large refactors benefit from bestOfAll — every provider's findings appear, and you (or the host calling Joint Chiefs) decide what to act on. Weights are irrelevant here except as an on/off switch. If Gemini's long-context handling is what you need for the sprawl, make sure it's non-zero. If a provider's style is noisy on rename-heavy diffs, drop it to 0 for the duration of the refactor. The goal is coverage, not consensus.
Key takeaways
- Weights run from 0.0 to 3.0. Default to 1.0. Move them when you have a specific reason.
- Weight 0 kicks a provider out. Non-zero weights all include the provider identically at panel assembly.
- Vote math only applies in votingThreshold. In every other mode, weights above 0 are equivalent.
- Boost above 1.0 when the review topic matches a documented strength and you're running votingThreshold.
- Routine commits: all defaults, moderatorDecides. Security reviews: boost Grok in votingThreshold. Refactors: bestOfAll, weights as an on/off switch.
Frequently asked questions
What does a provider weight of 0.0 actually do?
It kicks the provider out of the panel. The model isn't called, no findings come back, no cost is incurred. Use it when a provider's style doesn't fit your codebase, when you want to dodge a rate limit, or when a provider has been throwing persistent false positives for your language or framework.
Do weights affect moderator-decides mode?
Only for inclusion. In moderatorDecides, the moderator reads every spoke finding and writes the synthesis based on argument quality, not vote counts. A weight of 2.0 doesn't double that provider's say — it just means the provider is in the panel. The vote math only bites in votingThreshold.
When should I push a provider above 1.0?
When the review topic matches a provider's strength and you're running votingThreshold. Security-heavy reviews are the common case for boosting Grok. Plumbing and conventional API code are the common case for boosting OpenAI. Don't boost without a specific reason.
Is setting a weight of 3.0 the same as running only that model?
No. Weight 3.0 triples the vote, but every other non-zero provider still runs. You still get the full debate. The weighting only changes how ties and thresholds resolve. For a single-provider review, set every other provider to 0.0.
Do weights change how the adaptive early break behaves?
No. The adaptive early break triggers on convergence — when spokes stop introducing new findings and positions stabilize. Weights don't touch it. A heavily-weighted provider's findings still have to be argued out before the panel can converge.
Can I change weights between reviews?
Yes. Weights live in the strategy config and can be changed per-run via the CLI, per-project via a local config file, or per-invocation through the MCP server's tool arguments. Nothing about weights is sticky across runs unless you write it to your config.