The direct answer
Joint Chiefs supports per-provider weights on a continuous scale from 0.0 to 3.0. A weight of 0.0 removes the provider from the panel entirely. A weight of 1.0 is the default — one vote, full inclusion. A weight of 3.0 triples that provider's vote in the voting-threshold consensus mode.
Weights do two things at once: they control panel inclusion (zero means the provider is not called) and they control voting math in voting-threshold mode. They do not affect the moderator's final synthesis in moderator-decides mode, and they do not apply to strict-majority or best-of-all. If you only ever run moderator-decides, weights are effectively a binary in/out switch for each provider.
Default to 1.0 across the board. Move weights when you have a specific reason — a provider whose style consistently misfits your codebase, a review topic that matches a provider's strength, a rate-limit constraint you want to dodge. Do not tune weights as a matter of habit.
What the weights actually do
Every provider in Joint Chiefs carries a weight value. The orchestrator reads those weights at two distinct points in the pipeline.
First, at panel assembly. A weight of 0.0 drops the provider from the active panel before any API call is made. No request goes out, no cost is incurred, no findings are collected. Non-zero weights — anywhere from 0.1 through 3.0 — include the provider in the spoke fan-out identically. At this stage, 0.5 and 2.5 behave the same. The model is called, streams its findings, and participates in every debate round on equal footing.
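The inclusion check can be sketched in a few lines. This is an illustrative Python model of the behavior described above, not Joint Chiefs' actual internals; the provider names and dictionary shape are assumptions.

```python
# Hypothetical weight config: only the zero/non-zero distinction
# matters at panel assembly. 0.5 and 2.0 are included identically.
providers = {
    "openai": 1.0,
    "gemini": 0.0,   # weight 0.0: dropped before any API call is made
    "grok": 2.0,
    "anthropic": 0.5,
}

# Panel assembly: filter out zero-weight providers.
active_panel = [name for name, weight in providers.items() if weight > 0]
print(active_panel)  # ['openai', 'grok', 'anthropic']
```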
Second, at vote resolution in voting-threshold mode. After the debate rounds complete and the panel produces a final set of findings, each provider's endorsement of a finding is multiplied by its weight. A finding with endorsement from three providers at weights 1.0, 2.0, and 0.5 scores a 3.5, not a 3. The threshold you set — say, 2.5 to accept a finding — compares against that weighted total.
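The weighted tally from the example above works out like this. The function name and call shape are illustrative, not the tool's real API:

```python
def weighted_score(endorsing_weights):
    """Sum the weights of every provider endorsing one finding."""
    return sum(endorsing_weights)

# Three endorsements at weights 1.0, 2.0, and 0.5:
score = weighted_score([1.0, 2.0, 0.5])
threshold = 2.5
print(score, score >= threshold)  # 3.5 True
```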
This is a narrow mechanism. It changes exactly one thing: how close-call votes resolve when you are running a mode that cares about vote math. In every other mode, the weights above zero are informational — they indicate priority in your config file but do not change the output.
When to use weight 0
A weight of 0 is the right move in three specific situations.
The provider's style does not fit your codebase
Every model has a tone. Gemini tends toward long prose with extensive context; Grok tends toward terse, security-adjacent phrasing; OpenAI tends toward conventional pattern-matching; Claude tends toward narrative step-through. If you have worked with a provider for a few weeks and its findings consistently read as off-topic or mis-prioritized for the kind of code you write, zero it out. A model that produces well-reasoned but irrelevant findings still consumes moderator attention and drags out debate rounds.
You want to avoid a provider's rate limits
API rate limits are a real constraint on batch reviews. If you are running Joint Chiefs against a large diff, or running many reviews in a short window, one provider may hit a ceiling that slows the entire orchestration. Dropping the rate-limited provider to 0 for the duration of a batch is a cleaner fix than retrying around a throttle. Turn it back on when you are done.
The provider produces persistent false positives for your stack
Sometimes a provider's training data skews it toward flagging patterns that are in fact correct in your framework. SwiftUI idioms, Tailwind utility classes, and async-heavy server code are common triggers. If you have seen the same provider flag the same non-issue across three or four unrelated reviews, zero it for that kind of work. This is not a permanent judgment about the model — just an acknowledgment that it is not earning its place on this panel for this stack.
When to use weights above 1
The case for pushing a provider above 1.0 is narrower than the case for zeroing one out. It applies only when you are running voting-threshold mode and only when the review topic matches a provider's documented strength.
Security-heavy reviews: boost Grok
Grok has a reputation in the community for catching security-adjacent issues that other models hedge on. The mechanism is unclear — its training corpus is less well-documented than peers — but the pattern holds often enough to be a useful default. For a review of authentication code, input validation, or anything handling user-controlled data, setting Grok to 1.5 or 2.0 means a security finding it raises clears a 2.5 threshold with a single peer endorsement (2.0 + 1.0 = 3.0, or 1.5 + 1.0 = 2.5), instead of getting buried by two peers who missed it.
Generalist conventional code: boost OpenAI
OpenAI's GPT family is historically strong on API misuse, conventional patterns, and "this is the standard way" critiques. For a review of plumbing code — data-access layers, configuration, boilerplate around a well-known library — a modest OpenAI boost (1.5) reflects that its prior on "what the normal shape looks like" is the one you trust most for that kind of work.
When you should not boost
Do not boost a provider because you "like" it. Do not boost to compensate for a provider that underperforms — zero the underperformer instead. Do not boost above 2.0 without a specific reason; weight 3.0 exists for the case where you effectively want one provider's finding to survive every disagreement, and you should only reach for it if you have a clear argument for why.
How weights interact with consensus modes
Joint Chiefs ships four consensus modes. Weights interact with each one differently.
| Mode | How weights apply |
|---|---|
| moderatorDecides | Inclusion only. Weight 0 excludes; all non-zero weights participate equally. The moderator reads arguments and writes the synthesis. |
| strictMajority | Inclusion only. A finding must be endorsed by a majority of active providers, unweighted. |
| bestOfAll | Inclusion only. Every finding from every active provider appears in the output. No voting. |
| votingThreshold | Full weighted vote math. Each provider's endorsement is multiplied by its weight before comparing against the configured threshold. |
The practical consequence: if you are not running votingThreshold, the only weight that changes output is 0. Everything above 0 is equivalent. This is a feature, not a bug — most reviews benefit from the moderator reading arguments rather than counting weighted votes. Weighting is the tool for the case where you have a strong, quantifiable prior about how to break ties.
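The per-mode behavior can be condensed into a single dispatch sketch. The mode strings follow the table above; the function itself is illustrative Python, not Joint Chiefs' internal API:

```python
def resolve(mode, endorsements, weights, threshold=None, panel_size=None):
    """Decide whether one finding survives. `endorsements` is the set
    of provider names that endorsed it."""
    if mode == "votingThreshold":
        # The only mode with weighted vote math.
        return sum(weights[p] for p in endorsements) >= threshold
    if mode == "strictMajority":
        return len(endorsements) > panel_size / 2  # unweighted head count
    # moderatorDecides and bestOfAll do no vote math at all;
    # weights only gated panel inclusion earlier in the pipeline.
    return True

weights = {"openai": 1.0, "grok": 2.0, "gemini": 1.0}
print(resolve("votingThreshold", {"grok", "openai"}, weights, threshold=2.5))  # True
print(resolve("strictMajority", {"grok"}, weights, panel_size=3))              # False
```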
Three practical strategies
Routine commits
Default weights (1.0 across OpenAI, Gemini, Grok, Anthropic), moderator-decides, five rounds with adaptive early break. No tuning. The panel is balanced, the moderator handles disagreement, and you do not spend cognitive budget second-guessing weights. If Ollama is enabled, include it at 1.0 as well. This is the config you reach for 80% of the time.
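As a concrete reference, the routine-commit setup above might look like this. The field names here are hypothetical — check your actual Joint Chiefs config schema for the real keys:

```python
# Hypothetical config shape for the routine-commit strategy.
routine_review = {
    "consensus_mode": "moderatorDecides",
    "rounds": 5,
    "adaptive_early_break": True,
    "weights": {
        "openai": 1.0,
        "gemini": 1.0,
        "grok": 1.0,
        "anthropic": 1.0,
        # "ollama": 1.0,  # include at 1.0 if enabled
    },
}
```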
Security reviews
Switch to voting-threshold with a threshold around 2.0 (out of a weighted maximum of 5.0 for this panel, or 6.0 with Ollama enabled). Boost Grok to 2.0. Leave OpenAI and Gemini at 1.0. Keep Claude at 1.0 as the moderator. The effect: a security finding that Grok raises and Claude agrees with clears the threshold even if the other two dismiss it. A security finding that only Grok raises still has a chance of surviving the vote, instead of getting dropped because two peers missed it.
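Working those numbers out explicitly (an illustrative sketch, not the tool's real API):

```python
# Security-review panel: Grok boosted to 2.0, threshold at 2.0.
weights = {"openai": 1.0, "gemini": 1.0, "grok": 2.0, "anthropic": 1.0}
threshold = 2.0

def passes(endorsers):
    """True if the weighted endorsements meet the threshold."""
    return sum(weights[p] for p in endorsers) >= threshold

print(passes({"grok", "anthropic"}))  # Grok + Claude: 3.0 >= 2.0 -> True
print(passes({"grok"}))               # Grok alone:    2.0 >= 2.0 -> True
print(passes({"openai"}))             # one peer alone: 1.0 < 2.0 -> False
```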
Large refactors
Large refactors benefit from best-of-all mode — every provider's findings appear, and you (or the host LLM calling Joint Chiefs) decide what to act on. Weights are irrelevant here except as an on/off switch. If Gemini's long-context handling is what you need for the sprawl, make sure it is non-zero. If a particular provider's style is noisy on rename-heavy diffs, drop it to 0 for the duration of the refactor. The goal is coverage, not consensus.
Key takeaways
- Weights run from 0.0 to 3.0. Default to 1.0 and only move them when you have a specific reason.
- Weight 0 excludes the provider; non-zero weights include the provider in the panel identically.
- Voting math only applies in voting-threshold mode. In every other mode, weights above 0 are equivalent.
- Boost a provider above 1.0 when the review topic matches a documented strength and you are running voting-threshold.
- Routine commits: all defaults, moderator-decides. Security reviews: boost Grok in voting-threshold. Refactors: best-of-all, use weights as an on/off switch.
Frequently asked questions
What does a provider weight of 0.0 actually do?
It removes the provider from the panel entirely. The model is not called, no findings are produced, and no cost is incurred. Use it when a provider's style does not fit your codebase, when you want to avoid rate-limit pressure, or when a provider produces a persistent pattern of false positives for your language or framework.
Do weights affect moderator-decides mode?
Only for inclusion. In moderator-decides, the moderator reads every spoke finding and writes the synthesis based on argumentation, not vote counts. A weight of 2.0 does not make that provider's findings count twice — it just means the provider is in the panel. Weighting math only applies in voting-threshold.
When should I push a provider above 1.0?
When the review topic matches a provider's historical strength and you are running voting-threshold. Security-adjacent reviews are the common case for boosting Grok. Plumbing and conventional API code are a common case for boosting OpenAI. Do not boost without a specific reason.
Is setting a weight of 3.0 the same as running only that model?
No. Weight 3.0 triples the provider's vote, but the panel still includes every other non-zero provider. You still get the full debate. The weighting only changes how ties and thresholds resolve. For a single-provider review, set every other provider to 0.0.
Do weights change how the adaptive early break behaves?
No. The adaptive early break triggers on convergence — when spokes stop introducing new findings and positions stabilize. It is orthogonal to weighting. A heavily weighted provider's findings still need to be argued out before the panel can converge.
Can I change weights between reviews?
Yes. Weights live in the strategy config and can be changed per-run via the CLI, per-project via a local config file, or per-invocation through the MCP server's tool arguments. Nothing about weights is sticky across runs unless you write it to your config.