Should You Anonymize Model Outputs Before Consensus?

Direct answer
What anonymization actually means
Three ways identity leaks through
When anonymization helps most
When anonymization hurts
What Joint Chiefs does and why
Key takeaways
FAQ

Direct answer

We strip model names before the moderator reads the debate. It cuts bias. Mostly.

Anonymize by default. The "OpenAI said" / "Gemini said" labels get dropped before the moderator writes the synthesis. This kills a specific, documented problem — LLM judges tend to favor outputs from themselves or their siblings. But you pay for it. Identity carries calibration signal, and anonymization throws that signal away.

The right call depends on the consensus mode. For moderator-decides, anonymize. For weighted voting where per-provider weight encodes domain expertise, don't. Joint Chiefs anonymizes before the final synthesis and keeps attribution in the transcript so you can audit after the fact.

What anonymization actually means

In a multi-model debate, each spoke produces findings. Each finding has a title, severity, file, line, rationale, and — if you don't strip it — a provider label. Anonymization pulls the label before the moderator's context gets assembled.

What survives:

The finding — title, severity, file, line, rationale.
Position in the list (first, second, etc.).
Agreement count — how many spokes raised the same issue.
Round-by-round revisions, still tracked under anonymous IDs like "Spoke A" so the debate follows across turns.

What gets removed:

Provider name.
Model version.
Self-references in the text — "as GPT-5, I think…" gets rewritten or flagged.

The point is to force the moderator — often an LLM itself — to weigh findings by argument quality, not by pattern-matching on brand. LLM-as-judge research has found repeatedly that judges prefer outputs from themselves or their family. Pulling the label pulls the shortcut.
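A minimal sketch of that stripping step, in Python. This is not Joint Chiefs' actual code. The `Finding` class, the `anonymize` function, the lowercase provider strings, and the `SELF_REF` pattern are all hypothetical names chosen for illustration. It shows the shape of the idea: drop the label, hand out stable "Spoke A"-style IDs, and flag self-references rather than trying to rewrite them.

```python
import re
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass(frozen=True)
class Finding:
    title: str
    severity: str
    file: str
    line: int
    rationale: str
    provider: str  # hypothetical label, e.g. "openai"; never shown to the moderator


# Assumed pattern for self-references such as "As GPT-5, I think".
# A real filter would need a broader list of model names.
SELF_REF = re.compile(r"\bas (gpt|claude|gemini|grok)[\w.\- ]*,", re.IGNORECASE)


def anonymize(findings):
    """Return (moderator_view, alias_to_provider).

    The moderator view keeps the finding fields and list order but swaps the
    provider label for a stable anonymous ID. Supports up to 26 spokes.
    """
    alias_for = {}
    view = []
    for f in findings:
        if f.provider not in alias_for:
            alias_for[f.provider] = f"Spoke {ascii_uppercase[len(alias_for)]}"
        view.append({
            "spoke": alias_for[f.provider],
            "title": f.title,
            "severity": f.severity,
            "file": f.file,
            "line": f.line,
            "rationale": f.rationale,
            # Flag rather than rewrite: rewriting risks changing the argument.
            "flagged_self_reference": bool(SELF_REF.search(f.rationale)),
        })
    alias_to_provider = {alias: provider for provider, alias in alias_for.items()}
    return view, alias_to_provider
```

The same provider always maps to the same alias, so a finding revised in round two still reads as coming from the spoke that raised it in round one. The returned mapping stays with the orchestrator and never enters the moderator's context.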

Three ways identity leaks through

Anonymization is not airtight. A careful reader — human or model — can often guess which provider wrote which finding. Three reasons.

1. Writing style

Models have voices. One writes long, qualifying sentences with hedged confidence. Another is terse and list-heavy. A third restates the problem before every finding. These fingerprints are stable across prompts and hard to scrub without rewriting the content itself. A moderator from the same family as a spoke can often recognize its sibling.

2. Characteristic vocabulary

Word choice leaks. One model prefers "utilize" where another says "use." One says "consider" where another says "you might." Technical vocabulary leaks harder — a model trained heavily on one security corpus reaches for terms (TOCTOU, use-after-free, prototype pollution) that another model uses less often. You can't remove this without rewriting the finding into something generic, at which point the content is worse.

3. Formatting preferences

Bullet depth, bold usage, code fence style, headers inside findings, trailing summaries — all fingerprints. Some models obey output schemas strictly. Others drift. A judge that has seen enough output from each provider starts recognizing the layout alone.

None of this makes anonymization pointless. Pulling the explicit label still cuts bias meaningfully — the leak is degraded signal, not clean signal. But it's a mitigation, not a guarantee. Design around that.

When anonymization helps most

The clearest win is moderator-decides mode. Spokes debate through their rounds, and then a single model — in Joint Chiefs, Claude by default — reads the full transcript and writes the consensus. That's the step where LLM-as-judge bias has the biggest effect, and the step where anonymization has the most to offer.

Three specific cases where anonymization is worth the calibration loss:

Case	Why anonymization helps
The moderator shares a family with one of the spokes.	Without anonymization, the moderator over-endorses its sibling's findings, even when the arguments are weaker.
One provider has a reputation that precedes it.	Both humans and models carry priors about "which lab is the serious one." Labels trigger those priors, even when they're wrong for the specific finding.
A minority position is well-argued but comes from a lower-reputation model.	Anonymization makes it harder to dismiss a well-reasoned finding on brand alone. The argument has to carry itself.

When anonymization hurts

Anonymization has one specific failure mode: it throws away information the voter or moderator could have used well.

The clearest case is weighted voting. Joint Chiefs supports per-provider weights from 0.0 to 3.0, so you can say "weight Grok at 1.5 on security findings and 0.8 on idiomatic-style findings." Weighting only works if the vote-counter knows which provider wrote which finding. If you anonymize and weight at the same time, the weights become dead code. They have no labels to attach to.
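A rough Python sketch of why the two features collide. The function names, the nested weight table, and the default weight of 1.0 for unlisted providers are assumptions made for illustration, not Joint Chiefs' real configuration format. Only the 0.0 to 3.0 range and the Grok numbers come from the text above.

```python
MIN_WEIGHT, MAX_WEIGHT = 0.0, 3.0  # range stated in the text


def weight_for(provider, category, weights, default=1.0):
    """Look up a per-provider, per-category weight (default is an assumption)."""
    w = weights.get(provider, {}).get(category, default)
    if not MIN_WEIGHT <= w <= MAX_WEIGHT:
        raise ValueError(f"weight {w} for {provider}/{category} is outside 0.0-3.0")
    return w


def weighted_score(providers, category, weights):
    """Sum the weights of every provider that raised the same finding."""
    return sum(weight_for(p, category, weights) for p in providers)


weights = {"grok": {"security": 1.5, "style": 0.8}}

# With labels, Grok's security vote counts for 1.5.
labeled = weighted_score(["grok", "openai"], "security", weights)  # 2.5

# With anonymous IDs, nothing matches the weight table and every vote
# falls back to the default. The configured weights never apply.
anonymized = weighted_score(["Spoke A", "Spoke B"], "security", weights)  # 2.0
```

The lookup key is the provider name. Replace it with "Spoke A" and the table is never hit, which is the dead-code problem in concrete form.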

Second case, subtler: when one model has a large, well-calibrated edge in a specific domain — a known lead on a particular language or framework — a human reading the transcript wants attribution so they can weigh findings themselves. Anonymization inside the moderator's context is fine. Anonymization in the final report the human reads is often not.

Joint Chiefs threads this by anonymizing inside the moderator's context and restoring per-finding attribution in the output transcript. You get bias reduction on the decision step and calibration signal on the audit.

What Joint Chiefs does and why

Anonymization is on by default for moderator-decides mode. Findings are stripped of provider labels before Claude reads the debate and writes the synthesis. The orchestrator still tracks attribution and writes it to the transcript, so when you read the output you can see which provider raised which finding — just not during the model's decision step.
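One way to picture the restore step, as a hedged Python sketch. The `restore_attribution` function, the `decisions` shape, and the alias mapping are hypothetical. The only thing taken from the text is the behavior: the orchestrator holds the mapping and reattaches provider names when it writes the transcript.

```python
def restore_attribution(decisions, alias_to_provider):
    """Attach provider names to moderator decisions keyed by anonymous spoke ID.

    Returns new records; the moderator-facing ones are left untouched.
    """
    return [
        {**d, "provider": alias_to_provider[d["spoke"]]}
        for d in decisions
    ]


# Held by the orchestrator only; the moderator never sees this mapping.
alias_to_provider = {"Spoke A": "openai", "Spoke B": "gemini"}

# What the moderator returned, still expressed in anonymous IDs.
decisions = [
    {"spoke": "Spoke B", "title": "Missing null check", "verdict": "accepted"},
]

transcript = restore_attribution(decisions, alias_to_provider)
```

Because the function builds new records instead of mutating the moderator's output, the anonymized view stays clean if the moderator needs to be called again.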

For voting-threshold mode with weights, anonymization is off by design. Weighting needs labels. Running both at once would be incoherent.

The other two consensus modes — strict majority and best-of-all — don't need anonymization. Their decision rules are mechanical. Strict majority counts agreement across spokes. Best-of-all picks the finding with the highest severity. Neither involves a judge reading prose, so there's no bias for anonymization to cut.
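To make "mechanical" concrete, here is a small Python sketch of both rules. The function names, the severity scale, and the "more than half of the spokes" threshold are assumptions. The text only says that strict majority counts agreement and best-of-all picks the highest severity.

```python
# Assumed severity scale; the real one may differ.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def strict_majority(agreement, spoke_count):
    """Keep finding IDs raised by more than half of the spokes (assumed threshold)."""
    return [fid for fid, count in agreement.items() if count > spoke_count / 2]


def best_of_all(findings):
    """Pick the finding with the highest severity; ties go to the earliest one."""
    return max(findings, key=lambda f: SEVERITY_RANK[f["severity"]])
```

Neither function reads the rationale or a provider field. With no prose being judged, a label has nothing to bias.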

The practical rule: anonymization is a tool for cutting judge bias in modes that have a judge. It's not a universal good. Applying it to mechanical decisions is just noise.
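The rule fits in a few lines of Python. This is a sketch under stated assumptions: the mode identifiers, the `resolve_anonymization` function, and its parameters are invented for illustration and are not Joint Chiefs' real settings.

```python
# Hypothetical identifiers for the four consensus modes named in the text.
MODES = {"moderator_decides", "voting_threshold", "strict_majority", "best_of_all"}


def resolve_anonymization(mode, weights=None, anonymize=None):
    """Decide whether provider labels are hidden for a given consensus mode."""
    if mode not in MODES:
        raise ValueError(f"unknown consensus mode: {mode}")
    if mode == "voting_threshold" and weights and anonymize:
        # Weights are keyed by provider; hiding providers leaves them unused.
        raise ValueError("weighted voting needs provider labels")
    if anonymize is not None:
        return anonymize  # explicit choice wins
    # Default: anonymize only where a judge reads prose.
    return mode == "moderator_decides"
```

Rejecting the weights-plus-anonymization combination outright, instead of silently ignoring the weights, keeps a misconfiguration from looking like a working setup.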

Key takeaways

Anonymize when a moderator LLM will read the debate and write the decision. That's where brand bias hits hardest.
Writing style, vocabulary, and formatting leak identity even without labels. Anonymization is a mitigation, not a guarantee.
Don't anonymize when you're using weighted voting. The weights need labels to attach to.
Keep attribution in the output transcript even when it's hidden from the moderator. Humans benefit from calibration signal the model can't be trusted with.
Joint Chiefs anonymizes in moderator-decides mode, keeps attribution in transcripts, and skips anonymization where the decision rule is mechanical.

Frequently asked questions

What does it mean to anonymize model outputs before consensus?

You strip the provider label off each finding before the moderator reads the debate. No "OpenAI said." No "Gemini said." Just the arguments. The moderator judges reasoning instead of brand.

Do LLMs actually show brand bias when judging other LLMs?

Yes. LLM-as-judge research has found repeatedly that models tend to pick outputs from themselves or siblings in the same family, even when those outputs are worse. That's the bias anonymization exists to cut.

Doesn't anonymization also strip useful signal?

Yes. If you know one model is strong at security or concurrency, that's a useful prior when weighing findings. Anonymization throws that prior away. You trade calibration for less brand bias. Pick your poison.

Can anonymization really hide which model wrote what?

Not really. Writing style, vocabulary, and formatting leak identity even without the label. Models have voices. Removing the explicit tag still helps — it's a degraded signal, not a clean one — but a careful judge can often guess.

Does Joint Chiefs anonymize outputs?

Yes, by default in moderator-decides mode. Findings are anonymized before the final synthesis so the moderator judges arguments, not brands. Per-finding attribution stays in the transcript you read afterward. The moderator just doesn't see it during the decision step.

When should anonymization be turned off?

When you're using weighted voting and the weights carry domain meaning, say a heavier vote for a security-strong model on security findings. Weighting needs labels to apply to. Run both at once and the weights have nothing to attach to.