NEW v0.4.0 — now open source under MIT
// articles

Multi-model code review, deeply explained.

Research, architecture, and opinion on running more than one LLM against your code. Grounded in the Multi-Agent Debate literature. No fluff.

Start here

Multi-Model AI Code Review in 2026: The State of the Art

Where multi-model review came from, where it is today, and what's still hard. Research, architecture, the four tuning levers, and how to adopt this workflow now.

Why Single-Model Code Review Is the Wrong Default

One model reviewing code it just wrote has been the industry default for two years. It's the wrong one. What single-model review misses and what to do about it.

All articles

The MCP Revolution: How Tool-Call Integration Is Changing AI Code Review

The Model Context Protocol shifted AI code review from copy-paste chat into inline tool calls. What changed, why stdio-only matters for security, and where this is heading.

Per-Provider Weighting: Tuning Your AI Review Panel

Weight sliders turn the panel from "one-size-fits-all" into something you can tune for the code you actually review. How to think about weighting — and why weight zero is the most important value.
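To make the weight-zero point concrete, here is a minimal sketch of weighted finding aggregation. The function names, finding shape, and provider labels are illustrative assumptions, not Joint Chiefs' actual code; the one behavior it demonstrates is that a zero weight doesn't just quiet a provider, it removes its findings from the panel entirely.

```python
from collections import defaultdict

def aggregate_findings(findings, weights):
    """findings: list of (provider, issue) pairs flagged during review.
    weights: provider -> float; 0.0 removes that provider from the panel.
    (Hypothetical sketch -- shapes and names are assumptions.)"""
    scores = defaultdict(float)
    for provider, issue in findings:
        w = weights.get(provider, 1.0)
        if w == 0:
            continue  # weight zero: this provider's findings never reach consensus
        scores[issue] += w
    return dict(scores)

findings = [
    ("openai", "sql-injection"),
    ("gemini", "sql-injection"),
    ("grok", "naming-nit"),
]
weights = {"openai": 1.0, "gemini": 0.5, "grok": 0.0}
print(aggregate_findings(findings, weights))  # {'sql-injection': 1.5}
```

Note that "naming-nit" vanishes from the output: with Grok weighted to zero, nothing it says survives aggregation, which is why zero is less a tuning value than an on/off switch for panel membership.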

When to Stop Arguing: Adaptive Termination in AI Code Review Debates

The MAD paper observed useful convergence in 2–4 rounds; past that, quality degrades. How adaptive termination detects agreement, why title-similarity is imperfect, and how to tune the cutoff.
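The detection idea can be sketched in a few lines. This is an illustrative assumption about how agreement detection might work, not the tool's actual implementation: compare the issue titles each round raises against the previous round's, and stop once the overlap crosses a tunable cutoff.

```python
def jaccard(a, b):
    """Word-overlap similarity between two issue titles, in [0.0, 1.0]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0

def should_stop(current_titles, previous_titles, threshold=0.8):
    """Terminate the debate when this round's issue titles mostly match
    last round's -- i.e. nobody is raising anything new.
    (Sketch: the Jaccard measure and 0.8 cutoff are assumptions.)"""
    if not previous_titles:
        return False  # first round: nothing to compare against
    if not current_titles:
        return True   # nothing raised at all: trivially converged
    best_match = [max(jaccard(t, p) for p in previous_titles)
                  for t in current_titles]
    return sum(best_match) / len(best_match) >= threshold
```

The sketch also shows the imperfection: two reviewers describing the same bug as "race in cache refresh" and "unsynchronized cache reload" share no words, score near zero, and keep the debate running past genuine agreement. That failure mode is exactly why the cutoff needs tuning rather than a universal value.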

Moderator Decides, Strict Majority, Voting Threshold: Four Consensus Modes Compared

Joint Chiefs supports four consensus modes. Each has a different failure mode. A practical walkthrough of when to pick which — and when the choice matters most.

Degeneration of Thought: Why a Model Can't Reliably Review Its Own Code

The MAD paper's central diagnostic, in depth. Why self-reflection increases confidence without increasing correctness — and why sibling models from the same lab inherit the same problem.

Should You Anonymize Model Outputs Before Consensus?

Joint Chiefs strips model identities before the moderator reads the debate. The argument for — and the surprising argument against.
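The mechanics of stripping identities are simple enough to sketch. The pseudonym format and function shape below are assumptions for illustration, not Joint Chiefs' code; the point is that anonymization has to scrub not only the speaker labels but also model names mentioned inside the debate text itself.

```python
import re

def anonymize_debate(debate):
    """debate: list of (model_name, text) turns. Returns the transcript with
    stable order-based pseudonyms, scrubbing model names cited in-text too.
    (Hypothetical sketch -- 'Reviewer A/B/...' labels are an assumption.)"""
    names = sorted({model for model, _ in debate})
    alias = {name: f"Reviewer {chr(ord('A') + i)}" for i, name in enumerate(names)}
    # Longest names first so no name matches as a prefix of another.
    pattern = re.compile("|".join(
        re.escape(n) for n in sorted(names, key=len, reverse=True)))
    return [
        (alias[model], pattern.sub(lambda m: alias[m.group(0)], text))
        for model, text in debate
    ]

debate = [
    ("claude", "I agree with gpt that the lock is unnecessary."),
    ("gpt", "The lock guards nothing; remove it."),
]
print(anonymize_debate(debate))
```

Because the aliases are assigned per-debate rather than globally, the moderator cannot build up brand priors across reviews, which is the core of the argument for anonymizing in the first place.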

Best AI Model for Code Review: OpenAI, Gemini, Grok, Claude Compared

There is no single best model — and that's the point. A practical comparison of the four majors across the five dimensions that actually matter in review.

Multi-Agent Debate: The Research Behind Better AI Code Review

Liang et al.'s 2023 paper is the clearest research evidence that multi-model debate beats single-model self-reflection. What it proved, what it didn't, and what it means for AI code review in 2026.