MCP server
Drops into any MCP-aware client. The host LLM calls joint_chiefs_review and gets a consensus back inline. Stdio-only — no ports, no network exposure.
Code review. Architecture. Product features. Security. Joint Chiefs runs your code past OpenAI, Gemini, Grok, and Claude in parallel — plus any local model you point it at via Ollama or an OpenAI-compatible server. Then it has them debate until consensus. That's how you make your best calls. Open-source. One-click macOS install. Zero telemetry.
Prefer the command line? Install via CLI →
Each surface runs the same JointChiefsCore engine. Install the MCP server or the CLI, then in your agent or terminal, mention Joint Chiefs along with what you want reviewed — the debate fires and a consensus comes back. The app isn't required for either. Pick one, two, or all three.
jointchiefs
Pre-commit hooks, CI gates, headless audits, one-off debugging. Streaming SSE output, JSON mode, exit codes, stdin piping. Same engine as the MCP server.
CLI reference
One-shot installer. Add API keys with live test buttons. Copy the standard MCP config snippet. Configure moderator, consensus mode, and tiebreaker. Installs the CLI + MCP binaries for you.
Download DMG
The debate protocol is a hub-and-spoke implementation of Multi-Agent Debate (Liang et al., 2023). Generals review independently, the moderator synthesizes between rounds, and a judge arbitrates the final call.
Every configured provider sees the same code with no peer influence. Different architectures catch different classes of bug.
Each model sees the prior round's findings with model identities stripped. Each must agree, challenge, or revise. Ignoring an inconvenient finding is not an option.
When positions converge, debate stops. Extra rounds add noise, not signal — a core finding of the MAD paper.
The moderator reads the full debate and writes the final synthesis. A well-argued minority position can override a weakly-justified majority.
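The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the JointChiefsCore implementation: `run_debate`, `anonymize`, the stub generals, and the "all position sets are identical" convergence test are assumptions made for the example, and it skips the moderator's between-round synthesis by passing the anonymized findings straight back to each model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical shape: a general takes (code, anonymized peer findings)
# and returns its own list of findings.
General = Callable[[str, List[str]], List[str]]


@dataclass
class DebateResult:
    rounds_run: int
    synthesis: str


def anonymize(findings_by_model: Dict[str, List[str]]) -> List[str]:
    """Merge all findings into one sorted, de-duplicated list with model names dropped."""
    return sorted({f for findings in findings_by_model.values() for f in findings})


def run_debate(code: str, generals: Dict[str, General],
               moderator: Callable[[List[str]], str],
               max_rounds: int = 3) -> DebateResult:
    # Round 1: every general reviews independently, with no peer findings.
    positions = {name: g(code, []) for name, g in generals.items()}
    rounds_run = 1
    while rounds_run < max_rounds:
        # Adaptive break (assumed test): stop once every model holds the same findings.
        if len({frozenset(p) for p in positions.values()}) == 1:
            break
        # Later rounds: each model sees prior findings without knowing who raised them.
        shared = anonymize(positions)
        positions = {name: g(code, shared) for name, g in generals.items()}
        rounds_run += 1
    # The moderator writes the final synthesis from the anonymized debate.
    return DebateResult(rounds_run, moderator(anonymize(positions)))


# Stub generals standing in for real providers: each raises one finding of its
# own and accepts whatever its peers raised.
def general_a(code: str, peers: List[str]) -> List[str]:
    return sorted({"unchecked force unwrap", *peers})


def general_b(code: str, peers: List[str]) -> List[str]:
    return sorted({"token logged in plaintext", *peers})


def stub_moderator(findings: List[str]) -> str:
    return "; ".join(findings)


result = run_debate("let x = y!", {"a": general_a, "b": general_b}, stub_moderator)
print(result.rounds_run, "rounds:", result.synthesis)
```

With these stubs the two models disagree after round one, converge after round two, and the loop stops before the third round is spent.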
The MAD paper's central finding is the Degeneration-of-Thought (DoT) problem: once a single model is confident in its answer, reflecting on its own output rarely produces new thinking, even when the answer is wrong. Joint Chiefs avoids this by using multiple independent models with different blind spots.
The paper demonstrates that adversarial collaboration between LLMs significantly improves factual accuracy and reasoning over single-model inference or single-model self-reflection. Joint Chiefs implements four of the paper's key protocols: adaptive break, tit-for-tat engagement, DoT prevention, and judge arbitration.
Liang, T., He, Z., Jiao, W., et al. (2023). Read the paper on arXiv
Works inside the AI client you already live in. No context switch, no separate tool to launch.
Models address each finding by title, take a position, defend it across rounds. The moderator judges reasoning, not tallies.
Model identities are stripped before final arbitration — the judge evaluates arguments, not brands.
Tokens appear live. The orchestrator can tell "slow" from "dead" instead of blocking on a frozen socket.
Full debate written to disk. Replay, audit, or pipe into your own tooling. Nothing is cached server-side — there is no server.
Pick the moderator model, consensus mode (moderator decides, strict majority, best-of-all, voting threshold), tiebreaker, and rounds. Persisted per-user.
Add keys with live test buttons. Copy-paste the standard MCP config snippet. The app installs the binaries — no build step, no PATH wrestling.
API keys stay in macOS Keychain, accessed by a single signed binary. No telemetry. The only network traffic is to providers you configured.
Engine, CLI, and MCP server live on GitHub under the MIT license. Read the protocol implementation, run the test suite, fork the moderator. No black box.
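To make two of the consensus modes named above concrete, here is a rough Python sketch of how "strict majority" and "voting threshold" could differ, with a tiebreaker as the fallback. The function names, the per-finding "keep"/"drop" vote format, and the tiebreaker being a single fallback verdict are all assumptions for illustration; the actual rules live in the JointChiefsCore source.

```python
from collections import Counter
from typing import List, Optional, Tuple


def _leader(votes: List[str]) -> Optional[Tuple[str, int]]:
    """Return (verdict, count) for the single most common vote, or None if empty or tied."""
    ranked = Counter(votes).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # top two verdicts are tied
    return ranked[0]


def strict_majority(votes: List[str]) -> Optional[str]:
    """Verdict backed by more than half of all votes, else None."""
    leader = _leader(votes)
    if leader and leader[1] * 2 > len(votes):
        return leader[0]
    return None


def voting_threshold(votes: List[str], threshold: float) -> Optional[str]:
    """Leading verdict if its share of the votes reaches `threshold` (0..1), else None."""
    leader = _leader(votes)
    if leader and leader[1] / len(votes) >= threshold:
        return leader[0]
    return None


def decide(votes: List[str], threshold: float, tiebreaker: str) -> str:
    """Apply the threshold rule; fall back to the tiebreaker's verdict when it fails."""
    return voting_threshold(votes, threshold) or tiebreaker


votes = ["keep", "keep", "drop", "keep"]
print(strict_majority(votes))        # keep
print(voting_threshold(votes, 0.8))  # None
print(decide(votes, 0.8, "drop"))    # drop
```

Three of four votes clear a strict majority but not an 80% threshold, which is the kind of case where the configured tiebreaker would matter.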
macOS app (recommended): download the notarized DMG and drop Joint Chiefs.app into Applications. The setup app installs the CLI, MCP server, and keygetter binaries silently on first launch and walks you through key entry, strategy, and MCP wire-up.
Command line: clone the repo, build, install all three binaries. No GUI — add keys via environment variables in step 02. Homebrew tap coming soon.
$ git clone https://github.com/djfunboy/joint-chiefs.git
$ cd joint-chiefs/JointChiefs
$ swift build -c release
$ cp .build/release/jointchiefs .build/release/jointchiefs-mcp .build/release/jointchiefs-keygetter /opt/homebrew/bin/
App users: open the app, paste keys for the providers you want on the API Keys screen, hit Test on each. Keys land in macOS Keychain — only the signed jointchiefs-keygetter binary can read them back.
CLI users: export the env vars in your shell. Any one provider is enough to start.
$ export OPENAI_API_KEY="sk-..."
$ export ANTHROPIC_API_KEY="sk-ant-..." # also acts as the moderator
$ export GEMINI_API_KEY="..."
$ export GROK_API_KEY="..."
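Before running a review, it can help to confirm which of these variables your shell actually exported. The short Python check below is a convenience sketch, not part of Joint Chiefs; it only reads the four variable names shown above and makes no network calls, so it cannot tell whether a key is valid.

```python
import os
from typing import List, Mapping, Optional

# The four variable names from the export commands above.
PROVIDER_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY", "GROK_API_KEY"]


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the provider variables that are set to a non-blank value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


found = configured_providers()
print("configured:", ", ".join(found) if found else "none (export at least one key)")
```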
Copy the standard MCP config snippet from the setup app — or paste this — into your client's MCP configuration. The path is the Apple Silicon Homebrew prefix; replace it if you installed elsewhere.
{
"mcpServers": {
"joint-chiefs": {
"command": "/opt/homebrew/bin/jointchiefs-mcp"
}
}
}
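If your client already has an MCP configuration file with other servers in it, the entry has to be merged in rather than pasted over the file. The Python sketch below shows one way to do that. It assumes the client stores its configuration as a JSON file with a top-level `mcpServers` object, which is not true of every client. `add_joint_chiefs` is a hypothetical helper name, and the demo writes to a temporary file instead of a real client config.

```python
import json
import tempfile
from pathlib import Path


def add_joint_chiefs(config_path, command="/opt/homebrew/bin/jointchiefs-mcp"):
    """Add or update the joint-chiefs entry, keeping any other configured servers."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Create the mcpServers object if the file did not have one yet.
    config.setdefault("mcpServers", {})["joint-chiefs"] = {"command": command}
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already lists another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "mcp.json"
    demo.write_text('{"mcpServers": {"other": {"command": "/usr/bin/true"}}}')
    merged = add_joint_chiefs(demo)
    print(sorted(merged["mcpServers"]))  # ['joint-chiefs', 'other']
```

Back up the real file before pointing a script like this at it, and pass a different `command` if the binary is not under the Apple Silicon Homebrew prefix.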
Per-client config-file paths and the natural-language playbook prompt are in the MCP setup guide.
Ask your coding assistant: "Have the Joint Chiefs review src/auth.swift." The tool call runs, the panel debates, and the consensus comes back inline.
Prefer the command line? CLI reference
Research, architecture, and opinion on running more than one LLM against your code.
Where multi-model review came from, where it is today, and what's still hard. Research, architecture, the four tuning levers, and how to adopt this workflow now.
Liang et al.'s 2023 paper is the clearest evidence that multi-model debate beats single-model self-reflection. What it proved, what it didn't.
One model reviewing code it just wrote has been the industry default for two years. It's a bad one. What single-model review misses and what to do about it.
One consensus walks out. Install takes a minute.