Microsoft Pits GPT Against Claude—And the Winner Is Better Research
Microsoft's new Critique and Council modes make GPT and Claude work together on research tasks. According to Microsoft's own testing, the combination outperforms every standalone AI research tool it was measured against.
The End of the Mono-Model Research Era
Deep research AI has been one of the hottest arms races in tech. Google launched its Gemini research agent in December 2024. OpenAI followed in February 2025. xAI, Perplexity, and Anthropic's Claude all jumped in. Every company has been selling the same pitch: our single model is the smartest researcher in the room.
Microsoft just said: why pick one?
The company announced two new modes for Copilot's Researcher tool—Critique and Council—that put OpenAI's GPT and Anthropic's Claude to work on the same research task. The result, according to Microsoft's testing, scores higher than every standalone system on the DRACO benchmark, a standardized test covering 100 complex research tasks across medicine, law, and technology.
How Critique Works
Critique breaks the research workflow in two. GPT handles phase one: it plans the research, pulls sources, and writes an initial draft. Then Claude steps in as a strict editor, reviewing the report for factual accuracy, citation quality, and whether the answer actually addressed the question. Only after that review does the final report reach the user. Microsoft says the roles can eventually run in reverse—Claude drafting, GPT critiquing—though for now GPT goes first.
Think of it like a newsroom. One reporter files the story. An editor kills the fluff, fact-checks the claims, and sends it back with margin notes. Two minds, one deliverable.
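Microsoft has not published implementation details, but the two-phase pattern Critique describes can be sketched in a few lines. Everything below is hypothetical illustration: the `draft_report`, `critique_report`, and `research` functions and their placeholder logic are stand-ins, not Copilot's actual API.

```python
# Hypothetical sketch of a draft-then-critique pipeline.
# The model calls are placeholders; Copilot's real internals are not public.

def draft_report(task: str) -> str:
    # Phase one (drafter): plan the research, pull sources, write a draft.
    return f"[drafter's report on: {task}]"

def critique_report(report: str) -> list[str]:
    # Phase two (critic): check facts, citations, and whether the
    # report actually answers the question. Returns editor notes.
    notes = []
    if "[citation]" not in report:
        notes.append("add citations for key claims")
    return notes

def research(task: str) -> str:
    # Only after the review does the report reach the user.
    report = draft_report(task)
    notes = critique_report(report)
    if notes:
        report += "\n\nEditor notes: " + "; ".join(notes)
    return report
```

The key design point is the separation of roles: the critic never writes the draft, so its only job is to find problems in someone else's work.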
On the DRACO benchmark, Copilot with Critique scored 57.4 points, while the next-best result, Anthropic's Claude Opus 4.6 on its own, hit 42.7: a gap of nearly 15 points. The biggest gains showed up in breadth of analysis, presentation quality, and factual accuracy.
How Council Works
Council takes a different approach. Instead of collaboration, it creates competition. GPT and Claude run simultaneously, each producing a full report independently. A third "judge" model then reads both and writes a summary explaining where the two AIs agreed, where they diverged, and what unique angles each caught that the other missed.
This is essentially automated adversarial thinking. When the two models disagree, that disagreement flags a claim worth double-checking; when they agree, confidence in the finding goes up. Until now, users who wanted that kind of cross-check had to run the same query through multiple tools and compare the outputs by hand. Council automates the comparison.
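The run-in-parallel-then-judge pattern is also easy to sketch. Again, this is an illustrative assumption, not Copilot's code: `run_model` and `judge` are hypothetical placeholders, and the real judge would be a third model, not string formatting.

```python
# Hypothetical sketch of Council's parallel-run-plus-judge pattern.
from concurrent.futures import ThreadPoolExecutor

def run_model(name: str, task: str) -> str:
    # Stand-in for an independent end-to-end research run by one model.
    return f"[{name} report on: {task}]"

def judge(reports: dict[str, str]) -> str:
    # Stand-in for the judge model that summarizes where the two
    # reports agree, where they diverge, and what each caught uniquely.
    lines = [f"{name}: {text}" for name, text in reports.items()]
    return "Comparison summary:\n" + "\n".join(lines)

def council(task: str) -> str:
    # Both models run simultaneously; neither sees the other's output.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(run_model, name, task)
                   for name in ("model_a", "model_b")}
        reports = {name: f.result() for name, f in futures.items()}
    return judge(reports)
```

The independence matters: because each model produces its full report without seeing the other's, agreement between them carries real evidential weight rather than being an echo.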
Why This Matters
The problem Critique is designed to fix is the single biggest weakness of every AI research tool on the market: one model doing everything with nobody checking its work. That means hallucinations slip through. Citations get faked. Edge cases get missed. By separating generation from evaluation, Microsoft forces a layer of accountability into the pipeline.
Microsoft's bet is that no single model stays on top for long. The real value is in the orchestration layer that routes tasks to whichever combination works best. OpenAI and Microsoft have a multibillion-dollar partnership, but Microsoft is betting on a future where the winning strategy is not owning the best model—it is controlling the workflow that uses multiple models.
Critique is the default experience in Researcher. Council requires selecting "Model Council" from the picker to activate side-by-side mode. Both are currently available to users in Microsoft's Frontier program, the early-access channel for Copilot's newest capabilities. A Microsoft 365 Copilot license ($30/user/month) is required.
The Bigger Shift
This is not just a feature update. It is a signal that the AI research market is maturing past the "who has the biggest model" phase and into a period of architectural sophistication. The companies that win will not necessarily have the highest benchmark scores. They will have the best systems for routing, validating, and cross-checking model outputs.
The era of mono-model dominance is ending. The era of orchestrated multi-model intelligence has begun.
💀 Monster Take
Microsoft just weaponized the AI rivalry between OpenAI and Anthropic. GPT writes the draft. Claude picks it apart. A third judge tallies the score. It is like having two lawyers argue your case while a third one writes the verdict. Mono-model research tools are now the equivalent of trusting a single fact-checker. Welcome to the age of adversarial AI.