Inception Labs’ Mercury 2 AI Is Fast — and It Beats Google’s DiffusionGemma at Its Own Game

Inception Labs just dropped Mercury 2, and the speed numbers are wild: roughly 1,000 tokens per second. For comparison, Claude Haiku 4.5 Reasoning does about 89, and GPT-5 Mini manages around 71.

Google’s DiffusionGemma hits similar speeds using the same basic approach — ditching word-by-word generation for parallel denoising, the same technique behind image generators like Stable Diffusion. But Mercury 2 pulls ahead on quality.

On the AIME 2026 math benchmark, Mercury 2 scored 90%. DiffusionGemma managed 69.1%. Even Google’s own standard Gemma 4 beat DiffusionGemma at 88.3% — which is why Google recommends the standard version for quality-sensitive applications.

The real-world impact shows up in coding tools. Augment Code swapped Mercury 2 in for Claude Opus 4.7 on a context-compaction subagent and saw latency drop 82% with a 90% cost cut, while keeping the same output quality.

Mercury 2 isn’t open weights — it’s a paid API. DiffusionGemma is free on Hugging Face. But if you need speed and quality together, Inception’s model is currently in a class of its own among diffusion LLMs.