Cognition's Two Headline Numbers Both Need Asterisks. The Real Story Is More Interesting Than Either.
On May 27, 2026, Cognition AI raised $1 billion at a $25 billion pre-money valuation, leading with two numbers: '89% of our code committed by Devin' and '13x ARR growth in 12 months.' Both require unpacking. The 89% figure means 89% of pull requests at Cognition are opened by Devin; human engineers still review every PR before it merges, and Cognition has never disclosed what percentage of those PRs are merged as-is versus revised. The 13x growth compares Devin-only ARR before the July 2025 Windsurf acquisition ($37M) to combined Cognition + Windsurf post-acquisition ARR ($492M); Windsurf had $82M ARR at acquisition, and organic Devin-only growth is not separately disclosed. The actual story is more interesting and has gone almost entirely unreported: investors are valuing at $26 billion, and enterprises are paying for, an agent that scores roughly half what benchmark leaders score, because the buyers want autonomy, not accuracy.
On May 27, 2026, Cognition AI closed a $1 billion Series D at a $25 billion pre-money valuation — post-money $26 billion — led by Lux Capital, General Catalyst, and 8VC, with Founders Fund, Elad Gil, and a roster of existing institutional investors re-upping. The announcement led with two numbers: 89% of Cognition's own code committed by Devin, and 13x revenue growth in 12 months to $492 million annualized run rate. Both numbers are real. Both need significant context that coverage did not provide.
What "89% of code committed by Devin" actually means.
The exact language in Cognition's Series D blog post: "89% of code committed by our engineers is committed by Devin."
This means 89% of pull requests at Cognition are opened by Devin. Human engineers at Cognition review every PR before it is merged. The remaining 11% of commits come from "local agents in Windsurf" — not human engineers writing code directly. CEO Scott Wu stated explicitly: "It should always be up to the human what to do."
The actual claim is: humans at Cognition have stopped being code authors. They have become code reviewers. That is genuinely significant. It is not the same as saying Devin ships code autonomously, which is how the 89% figure has been widely interpreted.
The question Cognition has never answered: what percentage of Devin's PRs are merged as submitted versus revised by humans before merging? If engineers are rewriting 40% of Devin's PRs before approval, the "89% committed by Devin" framing describes who types the first draft, not who produced the merged code. If Devin's PRs are merged without modification at high rates, the claim is as strong as it sounds. Cognition has not published this data.
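To see how much the missing revision rate matters, here is a minimal sketch of the arithmetic. It assumes the 89% figure applies to pull requests, as described above, and treats the revision rate as a free parameter. Every rate shown is hypothetical, including the 40% used as an illustration in the text; Cognition has published none of them.

```python
# Share of PRs opened by Devin, per Cognition's Series D blog post.
DEVIN_PR_SHARE = 0.89

def merged_as_submitted_share(revision_rate: float) -> float:
    """Share of ALL PRs that Devin opened and humans merged unmodified.

    revision_rate is the hypothetical fraction of Devin's PRs that
    engineers revise before approval. Cognition has not disclosed it.
    """
    if not 0.0 <= revision_rate <= 1.0:
        raise ValueError("revision_rate must be between 0 and 1")
    return DEVIN_PR_SHARE * (1.0 - revision_rate)

# Hypothetical revision rates, for illustration only.
for rate in (0.0, 0.2, 0.4, 0.6):
    share = merged_as_submitted_share(rate)
    print(f"revision rate {rate:.0%} -> {share:.1%} of PRs merged as Devin wrote them")
```

At the hypothetical 40% revision rate, just over half of all PRs would be Devin's work merged untouched. At a 0% rate the headline 89% stands as stated.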
Goldman Sachs CIO Marco Argenti described Devin as "our new employee" with write access to live Goldman codebases across multiple engineering teams. Russell Kaplan from Goldman's platform engineering group has given public talks on deploying Devin at scale. Goldman Sachs is the most thoroughly documented of Cognition's enterprise customer relationships. Mercedes-Benz reportedly cut an eight-month legacy modernization project to eight days; Itaú reportedly uses Devin to automatically fix 70% of security vulnerabilities. Both figures come from Cognition's own blog with no independent confirmation from the customers. The confirmed enterprise customer list from Cognition's own announcement: Citi, Mercedes-Benz, Goldman Sachs, Elevance, Dell, Santander, U.S. Army, U.S. Navy, plus Infosys and Cognizant as systems integrators.
What the 13x growth actually measures.
The growth comparison: $37 million ARR in May 2025 → $492 million ARR in May 2026.
The $37 million figure is Devin-only ARR from before Cognition acquired Windsurf. The acquisition closed in July 2025. At the time of acquisition, Windsurf had $82 million in ARR and was growing rapidly — enterprise ARR reportedly doubling quarter over quarter.
The $492 million figure is combined Cognition and Windsurf revenue. Sacra analysis confirms the current ARR includes "per-seat and usage-based subscriptions for Devin and multi-seat enterprise deployments of Windsurf's AI coding platform." Windsurf remains a distinct product with a distinct pricing model.
The growth comparison therefore measures pre-acquisition Devin-only revenue against post-acquisition combined revenue. That is not 13x organic growth. Organic Devin-only growth over the same period is not separately disclosed. The combined entity's $492 million is real; the attribution of that trajectory to Devin product improvements alone is not.
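A rough calculation shows the gap between the headline multiple and a like-for-like comparison. The sketch below uses only the figures cited above. The pro forma baseline is an assumption: it adds Devin's May 2025 ARR to Windsurf's ARR at the July 2025 acquisition, two numbers measured about two months apart, because Cognition has not published a combined baseline.

```python
# Figures cited in the article, in millions of USD.
DEVIN_ARR_MAY_2025 = 37.0       # Devin-only ARR before the Windsurf deal
WINDSURF_ARR_AT_DEAL = 82.0     # Windsurf ARR at acquisition (July 2025)
COMBINED_ARR_MAY_2026 = 492.0   # combined Cognition + Windsurf ARR

def growth_multiple(start_arr: float, end_arr: float) -> float:
    """Return ending ARR expressed as a multiple of starting ARR."""
    return end_arr / start_arr

# Headline comparison: Devin-only baseline against combined revenue.
headline = growth_multiple(DEVIN_ARR_MAY_2025, COMBINED_ARR_MAY_2026)

# Assumption: approximate a combined baseline by summing the two
# disclosed figures, even though they were measured months apart.
pro_forma_baseline = DEVIN_ARR_MAY_2025 + WINDSURF_ARR_AT_DEAL
pro_forma = growth_multiple(pro_forma_baseline, COMBINED_ARR_MAY_2026)

print(f"Headline multiple:  {headline:.1f}x")
print(f"Pro forma multiple: {pro_forma:.1f}x")
```

On these inputs the headline works out to about 13.3x and the rough pro forma figure to about 4.1x. The second number is not organic Devin growth either; it only shows how much of the headline depends on the choice of baseline.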
For context on the acquisition: Cognition paid approximately $250 million for Windsurf in July 2025, absorbing roughly 250 engineers. The current employee count at Cognition is estimated between 305 and 407 people — the headcount roughly doubled from the acquisition. The revenue story and the M&A story are inseparable.
The benchmark paradox.
Devin 2.0's last published SWE-bench Verified score: 45.8%. That score is from Cognition's own published evaluation. Cognition has not submitted updated model scores to the BenchLM SWE-bench Verified leaderboard, which tracks 53 models as of June 2026.
The current leaderboard is led by Claude Mythos 5 at 95.5% and Claude Fable 5 at 95%. Claude Code, Anthropic's coding-focused interface, scores approximately 87.6%. OpenHands running Claude Sonnet 4.5 scores 72%. Devin's last published score, 45.8%, is roughly half that of the current leaderboard leaders.
Cognition is generating $492 million in annualized revenue with a product that benchmarks at approximately half the capability level of its most direct technical competitors. Enterprise buyers are choosing Devin despite — or perhaps because of — its benchmark position.
The explanation is structural. Devin and Claude Code are different products targeting different workflows. Devin's architecture is autonomous-agent-first: hand it a GitHub issue, it opens a branch, works independently for as long as necessary, returns with a complete PR. Claude Code operates in-loop: the human is at the terminal, the AI assists turn by turn with full judgment involvement. These are different models for different tasks at different price points.
Devin targets mechanical, multi-file, long-horizon work that engineers would rather not do: dependency updates, refactoring legacy code, fixing known bugs, running security patches across repositories. Benchmark scores measure whether an AI can solve a pre-specified GitHub issue in a controlled evaluation. They do not measure whether an AI is acceptable at sitting in a PR queue at Goldman Sachs for months without causing an incident. Enterprise buyers have decided those are different capabilities and priced them differently.
Scott Wu launched Devin in March 2024 with a ~13% SWE-bench score and demo videos that were widely criticized — YouTube channels documented Devin failing to execute tasks shown in the official demo. Two years later, Devin 2.0 scores 45.8% and the combined entity has $492 million in annualized revenue. The product has improved; the benchmark position relative to frontier coding agents has not closed; the enterprise market has grown substantially anyway.
The valuation problem.
At $492 million ARR and $26 billion post-money valuation, Cognition trades at approximately 52.8 times revenue. Cloudflare trades at approximately 30.5 times; CrowdStrike at approximately 21.7 times. Independent analysis suggests Cognition needs $1.7 to $2.6 billion in ARR for current multiples to compress to historically supportable territory.
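The multiples above are simple division, and the same arithmetic shows what the cited ARR range implies. The sketch below holds the $26 billion post-money valuation constant, which is an assumption made purely for illustration. Valuations move, and the comparable multiples are the approximate figures quoted in the text.

```python
# Figures cited in the article, in millions of USD.
POST_MONEY_USD_M = 26_000.0
CURRENT_ARR_USD_M = 492.0

def revenue_multiple(valuation: float, arr: float) -> float:
    """Valuation divided by annualized recurring revenue."""
    return valuation / arr

def arr_for_multiple(valuation: float, multiple: float) -> float:
    """ARR needed for a given valuation to trade at a given multiple."""
    return valuation / multiple

current_multiple = revenue_multiple(POST_MONEY_USD_M, CURRENT_ARR_USD_M)

# Multiples implied by the $1.7B to $2.6B ARR range, valuation held flat.
multiple_at_low = revenue_multiple(POST_MONEY_USD_M, 1_700.0)
multiple_at_high = revenue_multiple(POST_MONEY_USD_M, 2_600.0)

# ARR needed to match the comparables quoted in the text.
arr_to_match_cloudflare = arr_for_multiple(POST_MONEY_USD_M, 30.5)
arr_to_match_crowdstrike = arr_for_multiple(POST_MONEY_USD_M, 21.7)

print(f"Current multiple: {current_multiple:.1f}x")
print(f"At $1.7B ARR: {multiple_at_low:.1f}x, at $2.6B ARR: {multiple_at_high:.1f}x")
print(f"ARR to match Cloudflare:  ${arr_to_match_cloudflare:,.0f}M")
print(f"ARR to match CrowdStrike: ${arr_to_match_crowdstrike:,.0f}M")
```

With the valuation held flat, the cited range corresponds to multiples of roughly 15x down to 10x. Matching the Cloudflare and CrowdStrike multiples would take about $850 million and $1.2 billion in ARR respectively.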
Devin's compute-intensive agent operations mean gross margins are estimated at 30 to 60 percent, against 75 to 90 percent for typical enterprise SaaS. Under the ACU pricing model (Agent Compute Units, each approximately 15 minutes of autonomous work, at $2.00 per unit on the Team tier), compute costs scale directly with usage. A serious refactor consumes 30-plus ACUs; at the Team rate that is $60 or more in billed usage, out of which Cognition must cover its compute before any margin. At scale, the revenue story and the cost structure story are in tension in a way that pure-SaaS multiples do not capture.
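The unit economics can be sketched from the figures in this section. The calculation below takes the $2.00 Team-tier price and the 15-minute ACU at face value. It then backs out an implied cost to serve from the estimated 30 to 60 percent gross margin range. That last step assumes the margin estimate applies uniformly to a single task, which is a simplification.

```python
ACU_PRICE_USD = 2.00   # Team tier price per ACU, as quoted in the article
ACU_MINUTES = 15       # approximate autonomous work per ACU

def task_bill(acus: int, price_per_acu: float = ACU_PRICE_USD) -> float:
    """Amount billed to the customer for a task of the given size."""
    return acus * price_per_acu

def task_hours(acus: int) -> float:
    """Approximate hours of autonomous agent work for the task."""
    return acus * ACU_MINUTES / 60

def implied_cost(revenue: float, gross_margin: float) -> float:
    """Cost to serve implied by a gross margin (simplifying assumption)."""
    return revenue * (1.0 - gross_margin)

refactor_acus = 30  # the low end of a "serious refactor" per the article
bill = task_bill(refactor_acus)
hours = task_hours(refactor_acus)
cost_at_30pct_margin = implied_cost(bill, 0.30)
cost_at_60pct_margin = implied_cost(bill, 0.60)

print(f"{refactor_acus} ACUs: ${bill:.2f} billed, about {hours:.1f} agent-hours")
print(f"Implied cost to serve: ${cost_at_60pct_margin:.2f} to ${cost_at_30pct_margin:.2f}")
```

A 30-ACU task bills $60 for about 7.5 agent-hours. Under the margin estimates, between $24 and $42 of that would go to serving the request.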
The $1 billion Series D targets $1 billion ARR by end of 2026. At $492 million in May 2026, reaching $1 billion by December requires roughly 103% growth in seven months, a compound rate of about 10.7% per month. At the stated 50% month-over-month enterprise growth rate, that target is mathematically achievable, though a rate that high is almost certainly unsustainable at this revenue base. The $1 billion ARR figure is company-stated in media; no direct Scott Wu quote with that specific target has been confirmed.
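The target can be checked with a short compound-growth calculation. The sketch below assumes seven monthly compounding periods from May to December 2026 and, purely for illustration, applies the 50% month-over-month rate to the entire $492 million base. The text attributes that rate to enterprise revenue only.

```python
# Figures cited in the article, in millions of USD.
CURRENT_ARR_USD_M = 492.0
TARGET_ARR_USD_M = 1_000.0
MONTHS = 7  # May to December 2026

# Total growth needed to hit the target.
required_total_growth = TARGET_ARR_USD_M / CURRENT_ARR_USD_M - 1.0

# Equivalent constant month-over-month rate, compounded over MONTHS.
required_monthly_growth = (TARGET_ARR_USD_M / CURRENT_ARR_USD_M) ** (1.0 / MONTHS) - 1.0

# Illustration only: what 50% month-over-month would imply if applied
# to the whole revenue base for the full period.
arr_at_50pct_mom = CURRENT_ARR_USD_M * 1.5 ** MONTHS

print(f"Required total growth:   {required_total_growth:.0%}")
print(f"Required monthly growth: {required_monthly_growth:.1%}")
print(f"ARR after {MONTHS} months at 50% MoM: ${arr_at_50pct_mom:,.0f}M")
```

Sustaining 50% month over month across the whole base would put ARR above $8 billion by December, far past the target. The $1 billion goal needs only about a fifth of that monthly rate.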
What Wu says Devin is vs. what the $26B implies.
Wu's stated position, in a TechCrunch interview on May 29, 2026: "We've never thought about it as replacing humans." Cognition's homepage calls Devin "the AI software engineer." Goldman Sachs calls it "our new employee." The fundraising materials and enterprise sales narratives are built on the premise that Devin can replace junior engineering headcount — that is the economic case for a $26 billion valuation.
These two messages coexist because they serve different audiences. Wu's not-replacing-humans framing minimizes buyer resistance from engineering teams and regulatory attention from governments examining AI's effect on employment. The investor and enterprise buyer framing of headcount replacement is what justifies the multiple. Both framings are being deployed simultaneously, and they point in opposite directions.
The "89% of code committed by Devin" figure, correctly understood, means 89% of PRs are opened by an AI agent and reviewed by humans before merge, with no data on revision rates. That is a genuine milestone in how software is being produced commercially. It is not autonomous deployment. Whether it will ever become autonomous deployment is the question that Cognition's $26 billion valuation is implicitly answering.
- https://cognition.com/blog/series-d
- https://techcrunch.com/2026/05/27/ai-coding-startup-cognition-raises-1b-at-25b-pre-money-valuation/
- https://techcrunch.com/2026/05/29/cognitions-scott-wu-says-ai-coding-agents-shouldnt-replace-humans/
- https://sacra.com/c/cognition/
- https://www.cnbc.com/2025/07/14/cognition-to-buy-ai-startup-windsurf-days-after-google-poached-ceo.html
- https://benchlm.ai/benchmarks/sweVerified
- https://newmarketpitch.com/blogs/news/ai-code-assistant-is-cognition-overvalued