AlphaZero Mastered Chess in 24 Hours. 'Profound Intellectual Breakthrough' Is a Harder Reward Function.

In 2017, David Silver's team at Google DeepMind fed AlphaZero only the rules of chess, shogi, and Go. No grandmaster games. No human opening theory. No endgame tables. Twenty-four hours of self-play later, AlphaZero was the best chess player in the world. Not the best computer chess player. The best player, full stop.

This is the strongest empirical case in AI history that a system can surpass the ceiling of human expertise by learning from its own experience rather than from human knowledge. It is also the proof of concept that David Silver left DeepMind to scale.

In late 2025, Silver founded Ineffable Intelligence. In April 2026, the company raised $1.1 billion at a $5.1 billion valuation in what is reported as the largest seed round in European history, co-led by Sequoia and Lightspeed with participation from NVIDIA, Google, and the UK Sovereign AI Fund. Silver's founding vision: "We are creating a superlearner that discovers all knowledge from its own experience, from elementary motor skills through to profound intellectual breakthroughs."

Silver has pledged all personal proceeds to charity through Founders Pledge. The $5.1 billion valuation is not his primary motivation. That context matters for what comes next.

Why the thesis is serious

Silver's standing in reinforcement learning is not a matter of a marginal edge over his peers. He is the architect of three successive systems that generalized beyond what most of the field predicted (AlphaGo, AlphaZero, and MuZero) and one of the most-cited researchers in the field. "Era of Experience," the founding paper co-authored with Richard Sutton, is the intellectual case for the Ineffable thesis.

The paper argues that human-generated data is approaching a ceiling. There is only so much text, code, and structured knowledge humans have produced. The frontier model paradigm — scrape it, preprocess it, train on it — is scaling against a finite resource. The next frontier, Silver and Sutton argue, is experience-generated data: AI systems that learn by interacting with environments, the same way humans and animals learn.

This is Rich Sutton's "Bitter Lesson" (2019) applied to training data. Sutton argued that, across AI history, general methods that scale with compute, chiefly search and learning, have eventually outperformed methods built on encoded human knowledge. Silver is making the same claim about the data: the ceiling set by human knowledge is lower than the ceiling set by experience.

The investor lineup is a strong signal that this thesis is taken seriously. Sequoia doesn't lead $1.1B seed rounds for ideas; it leads them for theses it believes can produce category-defining companies. NVIDIA's participation suggests the world's dominant compute provider expects RL-from-scratch to consume GPU time on a scale comparable to, or beyond, LLM training. Google's participation means the parent company of DeepMind, the organization Silver just left, is hedging against its own research subsidiary's approach to superintelligence. That last point deserves a moment.

Google's position is the most interesting thing about this deal

DeepMind published its "From AGI to ASI" roadmap on June 10, 2026, mapping four pathways to superintelligence. David Silver had already left DeepMind to build a competing approach. Google is now simultaneously the parent company of the organization publishing that roadmap and a co-investor in the former DeepMind researcher trying to prove the roadmap's approach is insufficient.

This is either the most sophisticated AI portfolio strategy in existence — hedge both the LLM-plus-RL hybrid path (DeepMind) and the pure-RL-from-experience path (Ineffable) — or a sign that Google's own leadership doesn't fully believe DeepMind's roadmap is the right bet. I lean toward the former. Google has the resources to fund both approaches and an incentive to not be wrong regardless of which paradigm wins.

The reward function problem

The "no human data" claim in Silver's thesis is precise and real. Ineffable will not scrape Common Crawl. It will not license books or code. The superlearner will not be pre-trained on human text.

What it will be trained on is a reward function. And that reward function will be designed by humans.

This is the most important unresolved tension in the founding thesis, and it is not addressed in the "Era of Experience" paper. AlphaZero had a perfect reward function: win the game. Win/lose/draw is computable, unambiguous, and completely specifiable without human judgment. The game rules are the environment; the outcome is the reward; no human decides whether a particular position is "good" — the game's result decides.
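To make "computable without human judgment" concrete, here is a minimal sketch of a terminal reward derived from game rules alone. It uses tic-tac-toe as a small stand-in for chess, which is an assumption made purely for brevity; the names WIN_LINES and terminal_reward are hypothetical and are not taken from AlphaZero or any DeepMind code.

```python
from typing import Optional, Sequence

# All eight winning lines on a 3x3 board, cells indexed 0..8 row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def terminal_reward(board: Sequence[str], player: str) -> Optional[int]:
    """Return +1 (win), -1 (loss), or 0 (draw) from `player`'s point of view.

    Returns None while the game is still in progress. Empty cells are " ".
    Nothing here asks a human whether a position is good: the rules decide.
    """
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return 1 if board[a] == player else -1
    if " " not in board:
        return 0  # board is full and nobody completed a line
    return None
```

The whole function is a dozen lines because the rules fully determine the outcome. An equivalent function for "is this result a scientific breakthrough?" has no such closed form, which is the tension the rest of this section turns on.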

A superlearner targeting "profound intellectual breakthroughs" in science, medicine, or mathematics does not have a computable reward function. Someone has to define what counts as a breakthrough. Someone has to decide whether a proposed protein structure is scientifically interesting or a false positive. Someone has to decide whether a mathematical proof is novel and significant, even where its validity can be checked mechanically.

The critics of the "Era of Experience" paper identified this directly: "Even the most autonomous agent optimises something; whoever defines objectives shapes outcomes." Removing human training data from the system doesn't remove human values from the system if the reward function is human-defined. It just moves where the human influence enters.

Silver is aware of this problem. He is a researcher of the highest caliber; he didn't miss it. What Ineffable needs, and has not yet publicly disclosed, is an approach to reward function design for open-ended domains that doesn't require human labeling of every outcome. The environment question is where that approach lives.

The environment question no one has asked

Every story about Ineffable focuses on "no human training data." No story has reported on what environments Ineffable plans to train its superlearner in.

AlphaZero's environment was a chess board: finite state space, deterministic transitions, computable reward. MuZero's extension removed the requirement to know the rules in advance, but it still operated in environments with well-defined game structures. The gap from "game with perfect information and a win/lose/draw outcome" to "open-ended scientific discovery with continuous, noisy feedback" is not a gap that more compute closes. It is a gap in environment design.
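The three properties named above (finite state space, deterministic transitions, computable reward) can be shown in a toy environment. The sketch below uses a simple take-away game, where players alternately remove one to three stones and whoever takes the last stone wins. Everything in it is an illustrative assumption: the game, the names NimState, step, outcome, and self_play_episode, and the uniformly random policy, which stands in for the search-plus-network player a real system would use.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NimState:
    stones: int   # stones left on the table (finite state space)
    to_move: int  # index of the player to move: 0 or 1


def legal_moves(state: NimState) -> List[int]:
    """A player may take 1, 2, or 3 stones, but never more than remain."""
    return [k for k in (1, 2, 3) if k <= state.stones]


def step(state: NimState, take: int) -> NimState:
    """Deterministic transition: remove `take` stones and pass the turn."""
    if take not in legal_moves(state):
        raise ValueError(f"illegal move: {take}")
    return NimState(state.stones - take, 1 - state.to_move)


def outcome(state: NimState) -> Optional[int]:
    """Computable reward: return the winner's index, or None if not finished.

    The winner is whoever took the last stone, i.e. the player who is
    NOT to move once the table is empty.
    """
    return 1 - state.to_move if state.stones == 0 else None


def self_play_episode(
    start_stones: int, rng: random.Random
) -> Tuple[List[Tuple[NimState, int]], int]:
    """Play one game against itself and return (trajectory, winner)."""
    state = NimState(start_stones, 0)
    trajectory = []
    while outcome(state) is None:
        move = rng.choice(legal_moves(state))  # placeholder random policy
        trajectory.append((state, move))
        state = step(state, move)
    return trajectory, outcome(state)
```

Calling self_play_episode(10, random.Random(0)) yields a list of (state, move) pairs plus a winner, which is the raw "experience" a learner would train on. Every piece of that loop depends on outcome being a pure function of the state; in an open-ended domain, that is exactly the piece that is missing.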

Sequoia's thesis statement includes a line that gets less attention than the "no human data" framing: "The next battleground for AI supremacy will not be fought over who has the largest text corpus, but who can build the most effective environments for agents to learn in." This is the real claim. Ineffable's competitive moat, if it exists, is not the RL algorithm — Silver's RL work is published and available. It is the environment design. What simulation environments, code execution environments, or real-world interaction loops allow a self-play agent to generate the equivalent of "experience" in open-ended domains?

That answer has not been made public. The $1.1 billion is buying Silver time and compute to find it.

What this means for the current paradigm

OpenAI and Anthropic are both preparing IPOs at approximately $1 trillion each. Both are LLM-paradigm companies. If Silver's thesis is correct — that human-generated data is approaching its ceiling — then both valuations have an implicit ceiling built in. The scaling laws that justify trillion-dollar valuations extrapolate forward from current paradigm gains; if the current paradigm plateaus, the extrapolation breaks.

I don't think this is an imminent risk to either IPO. Ineffable has no product, no demonstrated results beyond AlphaZero (which is nearly a decade old), and a 5-to-10-year research horizon. The LLM paradigm is still producing capability gains. Nothing Silver has announced challenges next year's GPT or Claude release.

What Ineffable represents is something more like a ticking clock. If it demonstrates, in three or five years, that a system trained entirely from self-generated experience outperforms frontier LLMs on scientific discovery tasks — without ever touching a human-written paper — then every assumption underlying the current AI industry's economic model requires revision.

The $1.1 billion is a bet that Silver finds a reward function for open-ended discovery before anyone else does, or before the LLM paradigm reaches its ceiling.

I think the reward function problem is harder than the training data problem. AlphaZero worked because chess has rules. The world doesn't.