DeepMind Solves 56-Year-Old Math Problems for Just $200 a Pop

DeepMind just showed off a complex AI agent that solves legendary open math problems. But behind the academic curtain lies a hilarious truth: by the team's own testing, most of the fancy, over-engineered custom machinery turned out to be unnecessary, because a general LLM paired with a proof checker got the same problems solved.

Google DeepMind's artificial intelligence agent, named AlphaProof Nexus, autonomously found formal proofs for 9 of 353 open math problems (about 2.5%) from the famous catalog of Hungarian mathematician Paul Erdős. The median cost per solved problem sat at a humble $200, though some stubborn ones demanded up to $800 of computational budget.

Among the trophies is the Erdős-Sárközy problem #12 from 1970, which remained unsolved for 56 years. The task required constructing an infinite set of natural numbers in which no number divides the sum of two larger ones. The AI also cracked problem #125 from 1996, which concerns the density of sums of numbers defined in base-3 and base-4 systems, using a highly non-trivial argument based on the proximity of powers of three and four.
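To make the divisibility condition in problem #12 concrete, here is a minimal Python sketch that checks it on a finite list. It is only an illustration of the condition as described above, not anything from the paper: the function name `find_violations` is made up, the "two larger ones" are assumed to be two distinct elements, and the second sample set (the squares of the primes 3, 7, 11, 19) is a small hand-picked example rather than the construction the agent found.

```python
from itertools import combinations


def find_violations(numbers):
    """Return every triple (a, b, c) with a < b < c where a divides b + c.

    Assumes all inputs are positive integers (natural numbers).
    """
    ordered = sorted(set(numbers))
    violations = []
    for i, a in enumerate(ordered):
        # Only pairs of distinct elements strictly larger than a are considered.
        for b, c in combinations(ordered[i + 1:], 2):
            if (b + c) % a == 0:
                violations.append((a, b, c))
    return violations


if __name__ == "__main__":
    print(find_violations([3, 4, 5]))          # 3 divides 4 + 5, so one violation
    print(find_violations([9, 49, 121, 361]))  # no violating triple in this set
```

A finite check like this says nothing about an infinite set, of course; the hard part of the problem is proving the property holds forever.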

The mathematical rampage didn't stop with Erdős. The agent also demolished 44 open conjectures from the On-Line Encyclopedia of Integer Sequences (OEIS), settled a 15-year-old algebraic geometry problem regarding Hilbert functions, and even improved a convex optimization algorithm.

Under the hood, the setup looks deceptively simple. The agent receives a file written in Lean, a programming language for formal proofs, with a placeholder where the proof should be. Gemini 3.1 Pro generates a draft, the Lean compiler acts like an angry high school math teacher spitting back errors, and the model refines the code until it compiles.
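The draft, check, refine cycle described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not DeepMind's code: `refine_until_accepted`, `toy_model`, and `toy_checker` are hypothetical names, the "model" and "compiler" are stand-in functions, and the Lean-looking strings are never actually compiled. The only real Lean detail used is that `sorry` is Lean's placeholder for a missing proof.

```python
from typing import Callable, List, Optional


def refine_until_accepted(
    draft_fn: Callable[[str, List[str]], str],
    check_fn: Callable[[str], List[str]],
    task: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Request drafts until the checker reports no errors or rounds run out."""
    errors: List[str] = []
    for _ in range(max_rounds):
        draft = draft_fn(task, errors)  # the model sees the previous errors
        errors = check_fn(draft)        # the checker returns its complaints
        if not errors:
            return draft
    return None  # budget exhausted without an accepted draft


# Toy stand-ins so the sketch runs without a model or a Lean toolchain.
def toy_checker(draft: str) -> List[str]:
    return ["proof still contains a placeholder"] if "sorry" in draft else []


def toy_model(task: str, errors: List[str]) -> str:
    # The first attempt leaves the placeholder; after feedback it fills it in.
    return f"{task} := by decide" if errors else f"{task} := by sorry"


if __name__ == "__main__":
    print(refine_until_accepted(toy_model, toy_checker, "theorem t : 2 + 2 = 4"))
```

The point of the design is that the checker, not the model, decides when the loop ends, so a draft is only ever returned once it has passed.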

To make it look like a serious scientific paper, DeepMind wrapped this simple loop in an evolutionary algorithm where Gemini 3.0 Flash sub-agents graded drafts against each other using a chess-style Elo rating. They also allowed the system to call AlphaProof, their specialized reinforcement learning model.
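For readers who have not met Elo outside of chess, the sketch below shows the standard textbook update applied to two competing drafts. The article does not say which constants or variant DeepMind used, so the K-factor of 32, the 400-point scale, and the function names are all assumptions taken from the classic chess formula.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return new ratings after one comparison.

    score_a is 1.0 if draft A was judged better, 0.0 if B was, 0.5 for a tie.
    k = 32 is the classic chess default, assumed here for illustration.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two drafts start level; the first wins one head-to-head comparison.
    print(update_elo(1000.0, 1000.0, 1.0))
```

Beating a higher-rated draft moves the ratings more than beating a lower-rated one, which is why a handful of pairwise comparisons is enough to rank a pool of candidates.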

However, when DeepMind ran a reality check by stripping away all these fancy layers, they got a beautiful slap in the face. A vanilla setup of just one Gemini 3.1 Pro connected to the Lean compiler solved all nine problems anyway. Meanwhile, the specialized AlphaProof math-genius model, running solo without a large language model holding its hand, solved absolutely nothing.

The researchers openly acknowledged a major shift away from custom-trained AI systems toward simple agentic loops running on raw LLM power.

Source: arXiv

Comments

Grumpy Penguin

so we spent decades of human genius just to get outclassed by a $200 api bill. neat.