AI models are starting to solve high-level math problems

Over the weekend, Neel Somani, a software engineer, former quant researcher, and startup founder, was testing the math skills of OpenAI’s new model when he made an unexpected discovery. After pasting a problem into ChatGPT and letting it think for 15 minutes, he came back to a full solution. He evaluated the proof and formalized it with a tool from Harmonic, and it all checked out.

“I’m curious to establish a baseline for when LLMs can effectively solve open math problems compared to where they struggle,” Somani said. The surprise was that, with the latest model, the frontier has started to move forward a bit.

ChatGPT’s chain of thought was even more impressive, rattling off mathematical results like Legendre’s formula, Bertrand’s postulate, and the Star of David theorem. Eventually, the model found a Math Overflow post from 2013, where Harvard mathematician Noam Elkies had provided an elegant solution to a similar problem. But ChatGPT’s final proof differed from Elkies’s work in an important way, and provided a more complete solution to a version of the problem posed by the legendary mathematician Paul Erdős, whose large collection of unsolved problems has become a proving ground for AI.

For any skeptic of machine intelligence, this is a surprising result – and not the only one. AI tools already exist in mathematics, from formalization-oriented LLMs like Harmonic’s Aristotle to literature review tools like OpenAI’s deep research. But since the release of GPT 5.2 – which Somani described as “anecdotally more skilled in mathematical reasoning than previous iterations” – the sheer volume of problems solved has become difficult to ignore, raising new questions about the ability of large language models to push the boundaries of human knowledge.

Somani was examining the Erdős problems, a set of more than a thousand conjectures by the Hungarian mathematician that is maintained online. The problems have become tempting targets for AI-driven mathematics, varying significantly in subject matter and difficulty. The first batch of autonomous solutions arrived in November from a Gemini-powered model called AlphaEvolve, but more recently, Somani and others have found GPT 5.2 to be very good with high-level math.

Since Christmas, 15 problems have been moved from “open” to “solved” on the Erdős problems website, and 11 of the solutions specifically credit AI models as involved in the process.

The respected mathematician Terence Tao offers a more nuanced view of the progress on his GitHub page, counting eight Erdős problems on which AI models made significant autonomous progress, with another six cases where progress was made by discovering and building on previous research. It is a long way from AI systems that can do math without human intervention, but it is clear that large models have an important role to play.

On Mastodon, Tao suggested that the scalable nature of AI systems makes them “more suitable for systematic application to the ‘long tail’ of Erdős’ ill-defined problems, many of which have straightforward solutions.”

“Thus, many of these simpler Erdős problems are now easier to solve with AI-based methods than with human or hybrid methods,” Tao said.

Another driving force is the recent shift toward formalization, a labor-intensive task that makes mathematical reasoning easier to verify and extend. Formalization doesn’t require AI or even computers, but a new crop of automated tools has made the process easier. The open-source “proof assistant” Lean, developed at Microsoft Research in 2013, is already widely used in the field as a way to create formal proofs, and AI tools like Harmonic’s Aristotle promise to automate much of the formalization work.
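For readers who have not seen a formalized proof, here is a minimal sketch of what one looks like in Lean 4. The statements are toy examples chosen for illustration, not anything from the Erdős problems or from Somani’s work, and the theorem names are made up. The snippet assumes only Lean’s core library.

```lean
-- Illustrative only: toy statements, not taken from any Erdős problem.
-- Assumes Lean 4 with its core library (no Mathlib required).

-- Lean accepts this because both sides compute to the same number.
theorem two_add_two : 2 + 2 = 4 := rfl

-- A claim about every natural number; it holds by unfolding
-- the definition of addition on Nat.
theorem add_zero_example (n : Nat) : n + 0 = n := rfl

-- Formal proofs can build on lemmas that are already verified,
-- here the core-library fact that addition commutes.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If any step did not follow, Lean would reject the file, which is what makes a formalized proof easier to check than one written in prose.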

For Harmonic founder Tudor Achim, the sudden jump in solved Erdős problems is less important than the fact that the world’s greatest mathematicians are starting to take these tools seriously. “I care more about the fact that math and computer science professors are using [AI tools],” Achim said. “These people have a reputation to protect, so when they say they use Aristotle or use ChatGPT, that’s real evidence.”
