Google DeepMind researchers have unveiled AlphaGeometry2 (AG2), an advanced artificial intelligence (AI) system that has demonstrated superior performance in solving geometry problems compared to International Mathematical Olympiad (IMO) gold medalists.
This sophisticated AI framework achieved an impressive 84% success rate in tackling geometry problems from the IMO, exceeding the average success rate of 81.8% for gold medal winners.
Engineered by Google DeepMind, AG2 represents a leap forward in AI’s problem-solving capabilities, demonstrating not only pattern matching but also creative reasoning, according to the researchers. Their findings were published in a study posted February 7 to the preprint database arXiv.
This announcement comes shortly after Microsoft released its own AI math-reasoning system, rStar-Math. Both companies are vying for dominance in AI mathematical reasoning, but AG2 distinguishes itself through a hybrid reasoning model designed to solve advanced geometry problems, whereas rStar-Math employs small language models to address a wider range of math problems.
The original AlphaGeometry was released in January 2024, and the new version, AG2, shows a 30% performance improvement over its predecessor. The enhancements in AG2 are focused on geometry, a domain that requires a blend of visual reasoning and logical deduction.
Experts have cautioned against interpreting this as a leap towards artificial general intelligence (AGI), where an AI system surpasses human capabilities across multiple disciplines. However, this does represent a significant advancement.
“AlphaGeometry2 represents a form of intelligence, but human intelligence goes far beyond this — we invent, rather than simply apply knowledge or create the illusion of thought,” said John Bates, CEO of AI company SER Group and a computer science expert from the University of Cambridge.
DeepMind’s achievement lies in the successful integration of neural language models with symbolic engines (logic-based systems). The language model suggests potential geometric constructions, which the symbolic engine then tests. This approach enables the system to translate human-readable geometry problems into “auxiliary constructions”, extra points and lines, that the symbolic engine can process.
The system iterates by proposing new constructions if the initial attempts fail. This search for solutions operates in parallel, with information shared between the two components until a solution is found.
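To make this propose-and-verify loop concrete, here is a minimal Python sketch of how such a neuro-symbolic cycle could be organized. Everything in it is an illustrative assumption: the function names, the placeholder “deduction” logic, and the stopping condition are stand-ins for the purposes of the sketch, not DeepMind’s actual code or API.

```python
from concurrent.futures import ThreadPoolExecutor

def propose_constructions(problem, known_facts):
    """Hypothetical stand-in for the neural language model: suggest auxiliary
    constructions (extra points or lines) given the problem and facts found so far."""
    return [f"construction_{len(known_facts)}"]  # placeholder suggestion

def symbolic_engine_deduce(problem, constructions, known_facts):
    """Hypothetical stand-in for the symbolic engine: deduce new facts from the
    problem plus the proposed constructions and report whether the goal is proved."""
    new_facts = set(constructions)              # placeholder deduction
    proved = len(known_facts | new_facts) > 3   # placeholder stopping condition
    return new_facts, proved

def search_for_proof(problem, max_iterations=10):
    """Alternate between proposing constructions and symbolic deduction until the
    goal is proved or the iteration budget runs out."""
    known_facts = set()
    for _ in range(max_iterations):
        constructions = propose_constructions(problem, known_facts)
        new_facts, proved = symbolic_engine_deduce(problem, constructions, known_facts)
        known_facts |= new_facts                # feed deduced facts back to the proposer
        if proved:
            return known_facts                  # a toy stand-in for a completed proof
    return None

# Several search attempts can run in parallel, loosely mirroring the parallel
# search described above.
if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(search_for_proof, ["imo_geometry_problem"] * 4))
    print("proof found" if any(results) else "no proof found")
```

The design choice the sketch highlights is the division of labor: a creative but unverified proposer, and a strict verifier whose deductions are fed back into the next round of proposals.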
Compared with its predecessor, AG2 benefits from a neural language model trained on a larger and more diverse data set, as well as a faster symbolic engine that evaluates geometric constructions more efficiently. The system also features a novel search algorithm for proof discovery.
While AG2 has demonstrated remarkable capabilities, it has limitations. The system can be slow, and it struggles with complex IMO geometry problems involving three-dimensional objects, nonlinear equations, or a variable or infinite number of points. It also cannot explain its solution process in human-understandable language.
DeepMind’s aspiration is to enhance mathematical reasoning, which could have broader applications across fields including engineering design, automated systems verification, robotics, pharmaceutical research, and genomics. The ultimate goal is for AG2 to achieve fully automated, error-free geometry problem-solving. Future iterations will focus on supporting a wider range of geometric concepts and on breaking problems down into smaller, more manageable subproblems.