New research introduces a revolutionary AI model designed like the human brain — and it’s already surpassing leading LLMs at reasoning tasks.
The Next Leap in AI: Beyond ChatGPT and LLMs
For years, large language models (LLMs) like ChatGPT, Claude, and DeepSeek have dominated the AI space, wowing users with their conversational fluency and problem-solving abilities. But while these models are remarkable, they struggle with one of the most human-like capabilities: true reasoning.
Now, scientists from Sapient, an AI company based in Singapore, have unveiled a groundbreaking system called the Hierarchical Reasoning Model (HRM). Inspired directly by the architecture of the human brain, HRM processes information differently than LLMs — and in early benchmarks, it has outperformed some of the most powerful AI systems in the world.
This breakthrough suggests that the future of AI may not lie in simply scaling up LLMs with trillions of parameters, but in rethinking how machines “think.”
What Is the Hierarchical Reasoning Model (HRM)?
Unlike LLMs, which primarily rely on predicting text based on patterns in massive training datasets, the HRM system is designed to reason through problems step by step in a way that resembles human cognition.
The key inspiration comes from hierarchical and multi-timescale processing in the brain. In humans:
- Some regions handle fast, detailed responses (milliseconds).
- Others focus on slower, abstract planning (minutes or longer).
HRM mimics this architecture with two main modules:
- High-level module → Manages abstract planning and long-term reasoning.
- Low-level module → Handles rapid, fine-grained computations.
By combining these layers, HRM can analyze problems at multiple depths simultaneously, achieving a richer and more flexible reasoning process.
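The two-timescale loop described above can be sketched in a few lines of code. This is an illustrative toy only: the state sizes, update rules, and cycle counts are assumptions made for the sketch, not Sapient's released architecture.

```python
import math
import random

random.seed(0)

# Toy two-timescale recurrence, loosely inspired by the description above:
# a fast low-level state updates every step, while a slow high-level state
# updates once per cycle. All sizes and rules here are illustrative.
DIM, FAST_STEPS, CYCLES = 8, 4, 3

def rand_matrix():
    return [[random.gauss(0, 0.5) for _ in range(DIM)] for _ in range(DIM)]

W_LOW, W_HIGH = rand_matrix(), rand_matrix()

def step(state, weights, context):
    # Saturating recurrent update conditioned on a context vector.
    return [math.tanh(sum(w * s for w, s in zip(row, state)) + c)
            for row, c in zip(weights, context)]

x = [random.gauss(0, 1) for _ in range(DIM)]  # problem input
z_high = [0.0] * DIM                          # slow planning state
z_low = [0.0] * DIM                           # fast computation state

for _ in range(CYCLES):                        # slow timescale
    for _ in range(FAST_STEPS):                # fast timescale
        z_low = step(z_low, W_LOW, [h + xi for h, xi in zip(z_high, x)])
    z_high = step(z_high, W_HIGH, z_low)       # one slow update per cycle

print([round(v, 3) for v in z_high])
```

The point of the nesting is visible in the loop structure: the low-level state churns through several fine-grained updates for every single update of the high-level state, so the two states evolve on different timescales.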
How HRM Compares to Traditional LLMs
Parameter Efficiency
- HRM: 27 million parameters, trained with just 1,000 samples.
- LLMs like GPT-5: an estimated 3–5 trillion parameters.
Despite being more than a hundred thousand times smaller, HRM has outperformed major LLMs in reasoning benchmarks. This efficiency could reshape the economics of AI training, making powerful systems more accessible without billion-dollar computing budgets.
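The scale gap is easy to make concrete with the figures quoted above (keeping in mind that the GPT-5 count is a public estimate, not a confirmed number):

```python
# Back-of-envelope scale comparison using the figures quoted above.
# The trillion-parameter count is a public estimate, not confirmed.
hrm_params = 27_000_000            # HRM: 27 million parameters
llm_params = 3_000_000_000_000     # low end of the 3-5 trillion estimate

ratio = llm_params / hrm_params
print(f"HRM is roughly {ratio:,.0f}x smaller")
```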
Performance on ARC-AGI Benchmark
The ARC-AGI test is designed to measure how close AI systems are to Artificial General Intelligence (AGI) — the ability to reason, adapt, and solve novel problems beyond memorization. It is considered one of the toughest exams for machines.
Here’s how HRM scored:
- ARC-AGI-1: HRM scored 40.3%, compared with 34.5% for OpenAI’s o3-mini-high, 21.2% for Anthropic’s Claude 3.7, and 15.8% for DeepSeek R1.
- ARC-AGI-2 (harder version): HRM scored 5%, versus 3% for o3-mini-high, 1.3% for DeepSeek R1, and 0.9% for Claude 3.7.
These results are remarkable, given HRM’s much smaller scale.
Different From Chain-of-Thought Reasoning
Most LLMs use a method called Chain-of-Thought (CoT) reasoning, where complex problems are broken down into simpler steps expressed in text. This helps models "think aloud," but it has limitations:
- Brittle decomposition: if one step is wrong, the whole chain collapses.
- Heavy data dependence: requires massive training datasets.
- High latency: reasoning is slow due to step-by-step processing.
HRM, by contrast, avoids these pitfalls by executing reasoning tasks in a single forward pass, without explicitly supervising intermediate steps. Instead, it relies on iterative refinement — running multiple short bursts of “thinking” until the most accurate solution emerges.
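Iterative refinement of this kind can be sketched generically: run the model, feed its own answer back in, and stop once the answer stabilizes. Both `refine` and `toy_model` below are hypothetical stand-ins written for illustration, not Sapient's released code.

```python
def refine(model, x, max_bursts=32):
    """Re-run short 'thinking bursts', feeding the previous answer
    back in, until the answer stops changing. Generic sketch only."""
    answer = None
    for _ in range(max_bursts):
        new_answer = model(x, answer)   # one forward pass per burst
        if new_answer == answer:        # stable: refinement has converged
            break
        answer = new_answer
    return answer

def toy_model(x, prev):
    # Stand-in "model": each burst improves a guess at the integer
    # square root of x with one Newton-style step.
    guess = prev if prev is not None else x
    return (guess + x // max(guess, 1)) // 2

print(refine(toy_model, 10**6))
```

The toy task is deliberately trivial; what it mirrors from the description above is only the control flow, where each burst reuses the previous burst's output instead of emitting a supervised chain of textual steps.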
Impressive Problem-Solving Abilities
The HRM model has demonstrated abilities that challenge even the best LLMs:
- Complex Sudoku puzzles: HRM solved puzzles that LLMs consistently fail at, thanks to its structured reasoning.
- Maze path-finding: HRM excelled at identifying optimal paths in maze-like problems.
- Abstract reasoning: Outperformed ChatGPT and Claude in multi-step logic tests.
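For a concrete sense of the maze task mentioned above, here is how a classical breadth-first search solves one. This is standard algorithmics, not HRM's mechanism, and the grid and its symbols are made up for illustration.

```python
from collections import deque

# Toy maze: 'S' start, 'G' goal, '#' wall, '.' open cell.
MAZE = [
    "S.#.",
    ".#..",
    "....",
    "#.#G",
]

def shortest_path(maze):
    """Return the length of the shortest S-to-G path, or None."""
    rows, cols = len(maze), len(maze[0])
    start = next((r, c) for r in range(rows) for c in range(cols)
                 if maze[r][c] == "S")
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        (r, c), dist = queue.popleft()
        if maze[r][c] == "G":
            return dist
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != "#" and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None  # no route exists

print(shortest_path(MAZE))
```

Classical search handles small grids like this one exactly; the benchmark interest is in whether a learned model can discover such optimal routes without being given the algorithm.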
These achievements suggest HRM is not just a smaller, cheaper model — but a fundamentally different kind of AI.
Why Brain-Inspired AI Matters
The development of HRM represents a paradigm shift in AI research. For years, progress has been fueled by scaling up LLMs: more parameters, more data, bigger compute. But this arms race is unsustainable — both financially and environmentally.
Brain-inspired models like HRM suggest that intelligence isn’t about size, but structure. Just as the human brain achieves remarkable reasoning with roughly 86 billion neurons, HRM achieves strong results with a fraction of the resources used by today’s LLM giants.
This opens up exciting possibilities:
- More efficient AI → Smaller models that require less energy and data.
- Human-like reasoning → Models that think in structured, adaptive ways.
- Wider accessibility → Advanced AI tools that don’t require trillion-dollar infrastructure.
Skepticism and Ongoing Research
It’s worth noting that the HRM research is still preliminary. The study, posted to arXiv in June 2025, has not yet been peer-reviewed. While the results are promising, independent verification is essential.
Interestingly, the ARC-AGI benchmark organizers tried to replicate the findings after the researchers open-sourced HRM on GitHub. They confirmed the performance numbers but noted a surprising twist:
- The hierarchical architecture itself had less impact than expected.
- The real boost came from a refinement process during training, which wasn’t fully documented.
This raises questions about whether HRM’s success comes from its brain-like design or clever training strategies.
Implications for the Future of AI
If HRM lives up to its promise, it could reshape the entire AI landscape:
- Redefining AGI Pathways → Instead of endlessly scaling LLMs, researchers may shift toward cognitive-inspired architectures.
- Smarter Problem-Solving Tools → AI that can reason deeply, not just predict text, could revolutionize science, medicine, engineering, and law.
- Ethical Opportunities → Smaller, efficient models reduce environmental impact and democratize access, making advanced AI available to smaller labs, startups, and developing nations.
- Challenges for Big Tech → Companies like OpenAI, Anthropic, and Google may need to rethink their reliance on parameter scaling.
Conclusion: A New Era of Human-Like AI
The creation of the Hierarchical Reasoning Model marks a turning point in AI research. By mimicking the multi-layered processing of the human brain, HRM demonstrates that smaller, smarter systems can outperform trillion-parameter giants in reasoning tasks.
While still in its early stages and awaiting peer review, HRM’s success hints at a new direction for AI: one where architecture and reasoning matter more than raw size.
As AI continues to advance toward Artificial General Intelligence, brain-inspired approaches like HRM may hold the key to building systems that don’t just talk like humans — but think like them too.