Measuring progress toward AGI: A cognitive framework
By Jakub Antkiewicz
2026-03-18T08:48:36Z
Google DeepMind has introduced a new framework for measuring progress toward Artificial General Intelligence (AGI), grounding the effort in cognitive science rather than in any single performance metric. The move aims to bring a more structured, empirical approach to a field often characterized by ambiguous claims and a lack of standardized evaluation. The company's paper, “Measuring Progress Toward AGI: A Cognitive Taxonomy,” proposes a scientific foundation for assessing the cognitive capabilities of advanced AI systems.
The framework identifies 10 key cognitive abilities, including perception, reasoning, memory, and social cognition. DeepMind’s proposed three-stage evaluation protocol involves benchmarking AI systems against a demographically representative sample of human adults on a suite of tasks. To accelerate the creation of these tests, the company has partnered with Kaggle to launch a hackathon with a $200,000 prize pool. The competition, running from March 17 to April 16, specifically targets the development of evaluations for five abilities where the research community has identified the largest gaps: learning, metacognition, attention, executive functions, and social cognition.
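To make the idea of benchmarking against a human sample concrete, here is a minimal sketch of how a system's score on each ability might be placed within a distribution of human scores. It is an illustration only, not the paper's protocol: the function names (`percentile_rank`, `cognitive_profile`), the percentile-rank metric, and all scores are assumptions made up for this example.

```python
from bisect import bisect_right


def percentile_rank(human_scores, system_score):
    """Percent of human scores at or below the system's score (hypothetical metric)."""
    if not human_scores:
        raise ValueError("human_scores must not be empty")
    ordered = sorted(human_scores)
    # bisect_right counts how many human scores are <= system_score
    return 100.0 * bisect_right(ordered, system_score) / len(ordered)


def cognitive_profile(human_results, system_results):
    """Map each ability to the system's percentile rank within the human sample."""
    return {
        ability: percentile_rank(human_results[ability], score)
        for ability, score in system_results.items()
    }


# Made-up scores for two of the abilities named in the article
human_results = {
    "attention": [0.55, 0.60, 0.70, 0.80, 0.90],
    "metacognition": [0.40, 0.50, 0.65, 0.75, 0.85],
}
system_results = {"attention": 0.75, "metacognition": 0.45}

profile = cognitive_profile(human_results, system_results)
print(profile)  # {'attention': 60.0, 'metacognition': 20.0}
```

Reporting one number per ability, rather than a single aggregate, is what would let a profile like this show uneven strengths: in the made-up data above, the system outperforms most of the sample on attention but trails it on metacognition.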
By publishing this framework and crowdsourcing evaluation development, Google DeepMind is attempting to steer the industry conversation around AGI. This initiative could influence how competing labs measure and report the capabilities of their own frontier models, potentially shifting focus from task-specific benchmarks to a more holistic, cognitive profile. The move also externalizes the resource-intensive work of benchmark creation, leveraging the global research community to build the tools needed to validate DeepMind's own roadmap for general intelligence.
By crowdsourcing evaluation design through a public competition, Google DeepMind is attempting to define the roadmap to AGI on its own terms while offloading the expensive, complex work of benchmark creation to the global research community.