During the closing conversation at Axios's 'AI+SF Summit' in San Francisco, Google DeepMind CEO Demis Hassabis outlined DeepMind's research direction, advances in areas such as multimodal and world models, and the development of AI agents and their attendant risks. He also assessed the US-China AI competitive landscape and, in a rare move, offered a timeline for AGI, estimating that 'AI systems with human cognitive abilities' are about 5 to 10 years away.

A Nobel halo opens doors, but a scientist's mindset still drives DeepMind

At the start of the session, host Mike Allen introduced Hassabis as a chess prodigy at age 5 and a Nobel Prize winner at age 48. Hassabis admitted that the award still feels surreal, but said its practical impact has been unmistakable.

When he talks with government officials or decision-makers outside the field who are unfamiliar with AI, the Nobel Prize acts like a key that quickly opens doors, making them more willing to hear him out on topics such as AI safety and responsible use. He plans to make more active use of the title going forward.

Turning to his daily work and management style, Hassabis emphasizes that he is 'always a scientist first, CEO second.' In his view, the scientific method is one of humanity's most important inventions, and he applies its cycle of 'formulating hypotheses, designing experiments, and updating views based on results' directly to product development and organizational management.

He attributes DeepMind's edge to three things pursued in parallel: 'world-class research, world-class engineering, and world-class computing infrastructure.' Only by advancing all three together, he believes, can DeepMind stay at the forefront of AI development.

Plans for the next 12 months: multimodal evolution, world models, and agents

On concrete developments over the next 12 months, Hassabis points out that Gemini was designed from the start as a multimodal model, able to process text, images, video, and audio together. The next focus is what new capabilities get unlocked as these modalities are further integrated. As an example, he cites their latest image model, 'Nano Banana Pro', which can produce highly accurate infographics, a sign that the model's visual understanding is improving rapidly.

The second area is world models. DeepMind's Genie 3 can generate interactive video: rather than just watching, users can step into the scene as if entering a game, with the world staying consistent and coherent for about a minute. Hassabis sees such models as a key step toward AI that understands the physical world and its rules.

The third area is AI agents. Hassabis admits that today's agents cannot yet be trusted to take on an entire task end to end and reliably deliver a good result. He expects their trustworthiness to improve markedly within a year, however. Google's goal is to make Gemini a 'universal assistant' that lives not only on phones and computers but also accompanies users through wearables such as glasses, becoming an everyday assistant for life and work.

AI could help humanity push toward space exploration, but safety risks and video understanding matter just as much

Asked about the best outcomes AI could bring, Hassabis envisions it helping humanity break through several key bottlenecks, such as nuclear fusion, new battery technologies, advances in materials science and semiconductors, and cures for major diseases. With resources far more abundant, human society would have the opportunity to push outward into space exploration.

But he also lays out the worst-case scenarios, at several levels:

  1. Malicious actors using AI to design or enhance pathogens.

  2. AI accelerating foreign adversaries' cyberattacks on critical infrastructure such as energy and water systems, something that may already be happening, albeit with AI that is not yet very advanced.

  3. Highly autonomous AI agents deviating from their original instructions and from human expectations, which is why significant resources and attention must be devoted to preventing it.

On the capability side, he believes the area most underestimated by outsiders is AI's deep understanding of video. Hassabis recounts having Gemini analyze a movie clip: the model not only understood the visuals but offered surprisingly deep symbolic and emotional interpretations, rather than merely describing the surface-level action.

He also mentions Gemini Live, which can provide real-time repair guidance when a phone camera is simply pointed at a piece of machinery. Still, he believes the ideal form factor is glasses, since hands-on work requires keeping both hands free while interacting with the AI.

The US-China gap is down to a few months, and AGI still needs one or two breakthroughs

On international competition, Hassabis believes that in terms of model capability and innovation, the US and the West remain ahead of China overall, but that the latest batch of Chinese models, such as DeepSeek, are already very strong, though largely the product of rapid catch-up built on existing techniques. He assesses that the US and Western lead in AI, once measured in years, has likely narrowed to just a few months over China.

Hassabis's definition of AGI is quite specific:

'It must possess all the major cognitive abilities of humans, including long-term planning, long-term memory, continuous learning, genuine reasoning, and creativity.'

He points out that while current LLMs approach the level of top PhD holders in certain fields, they still make mistakes in many situations and remain far from true AGI, which he estimates is 5 to 10 years away. Even pushing the scale of current LLMs to its limit, he adds, will not be enough to cross the AGI threshold; the field likely still needs one or two major technological breakthroughs on the order of the Transformer to truly get there.

This article, 'Google DeepMind CEO: AGI is still 5 to 10 years away, with the opportunities and risks of AI arriving in tandem', originally appeared on Chain News ABMedia.