DeepMind Launches Genie 3: Laying a New Foundation for AGI

August 6, 2025 | Zoey

On the road to artificial general intelligence (AGI), world models have long been considered critical infrastructure. In 2025, Google DeepMind unveiled a new foundational world model, Genie 3, which can generate interactive virtual worlds in real time. Its emergence may open the door for AI to understand the world the way humans do.

(Image source: Genie 3)

What is Genie 3?

Genie 3 is DeepMind's first real-time interactive general-purpose world model. From a simple text prompt, it generates a 3D environment at up to 720p resolution and 24 frames per second, and it sustains real-time interaction for several minutes.
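To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes the standard 1280x720 pixel grid for "720p" and an illustrative three-minute session (the article says only "minutes"); it says nothing about how Genie 3 actually spends its compute.

```python
# Back-of-the-envelope numbers for a 720p, 24 fps real-time stream.
# Assumptions (not from DeepMind): 720p means 1280x720 pixels, and a
# session lasts three minutes. Both are for illustration only.

WIDTH, HEIGHT = 1280, 720
FPS = 24
SESSION_MINUTES = 3  # assumed; the article only says "minutes"

frame_budget_ms = 1000 / FPS              # time available to produce one frame
pixels_per_frame = WIDTH * HEIGHT         # pixels the model must fill per frame
frames_per_session = FPS * 60 * SESSION_MINUTES

print(f"time budget per frame: {frame_budget_ms:.1f} ms")   # 41.7 ms
print(f"pixels per frame:      {pixels_per_frame:,}")       # 921,600
print(f"frames per session:    {frames_per_session:,}")     # 4,320
```

In other words, a real-time model at this frame rate has roughly 42 milliseconds to respond to each user action with a new frame, and it must stay consistent across thousands of frames.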

Shlomi Fruchter, DeepMind's research director, said at the press conference:

"Genie 3 is no longer limited to narrow gaming environments or single tasks. It can generate photorealistic real-world, virtual worlds, and even hybrids in between. We believe it will be an important building block on the path to general intelligence."

Genie 3 is currently in research preview and has not yet been publicly released. It builds upon DeepMind's earlier model, Genie 2, which generated new environments for intelligent agents, and its latest video generation model, Veo 3, which incorporates a deep understanding of physical laws.

The breakthrough of this generation of models lies in their integration of environment construction, physical reasoning, and real-time interaction into a single, powerful system.

Technological Leaps: Longer, Realer, Smarter

  • Longer generation times: Supports continuous generation over several minutes, rather than just a dozen seconds.
  • More realistic graphics: 720p resolution at 24fps, nearly lifelike.
  • Controllable world events: Allows users to modify environment content and dynamics through prompts.
  • Physically consistent world: The model remembers and references previously generated content.

These features make Genie 3 less a model that merely "generates animations" and more a virtual world engine capable of long-term reasoning and interaction.

Autoregressive Architecture: The Secret to Keeping the World Together

Genie 3 doesn't rely on a traditional physics engine to drive object movement. Instead, it uses an autoregressive architecture that generates content one frame at a time and infers what should happen in the next frame by "looking back" at previously generated frames.

As Fruchter explained in an interview:

"It has to look back at what it has generated before to decide what to do next. That's a key part of the Genie 3 architecture."

This mechanism gives Genie 3's simulated world temporal continuity and logical consistency. For example, when a glass slides toward the edge of a table, it doesn't suddenly jump back to where it started. Instead, it approaches the edge frame by frame, as if about to fall, much as a person would expect from everyday experience of physics.
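The idea of generating one frame at a time while looking back at earlier frames can be sketched in a few lines. This is a toy illustration, not Genie 3's architecture: each "frame" is reduced to a single number (the glass's position in centimeters), and `predict_next` and `rollout` are hypothetical names for a hand-written rule standing in for a learned model.

```python
from typing import List


def predict_next(history: List[float]) -> float:
    """Toy stand-in for a learned model: continue the motion seen so far.

    It looks back at the last two frames, infers a velocity from them,
    and extrapolates one frame forward.
    """
    if len(history) < 2:
        return history[-1]  # nothing to infer motion from yet
    velocity = history[-1] - history[-2]
    return history[-1] + velocity


def rollout(context: List[float], num_frames: int) -> List[float]:
    """Autoregressive loop: each new frame is appended to the history
    and becomes part of the input for the next prediction."""
    frames = list(context)
    for _ in range(num_frames):
        frames.append(predict_next(frames))
    return frames


# The glass was at 0.0 cm, then 2.0 cm: it is sliding 2 cm per frame.
frames = rollout([0.0, 2.0], num_frames=4)
print(frames)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

Because every prediction is conditioned on what was generated before, the glass keeps advancing toward the edge instead of snapping back to its starting point. A real world model conditions on far richer history (full video frames and user actions), but the feedback loop has the same shape.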

Self-Taught Physics: Learned from Experience, Not Code

Even more surprising, these physical laws were not pre-programmed by hand. DeepMind emphasizes that it did not hard-code any physics rules into Genie 3. Instead, the model learned the logic of the world (how objects move, fall, collide, and bounce) by watching video data, generating output, and receiving feedback over time.

"We didn't tell it what gravity was; it learned it on its own."

This means that Genie 3 is developing an ability to understand the world similar to that of human toddlers: building an intuitive understanding of the laws of the world through observation and interaction.
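A much simpler analogy shows what it means to recover a physical regularity from observation rather than from a built-in rule. The sketch below is an assumption-laden toy, not DeepMind's training method: it synthesizes the heights of a falling object frame by frame, then estimates gravitational acceleration from those observations alone using second differences. The function names and the 24 fps frame interval are chosen purely for illustration.

```python
# Toy illustration: infer a physical constant from observed frames.
# This is a closed-form estimator, not a neural network, and it is not
# how Genie 3 is trained. All names here are hypothetical.

G_TRUE = 9.81   # used only to synthesize the "video"; the estimator never sees it
DT = 1 / 24     # time between frames, assuming 24 fps


def observe_fall(num_frames: int, start_height: float = 100.0) -> list:
    """Heights (in meters) of an object dropped from rest, one per frame."""
    return [start_height - 0.5 * G_TRUE * (i * DT) ** 2 for i in range(num_frames)]


def estimate_gravity(heights: list, dt: float) -> float:
    """Estimate downward acceleration from positions alone.

    For constant acceleration, h[i+1] - 2*h[i] + h[i-1] equals -g * dt**2,
    so averaging the second differences recovers g.
    """
    second_diffs = [
        heights[i + 1] - 2 * heights[i] + heights[i - 1]
        for i in range(1, len(heights) - 1)
    ]
    mean_second_diff = sum(second_diffs) / len(second_diffs)
    return -mean_second_diff / dt ** 2


heights = observe_fall(num_frames=24)
g_est = estimate_gravity(heights, DT)
print(f"estimated g: {g_est:.2f} m/s^2")  # estimated g: 9.81 m/s^2
```

The estimator is never told what gravity is; it only sees positions over time. A learned world model does something loosely comparable at vastly larger scale, picking up regularities implicitly from video rather than through an explicit formula.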

Teaching AI to live in the world

To validate Genie 3's effectiveness, DeepMind paired it with SIMA, its Scalable Instructable Multiworld Agent, and conducted a series of task tests. In a warehouse scenario, SIMA was instructed to complete the following goals:

  • "Approach the bright green trash compactor"
  • "Walk toward the loaded red forklift"

DeepMind scientist Jack Parker-Holder said SIMA was able to complete its tasks because Genie 3 provided a world that remained consistent throughout.
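Why consistency matters for goal-directed agents can be shown with a toy example. The sketch below is purely hypothetical: `ToyWarehouse`, `greedy_move`, and `run_task` are invented names, the grid coordinates are made up, and nothing here reflects how SIMA or Genie 3 actually work. It only shows that a simple agent can reach named objects when the world keeps them where they are.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Position = Tuple[int, int]


@dataclass
class ToyWarehouse:
    """Hypothetical stand-in for a generated world; objects never move."""
    objects: Dict[str, Position]
    agent: Position = (0, 0)

    def step(self, move: Position) -> Position:
        """Apply a one-cell move and return the agent's new position."""
        self.agent = (self.agent[0] + move[0], self.agent[1] + move[1])
        return self.agent


def greedy_move(agent: Position, target: Position) -> Position:
    """Move one cell along the axis with the larger remaining distance."""
    dx, dy = target[0] - agent[0], target[1] - agent[1]
    if abs(dx) >= abs(dy):
        return ((dx > 0) - (dx < 0), 0)
    return (0, (dy > 0) - (dy < 0))


def run_task(world: ToyWarehouse, goal: str, max_steps: int = 50) -> int:
    """Return the number of steps needed to reach the goal, or -1 on failure."""
    for steps in range(max_steps + 1):
        if world.agent == world.objects[goal]:
            return steps
        world.step(greedy_move(world.agent, world.objects[goal]))
    return -1


world = ToyWarehouse(
    objects={"green trash compactor": (4, 2), "red forklift": (1, 5)}
)
steps_to_compactor = run_task(world, "green trash compactor")
steps_to_forklift = run_task(world, "red forklift")
print(steps_to_compactor, steps_to_forklift)  # 6 6
```

If the compactor's position changed every time the agent looked away, the same greedy policy could chase it indefinitely. A world that stays put is what makes even a simple plan executable.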

This is exactly what Genie 3 was designed for: to give intelligent agents a realistic, coherent, and interactive world to train in, helping them acquire genuine perception and reasoning capabilities.

From reaction to exploration: Making AI grow like humans

A key significance of Genie 3 is that it allows agents to not only react to input but also actively explore their environments, formulate plans, iterate through trial and error, and adjust their strategies.

Perhaps this is one of the most important capabilities of future intelligent agents: acquiring knowledge not by being fed data or given instructions, but through real interaction with the world, growing gradually just as humans do. DeepMind believes that the ability to learn through one's own experience is the key to true intelligence.

Bottlenecks that Genie 3 still needs to break through

While Genie 3 has demonstrated many exciting capabilities, it still has considerable room for improvement. First, some physical details remain imperfect: a skier rushing down a slope looks smooth, but the dynamics of the snow and the contact between the skier and the ground are still a bit awkward. Second, although prompts can change the world, those changes are often triggered by "external forces" rather than made by the agent itself. Third, the model does not yet behave naturally when multiple characters appear in the same scene and interact with each other, and it is still a long way from true "swarm intelligence."

These limitations mean that Genie 3 still cannot independently support training for complex, long-term missions, but it is undoubtedly heading in the right direction and has great potential.

AlphaGo's Move 37: Not Here Yet

DeepMind researchers say the breakthrough moment for intelligent agents is yet to come. Parker-Holder noted:

"We haven't seen an agent actually perform AlphaGo's Move 37 in the real world."

The so-called "Move 37" refers to a stunning and unexpected move made by AlphaGo in the 2016 match against Lee Sedol. This moment symbolized AI's ability to demonstrate strategic creativity unimagined by humans.

Perhaps Genie 3 is preparing for such a miracle.

Final Thoughts

The birth of Genie 3 marks a new stage in world modeling. It not only enables the construction of rich virtual spaces, but more importantly:

It gives intelligent agents a way to learn in symbiosis with the environment, helping AI truly "live in the world."

Whether used for gaming, creative prototyping, immersive learning, or the more ambitious goal of training truly general AI agents, Genie 3 is a milestone worth remembering.

Perhaps AGI's own "Move 37" lies within a single frame being generated by Genie 3.

© 2025 viddo.ai, Inc. All rights reserved.