Google DeepMind CEO Demis Hassabis recently showcased the company’s latest AI breakthrough, Genie 2, a world-building model that can create interactive 3D environments from a single static image. The model was featured on 60 Minutes, in a segment where correspondent Scott Pelley also took Astra, Google DeepMind’s AI assistant, for a test drive on the streets of London.
Using cameras and microphones, Astra demonstrated its ability to understand and describe its surroundings. When Pelley asked about a building he was looking at, Astra correctly identified it as Coal Drops Yard, a shopping and dining district. Astra also recognized Edward Hopper’s painting “Automat” and analyzed the emotions of its subject, describing her as “pensive and contemplative” with “a sense of solitude.”
The 60 Minutes segment also highlighted advances in generative AI, particularly Veo 2, a video-generation model. In one demonstration, Veo 2 created a photorealistic video of a winged golden retriever puppy running through a field, a significant improvement over its predecessors.
Genie 2’s capabilities were demonstrated further when it created a 3D world from a static image of a waterfall in California. The generated environment could be explored interactively: an avatar walked around the pool at the top of the waterfall and discovered features that were not present in the original image. Hassabis explained that such simulated environments could be used to train AI agents to perform tasks and could potentially be applied to robotics.
The potential applications of Genie 2 range from entertainment and game development to improving AI’s understanding of the world. Hassabis noted that future versions could create an infinite variety of simulated environments for AI training. This could be particularly useful for robotics, since large amounts of data could be collected in simulated worlds before a model is fine-tuned with real-world data.
Google’s existing geographic data from Google Earth, Google Maps, and Google Street View could also be used to give AI systems real-world understanding and geographical knowledge. The same technology could bring static images to life, turning them into interactive 3D scenes.