DeepMind’s CEO Demis Hassabis announces future merger of Gemini and Veo AI models at Google.

In a recent podcast episode hosted by LinkedIn co-founder Reid Hoffman, Google DeepMind CEO Demis Hassabis discussed the company’s plans to enhance its AI capabilities by merging its Gemini models with its Veo video-generating technology. This integration aims to improve Gemini’s comprehension of the physical world.

Hassabis emphasized that Gemini was designed to be multimodal from the outset, with the goal of a universal digital assistant that can genuinely help users in real-life scenarios. The AI industry is gradually moving towards “omni” models capable of understanding and synthesizing various media formats.

Google’s latest Gemini models can now generate audio, images, and text, while OpenAI’s ChatGPT has started to produce images, including images in visual styles reminiscent of Studio Ghibli. Additionally, Amazon has announced plans to launch an “any-to-any” model that will further extend the capabilities of multimodal AI.

Training these omni models requires a large amount of diverse data, including images, videos, audio, and text. Hassabis revealed that much of the video data for Veo comes from YouTube, a platform owned by Google.

He noted that by analyzing large numbers of YouTube videos, Veo can learn about the physics of the world. Google has acknowledged that its AI models may be trained on some YouTube content, in accordance with its agreements with content creators.

The company reportedly expanded its terms of service last year, allowing it to access more data for training its AI models. The change suggests a strategic effort to strengthen its AI initiatives while staying within legal and ethical limits on data usage.
