While the first, called Imagen Video, focuses on quality, the second, Phenaki, helps create longer videos.
A week after Meta, Google has responded with its own artificial intelligence (AI) systems capable of turning text into video. The Californian company has just unveiled Imagen Video which, as its name suggests, builds on the techniques of its image-generating AI, Imagen. From a simple text description, it can create a 24 frames per second video at a resolution of 1280 x 768 pixels. The team behind its development even claims that the AI can “create high-definition video with high image fidelity, high temporal consistency, and deep understanding of language.” It is also said to be able to create videos and text animations in various art styles.
To achieve this, Imagen Video was trained on an internal dataset comprising 14 million video-text pairs and 60 million image-text pairs. The system was also trained on the publicly available LAION-400M dataset of image-text pairs.
Like Meta, Google believes that video-generating AIs can be useful for creative work, but the company recognizes that they are open to malicious use, “for example, to generate false, hateful, explicit or harmful content”. Several measures have therefore been taken to “minimize these concerns”. During internal testing, the research team applied filters to both the text descriptions and the resulting videos. “Our internal testing suggests that much violent and explicit content can be filtered out,” the team says, while admitting that social biases and stereotypes remain difficult to detect and filter. Because several significant safety and ethical challenges remain, Google has decided not to make Imagen Video publicly available until these concerns have been addressed.
The same concerns apply to Phenaki, the firm’s other video-generating AI. While Imagen Video prioritizes image quality, Phenaki focuses on length: it can generate longer videos from a detailed text prompt such as “A photorealistic teddy bear swims in the ocean in San Francisco. The teddy bear goes underwater. The teddy bear continues to swim underwater with colorful fish. A panda swims underwater.” The consistency and resolution of its videos are lower, however.
Despite this, the team behind Phenaki maintains that the system “can have a positive impact in a variety of creative contexts”, eventually allowing users “to accelerate their creativity” with a model that can quickly generate videos.