2023 was the year of generative AI, but more specifically, the year we witnessed the power and potential of LLMs, large language models. A lot of the world of work is based around text: documents, email, content, media. Both startups and large tech companies leaned in hard, incorporating automation tools and generative AI applications across verticals.
Visual generative AI made strides as well. Midjourney V6, launched in December 2023, and OpenAI’s DALL·E 3 both delivered a step change in image creation.
But the next frontier is video. Progress in generative AI technologies for video has also been moving very fast, but it’s generally less talked about than text and images, which already have products with wide consumer adoption.
Generative AI in video consists of several buckets:
- Automatic video editing (includes Descript)
- Talking avatars – text to video (includes companies like HourOne, Synthesia, HeyGen)
- Video footage generation (i.e. moving pictures) from a prompt
This post focuses on video footage generation.
Timeline of Generative AI for video progress in 2023
A16Z partner Justine Moore posted an excellent X thread on the advances of generative AI for video right before the end of the year.
As Justine’s timeline shows, the big players in this space are the large tech platforms: Google, Meta, and Nvidia in the US, and ByteDance, Alibaba, and Baidu in China. While Google and Meta have shared that they are working on AI video generation, they’ve yet to release their products to the public.
The large tech players are well positioned to lead in this space given their access to deep learning talent, vast cloud resources and deep pockets. Google Brain recently open-sourced Phenaki, a video diffusion model that points towards YouTube’s internal capabilities. It is capable of generating a two-minute AI-generated video from a series of prompts. Meta’s Make-A-Video builds on recent progress in text-to-image generation to enable text-to-video generation. Many other papers in this space were published in 2023.
On the startup front, up-and-coming players like PikaAI and RunwayML offer very short but high-quality video creation tools. And then there are open-source solutions like Stability.ai’s Stable Video Diffusion, launched in November 2023.
RunwayML is targeting Hollywood and AI filmmaking
Another tool worth calling out is FinalFrame, which generates videos from images. Here’s my video for “Panda bear surfing in Hawaii”.
AI that makes everybody dance, using a picture
Justine Moore tracked 21 publicly available products that enable users to generate AI video footage (you can check them out in this Google doc created by Justine). Note that the majority of tools generate very short videos (up to 16 seconds).
With sufficient data and compute, photorealistic, interactive video generation seems within reach. As an investor in generative AI and interactive entertainment, I find this an incredibly exciting time for the generative AI video field, as these models begin crossing the threshold of usefulness. However, significant challenges remain around bias, misinformation, and intellectual property, in addition to the as-yet-unknown impact of incoming regulation. Investors also have a tough question to ask: is generative AI a real platform shift, or are we in a bubble?