How Will Generative AI Disrupt Video Platforms?
Researchers say AI that creates content threatens to disrupt the big players in video streaming (Netflix, TikTok, and YouTube) because it changes the industry's balance of power and its economics.
By Atin Gupta and Geoffrey G. Parker, courtesy of Harvard Business Review
Generative AI is an artificial intelligence model that, when trained on massive datasets, can generate text, images, audio, and video by predicting the next word or pixel. The simplest input (called a prompt) to generative AI is a text description. Based on that text description, a generative pre-trained transformer (GPT) can write a paragraph, a text-to-image model such as Stable Diffusion can create a picture, MusicLM can create music, and Imagen Video can create a video. This technology will democratize all kinds of content creation.
For video creation, it could level the playing field more than smartphones and social video platforms have already done. It will also fundamentally change the video content industry.
Consider Netflix, TikTok, and YouTube, the stars in this domain. Although each is unique in terms of content type and business model, all three platforms operate by incentivizing creators to develop engaging content, matching the right content to the right consumer, and identifying what content drives engagement. These elements build on one another to create a flywheel that has helped all three platforms gain viewers at high speed. But that flywheel is beginning to lose momentum. Generative AI will make their problems worse by creating a new value chain for video content creation.
Why Netflix, TikTok, and YouTube are in trouble
Netflix, TikTok, and YouTube have done well due to their ability to determine content relevance and engagement. They all have enormous amounts of data about who watches what and how. Despite their success, determining the “what” still presents two serious challenges:
Extracting useful, precise features. If a video is commissioned (as happens at Netflix), the categories it falls into are known: genre, cast, duration, etc. But those are broad and sometimes subjective labels, which makes it difficult for an algorithm to learn from them. Of course, many of the video’s features can be specified; the script, shot list, and other production features are known precisely. Attempts to use this data, however, lead to the other extreme: there can be too much information to describe just one video.
Overcoming barriers to creation. Closed, Hollywood-style content production is expensive and slow. Netflix spent $17 billion on content in 2022. Netflix co-CEO Greg Peters said, “But if we deliver a Wednesday every week, if we deliver a Glass Onion every week, we’ll get the vast majority of those viewers back.” Clearly, they can’t yet deliver a Wednesday (a popular and high-budget modern spin on The Addams Family) every week with their current production model.
The alternative model is the open, user-generated content creation used by TikTok and YouTube. Although that is relatively cheap and fast, it requires setting incentives that balance three (sometimes conflicting) objectives: 1) retaining the influential creators, 2) motivating new creators, and 3) retaining and growing the viewer base. As platforms in this space try to generate a sufficient volume of engaging content from a relatively small number of popular creators, they are triggering incentive wars. For example, TikTok allegedly engages in “heating” to manually promote videos. YouTube Shorts, meanwhile, has lowered the bar for creators to earn revenue: they need only 1,000 subscribers instead of TikTok’s requirement of at least 100,000 followers.
These two challenges partly explain the failure of the short-lived streaming platform Quibi.
Quibi combined the weaknesses of all three into a single platform. It doubled down on the closed, Hollywood-style production system by hiring expensive creators and actors. Instead of empowering individual creators as YouTube and TikTok have done, Quibi placed its bets on brand-name creators and actors. In return, it got poor (likely second-tier) content that just didn’t work. That is because it targeted Millennials and Gen Z without fostering creators in those age groups. Also, surprisingly, it didn’t use AI to determine what content to produce (although it used AI to recommend to viewers what to watch).
No human-driven platform has yet overcome both of these challenges. However, a solution may exist. Generative AI will change what video content to produce, how to produce it, and whom to show it to, ushering in an altogether new kind of AI-enabled platform.
Toward the generative platform
Imagine this scenario. A creator enters this text description: Two people are sitting in an Art Deco café. It’s snowing outside. One of them bites into a wedge of Swiss cheese and remarks, “I’m creatively constipated!”
A hyper-realistic, live-action video (with sound) is almost instantly generated and shown to billions of viewers. Not only do we know who watched and for how long, who skipped which parts, and the likes, shares, comments, searches, and off-platform discussions about the video, but we also know the exact input used to create that video. In one shot, this scenario overcomes the two challenges with existing video platforms. It provides a much more precise description of the video (the input text prompt), and it greatly lowers the barriers to creation (it’s as simple as typing out your imagination). No need to bother with CapCut, or even with actors.
This sounds like magic (and indeed, it doesn’t exist yet), but it would be just an ensemble of three AI programs. AI #1 generates the video based on the text input. AI #2 matches the video with the right viewers. AI #3 uses the resulting engagement to guide creators on what to make next. A more primitive version of this production model is already producing content, perhaps most notably the Seinfeld parody sitcom “Nothing, Forever,” which uses generative AI to create the script and has almost 100,000 followers.
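To make the three-program loop concrete, here is a minimal sketch in Python. It is purely illustrative: no such platform exists, and every name in it (`generate_video`, `match_viewers`, `suggest_next`, the keyword-overlap matching, and the watch-time figures) is a hypothetical stand-in for a real model, not a description of how any platform works.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Video:
    prompt: str   # the exact text input, kept as the video's precise description
    creator: str


def keywords(text):
    """Lowercased words longer than three letters (a crude stand-in for features)."""
    return {w for w in re.findall(r"[^\W\d_]+", text.lower()) if len(w) > 3}


def generate_video(prompt, creator):
    """AI #1 (stub): a real system would call a text-to-video model here."""
    return Video(prompt=prompt, creator=creator)


def match_viewers(video, viewer_interests):
    """AI #2 (stub): pick viewers whose interests overlap the prompt's keywords."""
    kw = keywords(video.prompt)
    return sorted(v for v, interests in viewer_interests.items() if kw & interests)


def suggest_next(engagement, top_n=3):
    """AI #3 (stub): rank prompt keywords by the watch time they attracted."""
    totals = defaultdict(float)
    for prompt, seconds in engagement.items():
        for word in keywords(prompt):
            totals[word] += seconds
    # Highest total watch time first; ties broken alphabetically.
    return sorted(totals, key=lambda w: (-totals[w], w))[:top_n]


# Hypothetical usage: one creator, two viewers, made-up watch times.
viewers = {"ana": {"snowing", "deco"}, "ben": {"soccer"}}
video = generate_video("Two people are sitting in an Art Deco cafe. It's snowing outside.", "creator_1")
audience = match_viewers(video, viewers)
engagement = {video.prompt: 40.0 * len(audience)}
print(audience, suggest_next(engagement))
```

The point of the sketch is the data flow rather than the stubs: because the prompt travels with the video, the engagement collected after matching can be attributed directly to the words that produced it.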
The generative-AI-driven video platform reduces barriers to value creation by guiding creators on what drives engagement and showing relevant content to viewers.
At the same time, the reduced barriers and improved guidance in turn enable the creators to increase the value they can create outside the firm. And because of the near-zero friction on both sides between creating and watching relevant content, creators are also viewers and vice-versa. The boundary is further blurred if the viewer types in a search, and that input text becomes the prompt for a new video.
The economic impact will be huge. Traditionally, a small percentage of very popular content on a platform has made up for a large percentage of less popular content. A generative AI platform will supercharge the success of the popular content because creators will have algorithmic recommendations on what to make next. At the same time, the much lower barriers to creation will improve the profitability of the remainder.
How will the leading incumbent platforms adapt? Of the three, Netflix is the most locked into its business model and will likely find it hard to change dramatically. It long resisted an ad-supported model and has only recently moved in that direction.
In terms of business model, capabilities, and flexibility, TikTok is the closest to the generative AI platform we see coming, but it has come under regulatory scrutiny in the United States. YouTube is in a favorable position, as it has been trying hard to compete by introducing Shorts and improving creator incentives. It also has the backing of Google’s AI capabilities. However, Google has already shown that it is slow to move commercially in the generative AI space.
The recent acceleration of technical progress in and awareness of generative AI has been nothing short of staggering. To be sure, we don’t yet have the technology to generate hyper-realistic, live-action video from a text input, and the availability of such technology is key to realizing the new platform.
Even when it becomes available, text inputs may often be unable to define a video precisely enough, and we are likely to see the platform generate a multiplicity of similar, but not identical, videos as creator-viewers compose similar texts. And as the platform learns the keys to generating engaging content, how will the conflict of interest between the platform and creators be managed? How will the platform prevent unlicensed deepfakes and the inevitable propaganda and false information that will come?
Despite these caveats, it is highly likely that generative AI will power new video content platforms that supersede or at least supplement the current incarnations of Netflix, YouTube, and TikTok.
Generative AI technology will be used not only to create content but also to power the dynamics among the platform, the creators, and the consumers. It almost goes without saying that none of this comes without technological uncertainty and ethical risks. And of course, video is just one realm where we expect to see such rapid change. Many other creative domains in art, music, and the written word are in for dramatic change, with new business opportunities for those who can see what is ripe for disruption, or for those who would harness generative AI to protect their turf.