Cost curve does have a floor
OpenAI shut down Sora after video generation ran up an estimated $15 million in daily spend (roughly $5.4 billion annually), freeing those resources to support enterprise products and coding tools ahead of an IPO.
Now what?
Video generated by diffusion models faces a steeper path than the large language models (LLMs) powering GPT-5, Gemini, or Claude. However, barring World War III, the cost of video is still expected to collapse to $0.01 per minute across all categories of video. There are many pathways to the top of this mountain, but the climb may be harder than it was for LLMs.
Economic diffusion of video:
Sora-style AI systems work by starting with static “noise” and cleaning it up step by step until a clear, coherent image forms. Other parts of these systems model how the world works in order to interpret user input, and ensure that the video does not degrade from its first frame to its last.
Both large amounts of data and compute are needed to accomplish this, for training the model as well as for running it for end users. That capital must be reallocated from other models competing for the same compute. In today’s era of low-cost capital, low-cost energy, and abundant compute, labs can afford riskier bets on capturing the entire media pipeline, which promises great wealth for the labs and hyperscalers.
However, in a constrained environment, LLMs offer a higher return on investment (ROI) per unit of compute than video. This is true for two reasons. First, organizations and enterprises spend far more money on labor than on video and media. Second, labor spend is self-referential: work done by the models, such as writing code, feeds back into improving the models themselves, creating a recursive positive feedback loop.
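The step-by-step cleanup described above can be illustrated with a deliberately tiny sketch. It is not how Sora or any production model is implemented: the "denoiser" here is an assumed oracle that already knows the clean signal, whereas a real diffusion model uses a trained neural network to predict what noise to remove. The function name `toy_denoise` and the five-value "image" are hypothetical.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy iterative denoising on a 1-D "image" (a list of floats).

    Assumption: the denoiser is an oracle that knows the clean target.
    A real diffusion model replaces it with a learned network.
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise, one value per pixel.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    errors = []
    for step in range(steps):
        # Close an equal share of the remaining gap at every step,
        # so the final step lands on the clean signal.
        remaining = steps - step
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
        # Track mean absolute error to show the gradual cleanup.
        errors.append(sum(abs(xi - ti) for xi, ti in zip(x, target)) / len(target))
    return x, errors


if __name__ == "__main__":
    clean = [0.0, 0.5, 1.0, 0.5, 0.0]
    result, errors = toy_denoise(clean)
    print("error after first step:", round(errors[0], 4))
    print("error after last step: ", round(errors[-1], 4))
```

The point of the sketch is only the shape of the loop: every frame of output costs many passes of a large network, which is part of why running video models is compute-heavy.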
Management teams within tech companies are already talking about “the token budget” as equivalent to “the labor budget.” Additionally, wherever one lands on the AI replacement debate, the U.S. service-based economy spends approximately $12 trillion per year on labor compared to $2 trillion per year on media.
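As a rough back-of-the-envelope check on the spend figures above, the sketch below compares the two addressable markets. The only inputs are the approximate numbers quoted in the text. The idea that revenue per unit of compute scales with market size assumes equal capture rates and equal compute, which is a simplifying assumption for illustration, not a claim from the article.

```python
# Approximate annual U.S. spend figures quoted in the text, in USD.
LABOR_SPEND = 12_000_000_000_000
MEDIA_SPEND = 2_000_000_000_000


def market_ratio(labor=LABOR_SPEND, media=MEDIA_SPEND):
    """How many times larger the labor market is than the media market."""
    return labor / media


def revenue_at_capture(market, capture_rate):
    """Hypothetical revenue if a lab captured `capture_rate` of a market."""
    return market * capture_rate


if __name__ == "__main__":
    print("labor / media:", market_ratio())
    # Illustrative only: the same 1% capture rate applied to both markets.
    print("1% of labor:", revenue_at_capture(LABOR_SPEND, 0.01))
    print("1% of media:", revenue_at_capture(MEDIA_SPEND, 0.01))
```

Under these assumptions, a GPU hour pointed at labor chases a market about six times larger than one pointed at media.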
Video models with Chinese characteristics

Chinese labs face a very different competitive calculus. ByteDance and Kuaishou own distribution platforms that make video diffusion businesses viable. Between them, Douyin and Kuaishou offer hundreds of millions of short-form videos that function as training data, hundreds of millions of creators using generated content commercially, and advertising ecosystems that monetize views. For example, when a creator generates an ecommerce ad with a product link attached, Douyin or Kuaishou shares revenue based on views and clicks. The video model supports the content ecosystem, which supports the ad business, which funds the compute costs.
The cost structures are also dramatically different. Chinese video models reportedly operate at 1/6 to 1/10 the inference cost of Sora for identical outputs. Furthermore, government computing centers provide direct subsidies for research, with one such center in Changchun providing 200 out of 300 petaflops of compute for a single video AI project. The results speak for themselves: Kling achieved $240 million in annualized revenue by late 2025. Several Chinese video AI startups are breaking even on subscription revenue alone, something no Western video AI company has accomplished.
Perhaps most importantly, the opportunity cost argument does not apply in the same manner. OpenAI had to decide between GPU hours for video and GPU hours for enterprise code generation. For ByteDance and Kuaishou, video generation feeds the core advertising business rather than competing with it.
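To put the cost figures cited above side by side, here is a small illustrative calculation. It assumes, purely for the sake of the arithmetic, that the estimated $15 million daily Sora spend is dominated by inference and that a Chinese lab would serve the same output volume. Neither assumption comes from the reports, so the results are a sketch, not an estimate of any lab's real costs.

```python
SORA_DAILY_SPEND = 15_000_000  # USD per day, estimate quoted in the article
COST_DIVISORS = (6, 10)        # "1/6 to 1/10 the inference cost"


def implied_daily_cost(baseline, divisor):
    """Hypothetical daily cost at 1/divisor of the baseline spend."""
    return baseline / divisor


def subsidized_share(provided_pflops, total_pflops):
    """Fraction of a project's compute supplied by a subsidized center."""
    return provided_pflops / total_pflops


if __name__ == "__main__":
    for d in COST_DIVISORS:
        cost = implied_daily_cost(SORA_DAILY_SPEND, d)
        print(f"at 1/{d} of Sora's cost: ${cost:,.0f} per day")
    # Changchun example from the text: 200 of 300 petaflops.
    print(f"subsidized share: {subsidized_share(200, 300):.0%}")
```

Under these assumptions the same workload would cost between $1.5 million and $2.5 million per day, with about two thirds of the compute in the Changchun example coming from the subsidized center.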
Where do we go from here?
The status quo is fluid. Advances in compute efficiency and other technical breakthroughs may ultimately make diffusion economically viable, and may allow agile new entrants to succeed. Runway recently closed a very large funding round and continues to release new models, but remains behind the pack.
My predictions:
- Next-generation models from Google and xAI may push the boundaries of what is possible. With significant war chests and their own distribution platforms and compute, both could build viable vertically integrated platforms similar to their Chinese counterparts. Potential headwinds include rising energy costs, limited compute availability, and xAI’s current restructuring.
- Chinese models are likely to keep pulling ahead of their Western counterparts, though their access to Western markets will likely stay restricted until questions about intellectual property and copyright are resolved.
- As open-source models mature and get folded into existing pipelines, the economics change completely if they can run on local machines.
