Since late 2025, a GitHub repository called awesome-video-diffusions, which updates automatically every day, has cataloged well over 200 papers documenting how quickly researchers are building alternatives to OpenAI's Sora. The project tracks several areas, including controllable video generation, text-to-video synthesis, and safety evaluation frameworks. Contributions surged after OpenAI demonstrated Sora in November 2025, before the model was made available to the public.
Many contributors appear to be building on the architecture of Open-Sora, the largest single source of contributions to the project so far. The repository shows many researchers working on controllable video generation, in which users can specify camera movement, object placement, and scene transitions. Since January 2026, forty-seven papers addressing this challenge have been published.
The repository suggests a pattern: when a major lab announces a video model, a wave of open-source implementation attempts follows. These attempts seek to either replicate or exceed the announced capabilities. Twelve papers on evaluating the safety of video models were published in Q4 2025, and thirty-one in Q1 2026. Researchers appear to be taking proactive steps to identify potential misuse before models are deployed.
This cycle is significantly faster than previous AI research cycles. For large text-based models, viable open-source alternatives took years to appear after the initial proprietary releases. Video diffusion models, by contrast, have seen viable alternatives within months of proprietary announcements.
The categories of papers in the repository expose ongoing gaps. Frame coherence over longer sequences remains inconsistent across many open-source models, and most currently limit themselves to clips under ten seconds. Computational requirements put current models out of reach for individual researchers: the median paper reports training on clusters of eight or more high-end GPUs.
Papers on safety evaluation reveal another gap. Detection of generated content has advanced, yet current methods struggle to identify video that blends real and synthetic elements. One subset of papers concentrates specifically on temporal artifacts, the slight discrepancies in motion that distinguish AI-generated video from captured footage.

Maintainers of the repository point out that contribution velocity has doubled each quarter, from thirty-four papers in Q3 2025 to approximately one hundred and forty expected by the end of Q2 2026. This indicates that the field now has enough researchers with the expertise and computing resources to sustain that pace.
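The growth figures can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the two endpoints quoted above (34 papers in Q3 2025, roughly 140 projected for Q2 2026), assumes three quarter-to-quarter steps between them, and compares the implied per-quarter growth factor with a strict doubling. The function names are hypothetical and are not part of the repository.

```python
def implied_quarterly_factor(start, end, steps):
    """Constant per-quarter growth factor that takes `start` to `end` in `steps` quarters."""
    return (end / start) ** (1 / steps)


def project(start, factor, steps):
    """Paper counts per quarter under a constant growth factor, starting quarter included."""
    return [round(start * factor ** i) for i in range(steps + 1)]


# Endpoints quoted in the article. The step count is an assumption:
# Q3 2025 -> Q4 2025 -> Q1 2026 -> Q2 2026 is three steps.
START, END, STEPS = 34, 140, 3

factor = implied_quarterly_factor(START, END, STEPS)
strict_doubling = project(START, 2.0, STEPS)

print(f"implied factor per quarter: {factor:.2f}")
print(f"strict doubling from {START}: {strict_doubling}")
```

Under these assumptions the endpoints imply growth of about 1.6x per quarter, while strict doubling reaches 136 after two steps and 272 after three. The quoted figure of roughly 140 therefore fits two doublings, or an approximate doubling over a longer span, depending on how the quarters are counted.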
Open-source video generation models are available for domains such as talking heads, nature scenes, and abstract animation. Training costs for basic models have fallen from $100K to approximately $15K over six months. Researchers are actively developing watermarking techniques designed specifically for video diffusion output. Inference speed remains an obstacle to commercial applications: most models need two to five minutes to generate five seconds of video. Evaluation metrics shared among researchers in the field enable direct comparison of models.
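The cost and speed figures above translate into simple ratios. The following is a minimal sketch that uses only the numbers quoted in this article; the function names are hypothetical and chosen for illustration.

```python
def slowdown_vs_realtime(generation_seconds, clip_seconds):
    """Seconds of compute spent per second of generated video."""
    return generation_seconds / clip_seconds


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


# Two to five minutes of generation time for a five-second clip.
fast = slowdown_vs_realtime(2 * 60, 5)
slow = slowdown_vs_realtime(5 * 60, 5)

# Training cost for basic models: $100K down to roughly $15K.
cost_drop = percent_reduction(100_000, 15_000)

print(f"{fast:.0f}x to {slow:.0f}x slower than real time")
print(f"training cost down about {cost_drop:.0f}%")
```

Put differently, generation currently runs 24 to 60 times slower than playback, and the quoted training cost has dropped by about 85 percent.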
