YouTubers Launch Class Action Against ByteDance Over AI Training Data
Major creators including Ethan Klein are suing ByteDance and Meta for allegedly circumventing YouTube's security to scrape videos for AI model training.

Ethan Klein's latest YouTube video doesn't feature his usual commentary format. The h3h3Productions creator sits in front of legal documents, explaining how ByteDance allegedly scraped his content to train its Magic Video AI model. "They used sophisticated tools to bypass YouTube's streaming protocols," Klein states in the December 27 video, holding up printed pages from the HD-VILA-100M dataset that he claims contains his work.
The class-action lawsuits, filed December 23 in the Northern District of California, mark a strategic shift in how creators are challenging AI companies. Rather than pursuing traditional copyright infringement claims, the plaintiffs are invoking Section 1201 of the Digital Millennium Copyright Act—the provision that makes it illegal to circumvent technological protection measures. This approach sidesteps thorny questions about fair use and focuses on the alleged methods used to obtain the training data.
According to the complaints reviewed by Bloomberg Law, ByteDance and Meta allegedly used automated tools like yt-dlp to download streaming-only content from YouTube. These tools work by intercepting the segmented video chunks that YouTube sends to browsers during playback, then reassembling them into downloadable files—converting stream-only content into permanent copies.
The distinction matters. YouTube's terms of service prohibit automated downloading, and the platform implements various technical measures to enforce streaming-only access. By allegedly circumventing these measures, the companies may have violated the DMCA even if the underlying content use might otherwise qualify as fair use.
Pascal's Substack legal analysis notes that this DMCA Section 1201 strategy echoes tactics used in early DVD decryption cases. "The plaintiffs aren't arguing about whether AI training constitutes fair use," the analysis explains. "They're arguing that the act of bypassing YouTube's access controls is itself illegal, regardless of what happens to the content afterward."
Ted Entertainment, another plaintiff in the ByteDance suit, alleges its YouTube videos appeared in datasets used to train proprietary video generation models. The company claims ByteDance's scraping operation was industrial in scale, targeting millions of hours of content.
The timing appears deliberate. These filings bring the total number of AI-related copyright suits to over 70, according to Chat GPT Is Eating the World's tracking. The surge suggests creators are coordinating their legal strategies as video generation models become commercially viable.
ByteDance's Magic Video model, announced earlier this year, can generate 16-second clips from text prompts. Meta's video generation capabilities remain less public, though the company has demonstrated various AI video tools. Neither company responded to requests for comment about the lawsuits by press time.
The HD-VILA-100M dataset sits at the center of the ByteDance allegations. Klein and other plaintiffs claim this dataset, containing 100 million video clips, was built using scraped YouTube content. Similar allegations have been made against Nvidia in separate litigation.
These cases focus on method rather than outcome. Previous AI training lawsuits have struggled with courts' varying interpretations of transformative use. By targeting the scraping process itself, these plaintiffs may have found a more straightforward legal path.
The DMCA's anti-circumvention provisions carry statutory damages of $200 to $2,500 per act of circumvention. With millions of allegedly scraped videos, the potential damages could reach billions of dollars, though courts rarely award maximum statutory amounts.
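To make the scale of that exposure concrete, here is a rough back-of-the-envelope sketch. It assumes each scraped video counts as exactly one act of circumvention, which a court may or may not accept, and it uses a hypothetical count of one million videos rather than any figure from the complaints. The $200 floor comes from the statutory range in 17 U.S.C. § 1203(c)(3)(A); the function name is illustrative only.

```python
# Statutory damages range per act of circumvention, 17 U.S.C. § 1203(c)(3)(A).
MIN_PER_ACT = 200
MAX_PER_ACT = 2_500


def statutory_damages_range(acts):
    """Return (low, high) statutory exposure in dollars for a number of acts."""
    if acts < 0:
        raise ValueError("acts must be non-negative")
    return acts * MIN_PER_ACT, acts * MAX_PER_ACT


# Hypothetical count (an assumption, not from the filings):
# one million videos, each treated as a single act of circumvention.
low, high = statutory_damages_range(1_000_000)
print(f"${low:,} to ${high:,}")  # $200,000,000 to $2,500,000,000
```

Even at the statutory minimum, a million acts would imply nine-figure exposure, which is why the per-act counting question is likely to matter as much as the per-act amount.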
Video creators may want to audit whether their content appears in known AI training datasets. The DMCA Section 1201 strategy could become a template for future creator lawsuits, and companies building video AI models may face pressure to demonstrate clean data provenance. Court rulings on YouTube's technical measures against scraping could also set legally significant precedent, while settlement negotiations might establish industry norms for creator compensation.
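For creators who want to try such an audit, a minimal sketch of the idea follows. It assumes the dataset publishes a list of source video URLs, which may not be true of every dataset, and that the creator has a list of their own video IDs. The function names and the sample URLs and IDs are made up for illustration and do not reflect any dataset's real format.

```python
from typing import Iterable, List, Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def youtube_video_id(url: str) -> Optional[str]:
    """Extract the video ID from common YouTube URL shapes, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def find_matches(dataset_urls: Iterable[str], my_video_ids: Iterable[str]) -> List[str]:
    """Return, sorted, the creator's video IDs that appear among the dataset URLs."""
    dataset_ids = {youtube_video_id(u) for u in dataset_urls}
    return sorted(dataset_ids & set(my_video_ids))


# Made-up sample data; a real audit would read URLs from the dataset's metadata.
sample_urls = [
    "https://www.youtube.com/watch?v=abc123XYZ_-",
    "https://youtu.be/zzz999",
    "https://example.com/not-a-video",
]
print(find_matches(sample_urls, ["zzz999", "not-in-dataset"]))  # ['zzz999']
```

A match on a video ID would only show that the URL is listed in the dataset, not how or whether the video was downloaded, so it is a starting point for further inquiry rather than evidence of circumvention.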
The Northern District of California will likely consolidate these cases given their similar claims and defendants. Discovery could reveal the actual scale and methods of any scraping operations, details that the public filings so far leave largely to speculation. The central question is whether AI companies broke the law in how they obtained YouTube content, not in how they used it.


