Raw Story v. OpenAI: A Landmark Decision Shaping AI Copyright Law

In a ruling that could help define the boundaries of AI training and copyright law, Judge Colleen McMahon of the Southern District of New York has dismissed Raw Story Media and AlterNet Media's lawsuit against OpenAI. The suit was brought under the Digital Millennium Copyright Act's provisions on copyright management information, and the November 2024 decision, which turned on standing, offers an early look at how courts may approach the intersection of AI development and copyright protection.

The Case at a Glance

Raw Story and AlterNet, two independent news organizations, filed suit against OpenAI in February 2024, alleging that the company had stripped author, title, and copyright information from their news articles when assembling the datasets used to train its large language models (LLMs). The case joined a growing roster of lawsuits challenging AI companies' training practices, including actions by authors and other news organizations.

Key Arguments and the Court's Analysis

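1. The Claim: Removal of Copyright Management Information

The plaintiffs argued that OpenAI violated Section 1202(b) of the Digital Millennium Copyright Act by removing copyright management information from their articles before copying them into training datasets. The court never reached the merits of this claim, because it found that the plaintiffs lacked standing. The complaint failed to:
- Allege a concrete injury flowing from the removal of that information, as opposed to the use of the articles themselves
- Show that the stripped articles had been disseminated to anyone
- Establish a substantial risk of future harm that would justify an injunction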

2. The Output Question

Part of the court's analysis concerned whether ChatGPT's outputs posed a real risk to the plaintiffs. The plaintiffs claimed that the AI could reproduce their articles' content, but Judge McMahon found this argument speculative and unsupported by concrete examples, describing the likelihood of ChatGPT outputting their content as remote.

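3. Dismissal Without Prejudice

The court dismissed the complaint without prejudice and allowed the plaintiffs to seek leave to file an amended complaint, while voicing skepticism that they could allege a cognizable injury. Judge McMahon also observed that the harm the plaintiffs really seemed to be complaining about was the uncompensated use of their articles to develop ChatGPT rather than the removal of copyright management information, and that whether some other statute or legal theory reaches that harm was not a question before the court.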

Broader Implications

This ruling has several significant implications for the AI industry and content creators:

- Pleading Standards: The decision sets a high bar for plaintiffs bringing DMCA claims over AI training. Alleging that content was likely used in training, with identifying information removed, isn't enough; plaintiffs must plead a concrete injury.
- Training Data Analysis: The court did not decide whether including copyrighted material in training datasets constitutes infringement, and it left that question open.
- Output-Based Claims: The ruling indicates skepticism toward claims based on a theoretical ability to reproduce content without concrete examples of such reproduction.

Overall, the court's emphasis on specific, concrete evidence of infringement suggests that future cases may need to focus more on demonstrable harm rather than theoretical capabilities.

Why Raw Story Would Likely Succeed in the EU: A Legal Analysis

In the European Union, Raw Story's case against OpenAI would likely have a substantially different outcome, though "winning" might look quite different from what we typically think of as a court victory in the US system. Here's why:

Automatic Rights and Standing

First, Raw Story would clear the initial hurdles that proved fatal in the US case, assuming it qualified as a press publisher established in a Member State, which Article 15 of the DSM Directive requires. Under that provision, press publishers automatically have rights over online uses of their content for two years, counted from 1 January of the year following publication. There's no registration requirement, and the mere fact of publication gives a qualifying publisher a right it can assert.

Burden Reversal

The crucial difference lies in the burden of proof. In the EU, once Raw Story established that it was a news publisher whose content was potentially used in training, the burden would effectively shift to OpenAI to demonstrate one of the following:

- It didn't use the content
- It had proper licensing arrangements
- It honored opt-out mechanisms, such as rights reservations under the text and data mining exception in Article 4 of the DSM Directive
- It had implemented required technical measures

Presumption of Protection

Unlike in the US case, where Raw Story needed to prove specific instances of copying, EU law would presume that systematic web scraping for AI training likely included news content unless proven otherwise. This presumption alone would probably force OpenAI into a settlement or licensing agreement.

Different Definition of "Victory"

However, the outcome wouldn't necessarily be a traditional "win" in the sense of damages for past infringement. Instead, the likely result would be:

- Mandatory licensing agreement
- Structured compensation framework
- Ongoing payment mechanisms
- Usage tracking requirements
- Regular reporting obligations

Why This Matters

The EU approach effectively transforms what would be a copyright infringement case in the US into something more akin to a regulatory compliance matter. This reflects the EU's broader philosophy that AI development should occur within structured regulatory frameworks that protect various stakeholders' rights from the outset, rather than addressing conflicts through litigation after the fact.

Photo by cerridan