Copyright Protection in the Context of Generative AI

  • The rapid growth of generative AI tools is pressuring traditional copyright law because the act of creation itself is increasingly handed off to software, raising a core question: who is the true owner of an AI-generated work?
  • In early March 2026, the U.S. Supreme Court effectively ended computer scientist Stephen Thaler's multi-year attempt to register an AI-generated image ("A Recent Entrance to Paradise") for copyright, reaffirming that under current U.S. law copyright protection requires a human author.
  • In Andersen v. Stability AI, Judge William Orrick (N.D. Cal.) denied the defendants' motion to dismiss the artists' copyright infringement claims, allowing the core copyright claims to proceed to discovery.
  • The artists advance two key theories:
    • Model Theory: the AI model itself may be an infringing copy because it contains transformed versions of their works.
    • Distribution Theory: distributing the AI model is equivalent to distributing the artists' copyrighted works.
  • The case carries direct implications for software companies on two fronts: how training datasets are assembled and licensed, and who (if anyone) can claim ownership of AI outputs.

Background

The case began when computer scientist Stephen Thaler sought to register a copyright in an image titled "A Recent Entrance to Paradise," which had been created by an AI system. The U.S. Copyright Office and the courts rejected his application, maintaining the long-standing position that copyright protection applies only to human-created works.

The debate, however, extends beyond art and law. It also bears directly on software development and the technology industry. Modern generative AI systems like Midjourney, Stable Diffusion, and various text-based models are built on complex algorithms and massive datasets. They generate new content by analyzing millions or even billions of images and text examples.

This raises a foundational question:

Is this process a creative act in the traditional sense, or is it merely a statistical rearrangement of pre-existing data?

The Parties' Claims and the Court's Approach

The Artists' Position

The artists claim that AI companies downloaded billions of copyrighted images without permission and used them to train their models. In other words, their works were ingested into datasets without consent, and the AI learned from them. On that basis, the artists argue that AI-generated outputs are a direct violation of their copyrights, because the outputs are generated by models trained on those datasets.

A concrete example: AI tools accept prompts such as "in the style of Sarah Andersen". Artists object because the AI can easily replicate a specific artist's style and produce similar works, which may undermine their ability to earn a living from their original work.

The deeper technical question is whether AI models effectively store images in a compressed form during training and then draw on them to generate new images. If so, certain AI outputs could be characterized as derivative works under current copyright law.

The AI Companies' Position

AI companies counter that their models do not actually copy images. They argue that:

  • The models learn statistical patterns rather than storing original images.
  • The outputs are new and original creations.
  • The process is analogous to human learning, much like an artist studying the work of others before creating new pieces.

In their view, AI merely analyzes a large corpus of images and then creates new ones in a similar manner.
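The gap between the two sides is partly an empirical question about what a trained model actually retains. As a deliberately simplified analogy for the companies' "statistical patterns" argument (not a depiction of how diffusion models actually work), consider a model that fits a line to data: after training, only the fitted parameters survive, and the exact training points cannot be read back out of them. Whether a multi-billion-parameter image model is closer to this picture or to a form of compressed storage is precisely the factual dispute headed for discovery. The data points below are hypothetical.

```python
# Toy analogy: "training" keeps only fitted parameters, not the data itself.

data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]  # hypothetical training set

# Ordinary least-squares fit: the entire "trained model" is two numbers.
n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
intercept = mean_y - slope * mean_x

model = (slope, intercept)  # all that is retained after "training"


def predict(x):
    """Generate a new output from the learned parameters alone."""
    return model[0] * x + model[1]


# The model produces plausible new outputs, but the original pairs in
# `data` cannot be reconstructed from (slope, intercept).
print(f"Learned parameters: slope={slope:.3f}, intercept={intercept:.3f}")
```

The artists' Model Theory, by contrast, asserts that image models sit at the other end of this spectrum: that the works are still "in there" in transformed form.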

The Court's Ruling on the Motion to Dismiss

The Northern District of California dismissed certain claims (including unjust enrichment and breach of contract), but on August 12, the artists secured an important procedural victory:

  • Judge William Orrick denied Stability AI's and Midjourney's motion to dismiss.
  • The copyright infringement claims were allowed to proceed.
  • The case moved into the discovery phase, where evidence is gathered.

The judge found the artists' theory plausible that Stability AI made it easier to copy copyrighted content by sharing its Stable Diffusion model with other companies.

Key Evidence Cited by the Court

  • A public statement by Stability AI's CEO that the company had compressed approximately 100,000 GB of images into a 2 GB model capable of recreating those images.
  • Academic studies showing that training images can, in some cases, be reproduced as AI outputs in response to specific prompts.
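The compression figure in the CEO's statement can be put in rough perspective with back-of-the-envelope arithmetic. The 100,000 GB and 2 GB figures are the ones cited above; the dataset size of ~5 billion images is an assumption added here for illustration and is not taken from the court's order.

```python
# Rough arithmetic on the compression claim cited by the court.

TRAINING_DATA_GB = 100_000   # figure from the CEO statement cited by the court
MODEL_SIZE_GB = 2            # reported model size, per the same statement
NUM_IMAGES = 5_000_000_000   # assumed dataset size (illustrative, not from the ruling)

compression_ratio = TRAINING_DATA_GB / MODEL_SIZE_GB
bytes_per_image = MODEL_SIZE_GB * 1e9 / NUM_IMAGES

print(f"Compression ratio: {compression_ratio:,.0f}x")
print(f"Model capacity per training image: {bytes_per_image:.2f} bytes")
```

Under these assumptions the model would have well under a byte of capacity per training image, which makes wholesale verbatim storage of every image implausible; this is why the academic studies cited by the court focus on showing that some specific training images can nonetheless be reproduced.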

The Artists' Two Core Legal Theories

  1. Model Theory: The AI model itself is a copy because it contains transformed versions of the artists' works.
  2. Distribution Theory: Distributing the AI model is equivalent to distributing the artists' copyrighted works.

The court considered both theories plausible and permitted them to proceed. Whether they ultimately hold up will depend on whether the artists can show, factually and technically, that their works are in fact present inside the AI systems — a question that will be tested during discovery.

Impact on Software and AI Development

1. The Use of Training Data

AI models are typically trained on datasets that include copyrighted content. As cases like Andersen v. Stability AI illustrate, this practice gives rise to copyright infringement claims. Several open questions remain:

  • Under what conditions can training data qualify as fair use?
  • Does training a model on copyrighted images constitute "use" of those works, or merely "learning" from them?
  • If courts conclude that unlicensed use of such data is a violation, software companies may need to fundamentally rework how they collect data and train their models.

2. Ownership of AI Outputs

When a user prompts an AI system and receives an image or text, who owns the result?

  • Current statutes and case law do not provide a clear answer.
  • The implicit signal from the Supreme Court is that, under existing law, an author must be human, not a machine.
  • The likely consequence: works generated without sufficient human contribution may not be protected at all.
  • Software developers may therefore need to design systems that meaningfully incorporate human input, or restructure their datasets to reduce legal exposure.

3. The Innovation Debate

The decision's effect on innovation remains contested:

  • AI proponents argue that restrictions will slow technological progress and limit tools that democratize creativity. Generative AI has, in fact, made artistic creation more accessible and lowered the technical skill barrier.
  • Critics counter that this trend devalues human labor and normalizes the unlicensed use of copyrighted content.

Conclusion

At the heart of this case lies the meaning of "author" under U.S. copyright law. Although the statute does not define the term with precision, the weight of regulatory practice and judicial precedent assumes that an author must be human.

In Andersen, the court did not offer an extensive theoretical explanation, but its ruling makes clear that the legal system continues to follow a human-centered approach. The case, which moves into discovery in September 2026, has the potential to become a landmark precedent capable of reshaping this entire framework.

More broadly, the dispute exposes a mismatch between generative AI technology and existing copyright law:

  • Current AI technology is meaningfully challenging copyright law and will likely continue to do so.
  • Existing statutes were not drafted with AI in mind, and current rules cannot fully resolve the new issues these systems raise.
  • New legal interpretations — or new legislation purpose-built for AI — may be required.

Key Terms

  • Generative AI: A class of AI systems that produce new content (images, text, audio, code) by learning patterns from large datasets of existing works.
  • Training data: The corpus of works (often including copyrighted material) used to train a generative model.
  • Derivative work: Under U.S. copyright law, a work based upon one or more pre-existing works; the copyright holder of the original generally controls the right to prepare derivatives.
  • Fair use: A statutory doctrine (17 U.S.C. § 107) allowing limited unlicensed use of copyrighted material, evaluated through a four-factor test.
  • Model Theory: The argument that an AI model is itself an infringing copy because it embeds transformed versions of training works.
  • Distribution Theory: The argument that distributing such an AI model is equivalent to distributing the underlying copyrighted works.
  • Human authorship requirement: The principle, currently followed by U.S. courts and the Copyright Office, that copyright protection requires a human author.

FAQ

Q: What is Andersen v. Stability AI about? A: A group of artists, including Sarah Andersen, sued AI companies (including Stability AI and Midjourney) alleging that the companies copied billions of copyrighted images without permission to train their generative models, and that the resulting models and outputs infringe the artists' copyrights.

Q: What did the court decide in August? A: Judge William Orrick denied the defendants' motion to dismiss the core copyright infringement claims, allowing those claims to proceed to discovery. Some other claims (e.g., unjust enrichment, breach of contract) were dismissed.

Q: Can an AI-generated work be copyrighted in the U.S.? A: Under current law, no — at least not without sufficient human authorship. The Supreme Court's March 2026 action in the Thaler matter reinforced the position that copyright protection requires a human author.

Q: What are the "Model Theory" and the "Distribution Theory"? A: They are two legal theories advanced by the artists. The Model Theory holds that the AI model itself is an infringing copy of the training works. The Distribution Theory holds that distributing the model is tantamount to distributing those underlying copyrighted works.

Q: What does this mean for software companies and AI developers? A: Two main consequences. First, they may need to rethink how training datasets are sourced and licensed if courts find unlicensed training to be infringing. Second, they may need to design systems that incorporate meaningful human authorship if they want the outputs to be eligible for copyright protection.

Q: What's next? A: The case enters its discovery phase in September 2026, where the parties will gather and exchange evidence including, critically, technical evidence about whether and how the training works are present within the AI models.

References

  1. Andersen v. Stability AI Ltd., No. 23-cv-00201-WHO (N.D. Cal.).
  2. Zoe Schor, Andersen v. Stability AI: The Landmark Case Unpacking the Copyright Risks of AI Image Generators, N.Y.U. J. Intell. Prop. & Ent. L. (Dec. 2, 2024).
  3. Victor Tangermann, Supreme Court Blow to AI Artists Copyright, Futurism (Mar. 8, 2026).