Anthropic settles with authors in first-of-its-kind AI copyright infringement lawsuit

If the court approves the settlement, Anthropic will pay authors around $3,000 for each of the estimated 500,000 books covered by the settlement.

The settlement, which U.S. Senior District Judge William Alsup in San Francisco will consider approving next week, is in a case that involved the first substantive decision on how fair use applies to generative AI systems. It also suggests an inflection point in the ongoing legal fights between the creative industries and the AI companies accused of illegally using artistic works to train the large language models that underpin their widely used AI systems.

“This settlement marks the beginning of a necessary evolution toward a legitimate, market-based licensing scheme for training data,” said Cecilia Ziniti, a tech industry lawyer and former Ninth Circuit clerk who is not involved in this specific case but has been following it closely. “It’s not the end of AI, but the start of a more mature, sustainable ecosystem where creators are compensated, much like how the music industry adapted to digital distribution.”

Authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson filed their complaint against Anthropic for copyright infringement in 2024. The class action lawsuit alleged Anthropic used the contents of millions of digitized copyrighted books to train the large language models behind its chatbot, Claude, including at least two works by each plaintiff. The company also bought some hard copy books and scanned them before ingesting them into its model. The company has admitted to doing as much, a fact that the plaintiffs raise in their complaint. “Anthropic has admitted to using The Pile to train Claude,” the complaint states. (The Pile is a big, open-source dataset created for large language model training.)

“Rather than obtaining permission and paying a fair price for the creations it exploits, Anthropic pirated them,” the authors’ complaint states.

In his June ruling, Judge Alsup agreed with Anthropic’s argument, stating the company’s use of books by the plaintiffs to train its AI model was acceptable.

“The training use was a fair use,” he wrote. “The use of the books at issue to train Claude and its precursors was exceedingly transformative.”

However, the judge ruled that Anthropic’s use of millions of pirated books to build its models – books that websites such as Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi) copied without getting the authors’ consent or giving them compensation – was not fair use. He ordered this part of the case to go to trial. “We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for willfulness),” the judge wrote in the conclusion to his ruling. Last week, the parties announced they had reached a settlement.

In statements to NPR, both sides appear satisfied with the outcome of the case.

“Today’s settlement, if approved, will resolve the plaintiffs’ remaining legacy claims,” said Anthropic Deputy General Counsel Aparna Sridhar. “We remain committed to developing safe AI systems that help people and organizations extend their capabilities, advance scientific discovery, and solve complex problems.”

The settlement also met with approval from the creative community.

“This historic settlement is a vital step in acknowledging that AI companies cannot simply steal authors’ creative work to build their AI just because they need books to develop quality large language models,” said Authors Guild CEO Mary Rasenberger. “We expect that the settlement will lead to more licensing that gives authors both compensation and control over the use of their work by AI companies, as should be the case in a functioning free market society.”

Anthropic is in a good position to handle the sizable compensation. On Tuesday, the company announced the completion of a new funding round worth $13 billion, bringing its total valuation to $183 billion.

And in the latest in a string of legal actions involving major entertainment corporations, on Friday, Warner Bros. Discovery filed a lawsuit in California federal court against AI image generator Midjourney for copyright infringement. NPR has reached out to Midjourney for comment.