Wednesday, May 15, 2024
Courthouse News Service

Novelists claim tech company Nvidia used pirated work to train AI model

The authors claim Nvidia used their work to train AI models to mimic human written communication.

SAN FRANCISCO (CN) — Technology giant Nvidia is facing a class action from a group of authors who say the company used pirated versions of their copyrighted work to train its artificial intelligence platform.

Nvidia is best known for its GeForce GTX and RTX line of desktop and laptop graphics cards, but over the last several years the company has become a dominant presence in artificial intelligence across both software and hardware markets.

The plaintiffs, novelists Brian Keene, Abdi Nazemian and Stewart O'Nan, sued Nvidia Friday night in San Francisco federal court claiming their works were included in a dataset called "The Pile" that contained a collection of books called "Books3."

The authors say the Books3 dataset, which copied all of Bibliotik, a shadow library of nearly 200,000 pirated books, was used to train Nvidia’s NeMo Megatron AI models to mimic human written communication.

“Nvidia has admitted training its NeMo Megatron models on a copy of The Pile dataset. Therefore, Nvidia necessarily also trained its NeMo Megatron models on a copy of Books3, because Books3 is part of The Pile,” the plaintiffs said in their complaint.

“Certain books written by plaintiffs are part of Books3 — including the infringed works — and thus Nvidia necessarily trained its NeMo Megatron models on one or more copies of the infringed works, thereby directly infringing the copyrights of the plaintiffs.”

The plaintiffs are asking for unspecified damages for people in the United States whose copyrighted works helped to train NeMo in the last three years and for Nvidia to destroy all copies of the Books3 dataset used to power the NeMo Megatron models. 

The NeMo Megatron models were hosted on a website called Hugging Face that included a description of the models' training dataset, which stated that NeMo was trained on The Pile. The Pile's Books3 dataset was listed on Hugging Face until October 2023, when the dataset was removed with a message that it "is defunct and no longer accessible due to reported copyright infringement."

The plaintiffs say the court still must intervene, however, because Nvidia “has continued to make copies of the infringed works for training other models.”

The authors also say Nvidia intends to distribute its NeMo models "as a base from which to build further models," further infringing on their work.

The action was filed by Joseph Saveri Law Firm, the same firm that represents comedian Sarah Silverman and other authors who are suing tech giant OpenAI for similar reasons. Most claims were dismissed in that case, but U.S. District Judge Araceli Martinez-Olguin allowed claims for direct copyright infringement to proceed. The plaintiffs in that case also claim that their works were copied from shadow libraries like Bibliotik.

Nvidia is the latest tech company to face claims of infringement over its AI language models; in addition to OpenAI, Microsoft was sued by the New York Times in December, and Meta and Bloomberg also face claims over their use of language models.

Requests for comment from Nvidia were not returned before deadline.
