The New York Times Co. filed a lawsuit against Microsoft Corp. and multiple entities associated with OpenAI, alleging that they used its intellectual property as training data without authorization, resulting in copyright infringement and unfair competition.
The lawsuit, filed in the U.S. District Court for the Southern District of New York, alleges that Microsoft and OpenAI used millions of copyrighted Times articles and other works to train their AI models and to build generative AI (GenAI) tools such as Bing Chat and ChatGPT. It claims that these tools can generate output that "recites Times content verbatim, closely summarizes it, and mimics its expressive style."
Emphasis on democratic rights
In its lawsuit, the Times stressed that independent journalism is “vital to our democracy” and described it as “increasingly rare and valuable.” The paper said its ability to provide “deeply reported, professionally independent journalism” depends on “the efforts of a large and expensive organization.”
The lawsuit brings multiple claims against the defendants, including direct copyright infringement, vicarious and contributory copyright infringement, and violations of the Digital Millennium Copyright Act. The New York Times claims that the defendants’ actions amount to “building a substitute product based on the Times’ substantial investment in journalism without permission or payment.”
In the lawsuit, the New York Times seeks statutory damages, compensatory damages, restitution, a permanent injunction to prevent further infringement, and the destruction of all AI models and training sets containing its work.
The case could become a pivotal moment in defining the relationship between generative artificial intelligence and copyright law. Cecilia Ziniti, a lawyer specializing in intellectual property and artificial intelligence, said on social media that this is a "historic" case and may be "the best case to date that generative artificial intelligence constitutes copyright infringement."
In her analysis, Ziniti highlighted the key issues of “access and substantial similarity” in the case. She noted that ChatGPT’s output closely resembles New York Times content, and that Times content makes up a large portion of the Common Crawl dataset on which the model was trained. She also pointed to Exhibit J of the lawsuit, which uses color coding to show the substantial overlap between ChatGPT’s output and the original Times articles.
Ziniti also noted that while OpenAI has established content agreements with other media outlets, such as Politico, it has no such agreement with the New York Times. She believes this apparent oversight could create legal exposure, because it could suggest that OpenAI is willfully ignoring certain intellectual property rights.


