Authors
Juliette Robin Vernay Partner
Mathilde Ponchel Partner
Expert insight
14 March 2025

AI and copyright: what rights to the data that trains the AI system?

Is data mining an appropriate copyright exception for freely feeding artificial intelligence models?
In a decision dated 27 September 2024, the Hamburg Regional Court handed down the first ruling on the application of the two data mining exceptions.


According to the Hamburg court, the nonprofit organization LAION, which had used a photographer’s work without his consent, could legitimately invoke this data mining exception insofar as its dataset was released free of charge. 


The nonprofit had no commercial objectives, even though its dataset could be reused by commercial firms.
The court thus concluded that there had been no infringement of photographer Robert Kneschke's copyright.
 

Laws on the subject:

  • EU Directive 2019/790 on copyright and related rights provides guidance on this issue, with two data mining exceptions:

 

- Articles 3 and 4: Copyright exception: text and data mining (“TDM”) for scientific or commercial purposes

This directive freely authorizes reproductions and extractions of protected works for the purposes of scientific research and, in some cases, for commercial purposes. 


Article 3 §1 - "1. Member States shall provide for an exception to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC and Article 15(1) of this Directive for reproductions and extractions made by research bodies and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access."
 

- An exception to the exception outside the “TDM for scientific research” situation: the opt-out (transposed into French law in Art. L. 122-5-3-III of the French Intellectual Property Code (IPC)).

 

Article 4(3) (transposed into French law in Art. L. 122-5-3-III of the IPC): "The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online."

The decree of 23 June 2022 (Art. R. 122-28 of the IPC) specifies that this "opt out" does not have to be reasoned and may be expressed by any means (whilst specifying for online content: "by machine-readable means, including metadata, and by recourse to the general conditions of use of a website or service.").
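
By way of illustration only, the sketch below shows how a data collector might check one possible machine-readable reservation before mining a page. It assumes the rightholder expresses the opt-out through a robots.txt file, which neither the Directive nor the decree designates as the required means; the crawler name ExampleTdmBot, the helper is_mining_reserved and the example.com address are hypothetical. Whether a given signal is legally sufficient remains a question for the courts.

```python
from urllib.robotparser import RobotFileParser


def is_mining_reserved(robots_txt: str, crawler_name: str, url: str) -> bool:
    """Return True if the robots.txt content disallows this crawler for this URL.

    Assumption: a robots.txt "Disallow" rule is treated here as a
    machine-readable reservation of rights. This is an illustrative
    convention, not a rule laid down by the Directive or the decree.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(crawler_name, url)


# Hypothetical robots.txt: the site blocks one mining crawler and allows the rest.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleTdmBot
Disallow: /

User-agent: *
Allow: /
"""

photo_url = "https://example.com/photos/1.jpg"

# The named mining crawler is blocked, so the reservation applies to it.
reserved = is_mining_reserved(EXAMPLE_ROBOTS_TXT, "ExampleTdmBot", photo_url)

# Any other crawler falls under the general "allow" rule.
open_to_others = is_mining_reserved(EXAMPLE_ROBOTS_TXT, "OtherBot", photo_url)

print(reserved, open_to_others)
```

In practice, a reservation may also be expressed in metadata or in a website's general conditions of use, as the decree indicates, so a check of this kind would cover only one of the possible channels.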

  • The AI Act (Regulation (EU) 2024/1689 of 13 June 2024) focuses on the issue of user safety and respect for their fundamental rights and freedoms.

In other words, the issue of intellectual property is addressed only in very general terms. In particular, the AI Act refers to the 2019 EU Directive on copyright and related rights and confirms the application of the TDM exception to AI systems.


The AI Act strengthens the legislation surrounding this exception by requiring providers of general-purpose AI models to comply with EU copyright and related rights law, particularly by implementing a policy for identifying and respecting the opt-out:
 

Article 53: Obligations for providers of general-purpose AI models

"(c) put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technology, a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790" (opt-out).

Regarding the training data, a sufficiently detailed summary of the content used to train general-purpose AI models must be made publicly available.

However, the AI Act's transparency and copyright obligations seem to apply only to AI model providers, not to the entities responsible for creating the training databases.  

Indeed, such entities have recently been deemed to benefit from the TDM exception, based on a broad construction of the concept of scientific research: 

  • In Germany: Hamburg Regional Court, 27 September 2024. Photographer Robert Kneschke sued LAION (a German non-profit organization known for releasing training datasets) for having included his photograph in its dataset, which was subsequently used by AI model providers.

LAION did not dispute that it had reproduced this photograph, but asserted the benefit of the TDM exception provided for in Articles 3 and 4 of the 2019 Directive.


The court sided with LAION, considering that the development of a dataset fell within the scope of the Directive, which provides for two exceptions:


The first exception allows research organizations and cultural heritage institutions to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access, without having to request authorization from the rightholders.

 
The second exception allows such mining to be carried out for any purpose (including commercial), on condition that the rightholder has not reserved his or her rights (and thus exercised what is commonly referred to as an "opt-out"). 


In the case at hand, the paid stock photo site from which the photograph was taken explicitly prohibited unauthorized reproduction. Consequently, LAION could not have availed itself of the second TDM exception.


Nevertheless, the court ruled that the creation of a dataset constituted a scientific research activity, even if the dataset could subsequently be used by commercial entities. In particular, the court noted that LAION’s collection of data and release of the dataset free of charge, notably for research bodies, could constitute one of the more general stages in the scientific research process, even if the dataset did not, in itself, result in the acquisition of new knowledge. Consequently, LAION could benefit from the first TDM exception for scientific research.
 

According to this German court decision, an entity such as LAION, which develops datasets that can be used by AI system providers, may thus benefit from the TDM exception, enabling it to reproduce works without the authorization of their owners, and without being required to put in place a policy to comply with copyright reservations and transparency.

  • In the United States, a first court decision paves the way for compensation of rightholders

On February 11, 2025, a federal court in Delaware ruled in favor of Thomson Reuters, which had sued Ross Intelligence for having used its legal content and research platform, known as "Westlaw", to train Ross Intelligence's AI model.


Ross Intelligence argued that training an AI fell under the "fair use" exception, which permits the limited reuse of copyrighted material.


Nevertheless, the court held that "fair use" could not apply to the use made by Ross Intelligence, particularly because Ross Intelligence intended to develop a direct competitor to Thomson Reuters' legal research platform. The court thus ruled that the copyright had been infringed.


Although this decision has no impact on the European case-law situation, it does highlight the importance of the issue of reusing copyrighted materials to train AI models.
 

  • We are also awaiting the courts’ ruling in Getty Images v. Stability AI:

On February 3, 2023, Getty Images Inc. filed a lawsuit in the U.S. District Court for the District of Delaware against Stability AI, Ltd., Stability AI, Inc. and Stability AI US Services Corporation (hereinafter "Stability AI"), accusing them of copying over 12 million photographs from its collection, along with associated captions and metadata, without authorization or financial compensation, in order to develop their generative AI system, Stable Diffusion.


Getty Images is a US image bank, renowned for its database of over 447 million photographs and videos. Users can acquire licenses to use this content, with some photographs available free of charge for non-commercial use.

In this regard, Getty Images' Terms and Conditions of Use expressly prohibit the downloading, copying or transmission of its content without an authorization license, as well as "data mining" or any other method of data collection and extraction. Despite this, Getty Images observed that its content was being used as training data by Stability AI.


According to Getty Images, Stability AI's model was trained using a dataset compiled by the nonprofit LAION, containing links to billions of items of content on the web. Stability AI then followed these links and copied files belonging to Getty Images.


Getty Images alleges copyright infringement, trademark infringement, dilution and tarnishment, and unfair competition:
 

  • Copyright infringement    

Getty Images argues that most of the photographs and images on its website are original and protected by copyright, and that these, along with their titles and captions, have been copied by Stability AI.

  • Trademark infringement, dilution and tarnishment 

Getty Images owns numerous GETTY IMAGES trademarks affixed to its photographs.


However, these watermarks appear on images generated by Stable Diffusion, which are not the property of Getty Images. What's more, since the watermarked generated images are sometimes of poor quality, or even absurd, Getty Images considers that this use is harmful to its reputation.
 

  • Unfair competition and deceptive commercial practices  

Getty Images argues that by using its intellectual property rights, Stability AI creates the false impression that Getty Images has given its permission for such use or is in a commercial relationship with Stability AI.


Getty Images therefore seeks the destruction of all versions of Stable Diffusion trained on its content as well as damages for the harm incurred and the profits made by Stability AI as a result of these infringements.


Getty Images has also filed a similar complaint against Stability AI in the UK, with trial scheduled for 2025.


In this respect, the UK High Court of Justice has ruled that one of the claimants, acting as a representative of a class of copyright holders who had entered into exclusive licenses with Getty Images and whose works had been used by Stability AI, lacked standing to sue.


The Court held that it was not possible to clearly identify the persons belonging to this class of owners whose rights had been infringed, since the definition of the class hinged on the outcome of the proceedings, i.e. whether the works had actually been used by the defendant and thus whether their copyright had been infringed. As no definitive list of the works used to train the AI could be drawn up, it was impossible to identify who might be part of the represented class.


Although this decision deals only with procedural issues, it highlights the difficulty involved in clearly identifying the works used by AI models and thus in enabling authors to assert their rights.


Lastly, another case, brought by illustrators and artists against Stability AI and other AI system providers, is currently pending. 
We are closely monitoring these cases, which highlight the difficulty of clearly identifying the content used by AI models. 
 

For the time being, it is difficult to spot a clear trend in the positions adopted by the various European courts. The decision by the Hamburg court marks a first step, as it adopts a broad interpretation of the TDM exception; but it is not certain that this interpretation would be adopted by a French court, given the obligation under French law to interpret exceptions narrowly.

As a general approach, the contours of the TDM exception for scientific research should be analyzed first. As soon as the situation involves TDM for commercial purposes, authors should be able to exercise their right to say no, provided they have anticipated the situation and set up an opt-out. This will be the next topic in our AI saga.


In France, the issue was debated at the AI Summit, and nearly 35,000 artists have just signed an open letter expressing their concerns about the effects of AI on their professions.


Emmanuel Macron used the term "Far West" (Wild West) in the regional press. "France will continue to have a clear voice, one that protects the specificity of genius, talent, the recognition of rights, of this property", he declared.


Copyright advocates can only hope that creators of works are asked for their authorization, that their right to opt out is respected, and that they are remunerated accordingly.