Meta’s Latest AI Model Raises Copyright Concerns
Meta’s Llama 3.1 70B, an AI model released in July 2024, has been found to reproduce passages from well-known books, including ‘Harry Potter,’ more frequently than anticipated. Researchers found that the model has memorized roughly 42% of the first ‘Harry Potter’ book so thoroughly that it can reproduce 50-token excerpts from that portion at least half the time.

The study, conducted by researchers from Stanford, Cornell, and West Virginia University, measured how much of the Books3 dataset, a collection of thousands of copyrighted books, five popular open-weight AI models had memorized. The findings suggest that Llama 3.1 retains far more copyrighted content than its predecessor, Llama 1.
Why Meta’s Models Are Reproducing Exact Text
Researchers suggest several possible explanations for this behavior. The same books may have been used repeatedly during training, reinforcing memorization rather than the generalization of language patterns. The training data may also have included excerpts from fan websites, reviews, or academic papers that quote these books, exposing the model to the same passages many times over. Finally, changes to the training process may have amplified memorization without developers realizing the extent of the effect.
Implications for Meta and the Tech Industry
These findings intensify concerns about how AI models are trained and whether training on copyrighted works violates copyright law. As authors and publishers push back against unauthorized use of their work, this could become a major legal challenge for tech companies like Meta. The New York Times has already sued OpenAI and Microsoft for copyright infringement, alleging that their AI models were trained on its copyrighted articles without permission.
The issue underscores the need for tech companies to address copyright concerns in their training data. As AI development continues, balancing progress in the field against copyright protection will be crucial.