Breaking News in Technology & Business – Tech Geekwire

    Meta’s Secret Experiments Revealed: How AI Models Are Trained Using Pirated Books

    By techgeekwire, April 24, 2025 (3 min read)

    Meta’s Secret Experiments with Pirated Books Revealed

    A recent legal case involving Meta has revealed previously undisclosed experiments with the training data for its Llama AI models. The company used a process called ‘ablation’ to measure how specific data affected its models’ performance. The revelation has sparked debate about how to value AI training data and whether content creators should be compensated for it.

    What is Ablation in AI?

    Ablation is a technique borrowed from medical research, in which parts of a system are removed or altered to study their effect on how it functions. In AI, it means changing one component, such as the training data, and measuring how the model’s performance changes. Meta researchers used this method to test how different types of data, including pirated books from the LibGen database, affected their Llama models.

    Mark Zuckerberg at the Breakthrough Prize Ceremony

    In the experiments, Meta added different categories of books to its training data. One test added science, technology, and fiction books, while another added only fiction books. Both produced notable improvements in Llama’s scores on industry benchmark evaluations: adding science, technology, and fiction books improved performance by 4.5% on the BoolQ benchmark, while adding only fiction books yielded a 6% improvement.
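
    The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not Meta’s actual pipeline: it assumes a hypothetical `train_and_eval` function that trains a model on a dataset and returns a benchmark score, and the toy scorer `toy_train_and_eval` (also hypothetical) simply counts distinct data categories so the example runs without training anything.

```python
from typing import Callable, Dict, List

# A "dataset" here is just a list of document category labels (a toy stand-in).
Dataset = List[str]


def ablation_deltas(
    train_and_eval: Callable[[Dataset], float],
    baseline_data: Dataset,
    variants: Dict[str, Dataset],
) -> Dict[str, float]:
    """Score each variant training set and report its change versus the baseline.

    Each variant is a complete training set: add data to test what it
    contributes, or remove data to test what is lost without it.
    """
    baseline_score = train_and_eval(baseline_data)
    return {
        name: train_and_eval(data) - baseline_score
        for name, data in variants.items()
    }


def toy_train_and_eval(data: Dataset) -> float:
    """Hypothetical stand-in for training a model and benchmarking it.

    It only counts distinct categories, so more varied data scores higher.
    A real run would train a model and evaluate it on a benchmark such as BoolQ.
    """
    return float(len(set(data)))


if __name__ == "__main__":
    base = ["web", "code"]
    results = ablation_deltas(
        toy_train_and_eval,
        base,
        {
            "add science, technology, fiction": base + ["science", "technology", "fiction"],
            "add fiction only": base + ["fiction"],
            "remove code": ["web"],
        },
    )
    for name, delta in results.items():
        print(f"{name}: {delta:+.1f}")
```

    Scoring every variant against the same baseline is what lets an ablation attribute a change in score to the data that was added or removed. The toy scorer’s numbers are illustrative only and say nothing about the benchmark results reported above.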

    Implications of the Findings

    The results suggest that specific training data can be assigned a measurable value, which could support a system for compensating content creators. According to Nick Vincent, assistant professor at Simon Fraser University, ‘Stating these numbers publicly would potentially give some content organizations firmer ground to stand on’ in copyright disputes. Such disclosures could affect ongoing copyright lawsuits across the tech industry.

    Brain surgery in action

    Meta is not alone in keeping ablation results secret; other AI companies also keep such information private. Vincent notes that revealing which training data improves AI models could prompt content creators to demand payment. Bill Gross, CEO of ProRata, a startup working on ways to compensate creators, argues that creators should be paid twice: once when their content is used to train AI models, and again when those models rely on it to answer user queries.

    The Broader Trend in AI Development

    The secrecy surrounding Meta’s ablation experiments reflects a broader trend in the AI industry away from transparency about training data. In the early days of generative AI, companies like Google and OpenAI were more open about their training data sources. Today, companies share very little about their data sources or the specifics of their training processes.

    ProRata CEO Bill Gross speaks onstage

    This lack of transparency has disappointed many in the industry. As Gross puts it, ‘It’s really disappointing that they’re not being open about it, and they’re not giving credit to the material.’ The revelation of Meta’s experiments has renewed calls for a system that credits the sources of training data and compensates them appropriately.

    Conclusion

    The disclosure of Meta’s ablation experiments has significant implications for the AI industry, particularly for data valuation and copyright. As the technology evolves, the debate over transparency, compensation, and the use of training data is likely to intensify. Experts hope that revelations like this will lead to a more equitable system that sustains the creation of valuable content and knowledge.

    © 2025 TechGeekWire. Designed by TechGeekWire.