Breaking News in Technology & Business – Tech Geekwire

    New Study Accuses Popular AI Benchmark of Favoring Big Tech Companies

    By techgeekwire, May 8, 2025

    A recent paper from researchers at Cohere, Stanford, MIT, and Ai2 has accused LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of allowing certain major AI companies to gain an unfair advantage in leaderboard rankings. The study alleges that companies like Meta, OpenAI, Google, and Amazon were permitted to privately test multiple variants of their AI models on the platform, then selectively publish only the highest-scoring models.

    The Controversy

    Chatbot Arena, created in 2023 as an academic research project out of UC Berkeley, has become a significant benchmark for AI companies. It pits answers from two different AI models against each other in a “battle,” and users vote on which response is better. Those votes determine each model’s score and its placement on the leaderboard. The researchers analyzed more than 2.8 million Chatbot Arena battles recorded between November 2024 and March 2025 and report evidence that certain major AI companies were allowed to collect more data because their models appeared in more battles.
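
    As a rough illustration of how pairwise votes can turn into a leaderboard, the sketch below applies an Elo-style update after each battle. The article does not say which rating formula Chatbot Arena uses, so the Elo update, the K-factor of 32, the starting rating of 1000, and the model names are all assumptions made for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model (an assumed rating scheme)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(ratings: dict, model_a: str, model_b: str, winner: str, k: float = 32.0) -> None:
    """Shift both ratings after one battle; the winner gains what the loser gives up."""
    expected_a = expected_score(ratings[model_a], ratings[model_b])
    score_a = 1.0 if winner == model_a else 0.0
    delta = k * (score_a - expected_a)
    ratings[model_a] += delta
    ratings[model_b] -= delta


# Hypothetical models, all starting from the same rating.
ratings = {"model-x": 1000.0, "model-y": 1000.0, "model-z": 1000.0}

# Each tuple is (model_a, model_b, model the user voted for).
battles = [
    ("model-x", "model-y", "model-x"),
    ("model-y", "model-z", "model-z"),
    ("model-x", "model-z", "model-x"),
]

for model_a, model_b, winner in battles:
    update_ratings(ratings, model_a, model_b, winner)

# Leaderboard: highest rating first.
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

    Because every vote moves a rating, a model that appears in more battles accumulates more signal about where it stands, which is why the number of battles per model matters to the study's argument.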

    Figure: A chart from the study showing the distribution of model performances.

    The authors claim this preferential access gave these companies a significant advantage. For instance, Meta was allegedly able to privately test 27 model variants between January and March leading up to its Llama 4 release, ultimately publishing only the score of a single top-performing model. Sara Hooker, Cohere’s VP of AI research and co-author of the study, described this practice as “gamification.”
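
    The statistical concern behind selective publication can be sketched with a small simulation. It assumes every variant has exactly the same true skill and that each measured score is that skill plus random noise; the skill value, the noise level, and the function names are hypothetical, and nothing here reproduces the study's data or method.

```python
import random
import statistics


def measured_score(true_skill: float, noise_sd: float, rng: random.Random) -> float:
    """One noisy leaderboard measurement of a variant (assumed Gaussian noise)."""
    return rng.gauss(true_skill, noise_sd)


def published_score(n_variants: int, true_skill: float, noise_sd: float, rng: random.Random) -> float:
    """Test n_variants privately and publish only the best measurement."""
    return max(measured_score(true_skill, noise_sd, rng) for _ in range(n_variants))


def average_published(n_variants: int, trials: int = 2000, true_skill: float = 1200.0,
                      noise_sd: float = 15.0, seed: int = 0) -> float:
    """Average published score over many simulated release cycles."""
    rng = random.Random(seed)
    return statistics.fmean(
        published_score(n_variants, true_skill, noise_sd, rng) for _ in range(trials)
    )


single = average_published(1)       # one variant tested, one published
best_of_27 = average_published(27)  # 27 variants tested, only the top one published

print(f"single submission: {single:.1f}")
print(f"best of 27:        {best_of_27:.1f}")
```

    Under these assumptions the best-of-27 average sits above the single-submission average even though no variant is actually better, which is the kind of inflation the authors argue private testing makes possible.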

    Response from LM Arena

    LM Arena Co-Founder and UC Berkeley Professor Ion Stoica disputed the findings, calling the study “full of inaccuracies” and “questionable analysis.” The organization maintained its commitment to “fair, community-driven evaluations” and invited all model providers to submit more models for testing. LM Arena argued that allowing model providers to choose how many tests to submit does not constitute unfair treatment.

    Implications and Recommendations

    The researchers recommend several changes to make Chatbot Arena fairer, including setting a clear limit on the number of private tests and publicly disclosing the scores from those tests. They also suggest adjusting the sampling rate so that all models appear in an equal number of “battles.” While LM Arena has rejected some of these suggestions, it has indicated plans to implement a new sampling algorithm.
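
    One way to picture the equal-sampling recommendation is a scheduler that always pairs the two models with the fewest appearances so far. This is a minimal sketch of that idea only; it is not LM Arena's current or planned sampling algorithm, and the function and model names are hypothetical.

```python
import random
from collections import Counter


def next_battle(models: list, appearances: Counter, rng: random.Random) -> tuple:
    """Pick the two least-sampled models, breaking ties at random."""
    ordered = sorted(models, key=lambda m: (appearances[m], rng.random()))
    return ordered[0], ordered[1]


def schedule(models: list, n_battles: int, seed: int = 0) -> Counter:
    """Run the scheduler and return how often each model appeared."""
    rng = random.Random(seed)
    appearances = Counter({m: 0 for m in models})
    for _ in range(n_battles):
        model_a, model_b = next_battle(models, appearances, rng)
        appearances[model_a] += 1
        appearances[model_b] += 1
    return appearances


models = ["model-a", "model-b", "model-c", "model-d", "model-e"]
counts = schedule(models, n_battles=1000)
print(dict(counts))
```

    With this rule no model ever gets more than one appearance ahead of another, so data collection stays balanced regardless of which provider submitted the model.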

    The controversy follows a recent incident in which Meta was caught optimizing a Llama 4 model for “conversationality” to achieve a high score on Chatbot Arena without ever releasing the optimized model. That incident, combined with the new study, has intensified scrutiny of private benchmark organizations and their potential susceptibility to corporate influence.
