Breaking News in Technology & Business – Tech Geekwire

    New Metric Tracks Impressive Advancement of AI Models

    By techgeekwire, March 27, 2025

    Today’s AI systems are improving rapidly and could soon close the gap with human capabilities on complex tasks, according to a new analysis. The non-profit organization METR created a metric that tracks this progress by tying AI performance to how long the same tasks take people.

    A new metric assesses the progress of AI models

    METR, based in Berkeley, California, developed nearly 170 real-world tasks across coding, cybersecurity, general reasoning, and machine learning. They established a ‘human baseline’ by measuring the time expert programmers took to complete these tasks. The team’s new metric, called ‘task-completion time horizon,’ is the length of time human experts typically take to complete tasks that an AI model can complete at a given success rate.

    In a recent preprint on arXiv, METR reported that GPT-2, an early large language model (LLM) released by OpenAI in 2019, failed on all tasks that took human experts more than a minute. In contrast, Anthropic’s Claude 3.7 Sonnet, released earlier this year, completed half of the tasks that would take humans 59 minutes. The study found that the time horizon of the 13 leading AI models has roughly doubled every seven months since 2019. Notably, this growth accelerated in 2024, with the time horizon of the latest models doubling roughly every three months.

    METR suggests that AI models could handle tasks taking humans about a month at 50% reliability by 2029, possibly sooner, based on the current progress. One month of dedicated human expertise, the paper notes, can be enough to start a new company or make scientific discoveries, for instance.
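
    The arithmetic behind a projection like this can be sketched in a few lines. The snippet below is a rough illustration rather than METR's method: it assumes a constant doubling time, starts from the 59-minute figure reported for Claude 3.7 Sonnet, and treats "a month" as 167 working hours, which is an assumption not stated in the article. The function name is hypothetical.

```python
import math

def months_until(target_minutes, start_minutes, doubling_months):
    """Months for the time horizon to grow from start to target under steady doubling."""
    doublings = math.log2(target_minutes / start_minutes)
    return doublings * doubling_months

START_MINUTES = 59              # Claude 3.7 Sonnet's 50% time horizon, per the article
WORK_MONTH_MINUTES = 167 * 60   # assumption: one month of work is about 167 hours

for doubling in (7, 3):         # long-run trend vs. the faster pace seen in 2024
    months = months_until(WORK_MONTH_MINUTES, START_MINUTES, doubling)
    print(f"doubling every {doubling} months: about {months / 12:.1f} years")
```

    Under these assumptions the seven-month pace takes a little over four years to reach a month-long horizon, and the three-month pace takes just under two, which is broadly in line with "by 2029, possibly sooner." The result is sensitive to how many working hours are counted as a month.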

    Professor Joshua Gans of the University of Toronto, known for his work on the economics of AI, cautioned against putting too much weight on these types of predictions. “Extrapolations are tempting to do, but there is still so much we don’t know about how AI will actually be used for these to be meaningful,” he said.

    Co-author Lawrence Chan explained that the 50% success rate was chosen because it was the most robust to slight variations in data. Raising the threshold to 80% significantly reduced the average time horizon, although the overall growth trends remained similar.
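
    To make the role of the threshold concrete, here is a minimal sketch of how a time horizon can be read off a fitted success curve. It assumes that a model's success probability follows a logistic curve in the base-2 logarithm of human completion time. The task results are made up, and the fitting routine and all names are hypothetical illustrations, not METR's data or code.

```python
import math

def fit_logistic(xs, ys, steps=20000, lr=0.05):
    """Fit p(success) = sigmoid(a + b*x) by gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def time_horizon(a, b, success_rate):
    """Human task length (minutes) at which the fitted success rate equals success_rate."""
    logit = math.log(success_rate / (1.0 - success_rate))
    return 2 ** ((logit - a) / b)

# Made-up results: (human minutes to complete, 1 if the model succeeded else 0)
results = [(1, 1), (2, 1), (4, 1), (8, 1), (15, 0), (15, 1), (30, 1), (30, 0),
           (60, 1), (60, 0), (120, 0), (120, 1), (240, 0), (480, 0), (960, 0)]
xs = [math.log2(minutes) for minutes, _ in results]
ys = [success for _, success in results]

a, b = fit_logistic(xs, ys)
h50 = time_horizon(a, b, 0.5)
h80 = time_horizon(a, b, 0.8)
print(f"50% horizon: {h50:.0f} min, 80% horizon: {h80:.0f} min")
```

    Because the fitted curve falls as tasks get longer, the 80% horizon always comes out shorter than the 50% horizon, which matches the direction of the effect Chan describes.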

    Source: T. Kwa et al. Preprint at arXiv

    Improvements over the past five years in the general capabilities of LLMs are largely tied to increases in scale, encompassing data, training time, and model parameters. The paper credits advancements in the time horizon metric mainly to AI model improvements in logical reasoning, tool use, error correction, and self-awareness in task execution.

    METR’s approach addresses the limitations of existing AI benchmarks, which map only loosely to real-world work and quickly saturate as models improve. Co-author Ben West noted that the new method provides a continuous measure that better captures meaningful progress. He also pointed out that, despite achieving superhuman performance on many benchmarks, leading AI models have not yet had a significant economic impact, because the best models currently have a time horizon of around 40 minutes. However, Anton Troynikov, an AI researcher and entrepreneur in San Francisco, suggests that AI would have more economic impact if organizations were more willing to experiment and invest in using the models effectively.

    © 2025 TechGeekWire. Designed by TechGeekWire.