Breaking News in Technology & Business – Tech Geekwire


    Scientists Measure AI Capabilities by Task Duration

    By techgeekwire | May 3, 2025

    Scientists have developed a new way to assess artificial intelligence (AI) capabilities: measuring the length of the tasks an AI system can reliably complete, where length is the time a human needs to finish the same task. AI generally outperforms humans at text prediction and knowledge tasks, but it struggles with longer projects that require many steps. A recent study posted to the preprint server arXiv proposes rating AI models by the human completion time of the tasks they can handle.

    The researchers found that AI models succeed almost 100% of the time on tasks that take humans under four minutes, but their success rate falls to 10% on tasks that take humans more than four hours. The study tested a range of models, including Claude 3.7 Sonnet, GPT-4, and Claude 3 Opus, on tasks ranging from simple Wikipedia lookups to complex programming work such as writing CUDA kernels or fixing bugs in PyTorch code.
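One way to turn results like these into a single number is to fit a curve of success probability against human task length and read off the length at which a model succeeds half the time. The sketch below assumes a logistic curve in log2(minutes) fitted by plain gradient descent; this is an illustrative choice, not necessarily the study's exact procedure, and the function names and toy task data are hypothetical rather than the study's measurements.

```python
import math

# Hypothetical toy results: (minutes a human needs, 1 if the model succeeded else 0).
# Illustrative only; these are NOT the study's measurements.
TOY_RESULTS = [
    (2, 1), (4, 1), (8, 1), (15, 1), (30, 1), (30, 0),
    (60, 1), (60, 0), (120, 0), (240, 0), (480, 0),
]


def fit_logistic(results, lr=0.1, steps=5000):
    """Fit p(success) = sigmoid(a + b * (log2(minutes) - mean)) by gradient descent.

    Returns (a, b, mean), where mean is the centring offset applied to log2(minutes).
    Assumes successes and failures overlap, so a finite best fit exists.
    """
    xs = [math.log2(minutes) for minutes, _ in results]
    ys = [success for _, success in results]
    mean = sum(xs) / len(xs)
    xs = [x - mean for x in xs]  # centring the input speeds up convergence
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += p - y        # gradient of the total log-loss w.r.t. a
            grad_b += (p - y) * x  # gradient of the total log-loss w.r.t. b
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b, mean


def time_horizon_minutes(a, b, mean):
    """Human task length (minutes) at which the fitted success probability is 50%."""
    # sigmoid(z) = 0.5 exactly when z = 0, i.e. when a + b * (x - mean) = 0.
    return 2 ** (mean - a / b)


if __name__ == "__main__":
    a, b, mean = fit_logistic(TOY_RESULTS)
    print(f"slope b = {b:.2f} (negative: longer tasks fail more often)")
    print(f"50% time horizon ~ {time_horizon_minutes(a, b, mean):.0f} minutes")
```

Working in log2(minutes) lets a single slope describe the drop from minute-long tasks to hour-long ones, which matches how the findings above are phrased in terms of orders of magnitude of human time.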

    To assess AI capabilities, the researchers drew on benchmark suites including HCAST and RE-Bench. HCAST contains 189 software tasks that test the autonomous capabilities of AI agents in machine learning, cybersecurity, and software engineering, while RE-Bench consists of seven challenging, open-ended machine-learning research engineering tasks benchmarked against human experts. Each task was also rated for “messiness”, a measure of how complex it is and how closely it resembles real-world work.

    The study’s findings suggest that AI’s “attention span”, the length of task it can see through to completion, is growing rapidly. Extrapolating this trend, the researchers predict that AI could automate a month’s worth of human software development work by 2032. Because the metric is expressed in human working time, it offers an absolute, interpretable measure of what AI can actually do, rather than a benchmark score that is only meaningful relative to other models.
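The extrapolation behind a forecast like this is exponential-growth arithmetic: if the time horizon doubles every D months, growing from a horizon H0 to a target H takes D × log2(H / H0) months. The sketch below assumes a steady doubling period; the 1-hour starting horizon, 7-month doubling period and 167-hour working month are hypothetical placeholders, not figures taken from the article.

```python
import math

# Hypothetical placeholder: roughly 167 working hours in a month of full-time work.
MONTH_OF_WORK_MINUTES = 167 * 60


def months_to_reach(current_minutes: float, target_minutes: float,
                    doubling_months: float) -> float:
    """Months until the horizon grows from current to target, assuming steady doubling.

    Solves horizon(t) = current * 2 ** (t / doubling_months) for t.
    """
    if min(current_minutes, target_minutes, doubling_months) <= 0:
        raise ValueError("all inputs must be positive")
    return doubling_months * math.log2(target_minutes / current_minutes)


if __name__ == "__main__":
    # Placeholder inputs: a 1-hour horizon today, doubling every 7 months.
    months = months_to_reach(60, MONTH_OF_WORK_MINUTES, doubling_months=7)
    print(f"~{months:.0f} months (~{months / 12:.1f} years) to a month-long horizon")
```

Because the result scales with log2 of the ratio, it is far more sensitive to the assumed doubling period than to the starting horizon: doubling the starting horizon saves only one doubling period.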

    Experts in the field, including Sohrob Kazerounian and Eleanor Watson, describe task duration as a valuable and intuitive measure of AI performance: it directly reflects real-world complexity and captures an AI’s ability to sustain coherent, goal-directed behavior over time. The metric could be used to track progress in AI development, particularly for tasks in which AI is expected to solve complex human problems.

    The study’s implications extend beyond a new benchmark metric. It highlights how quickly AI systems are improving at lengthy tasks. Eleanor Watson predicts that by 2026, AI will become more general, handling varied tasks that span entire days or weeks rather than short, narrowly defined assignments. This could let AI take on substantial portions of professional workloads, reducing costs and improving efficiency while freeing humans to focus on more creative and strategic work.

    For consumers, AI is expected to evolve from simple assistants into dependable personal managers that can handle complex life tasks over extended periods with minimal oversight. The emergence of powerful generalist AI agents will likely fundamentally reshape both daily life and professional practice.
