Breaking News in Technology & Business – Tech Geekwire


    OmniParser V2: Enhancing GUI Automation with Improved Accuracy and Speed

    By techgeekwire | March 2, 2025

    OmniParser V2: Revolutionizing GUI Automation

    GUI automation relies on agents that can understand and interact with user interfaces, a task that general-purpose large language models (LLMs) often struggle with. Two major hurdles are reliably identifying interactable icons and accurately interpreting the semantics of on-screen elements so that each can be associated with the correct action.

    OmniParser addresses these challenges by ‘tokenizing’ UI screenshots, transforming pixel data into structured elements that LLMs can interpret. LLMs can then predict the next action from the parsed, interactable elements. OmniParser V2 builds on the original with improvements in both accuracy and speed.
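    As a rough illustration of what ‘tokenizing’ a screenshot could look like, the sketch below turns a list of parsed elements into text an LLM can read. The `UIElement` schema, the `elements_to_prompt` helper, and the sample data are all hypothetical; they are not OmniParser's actual output format or API.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    """One parsed screen element (hypothetical schema, not OmniParser's real output)."""
    element_id: int
    bbox: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    caption: str                     # functional description of the element
    interactable: bool


def elements_to_prompt(elements: list[UIElement]) -> str:
    """Serialize the interactable elements into one text line each."""
    lines = []
    for el in elements:
        if not el.interactable:
            continue  # only offer elements the agent can act on
        left, top, right, bottom = el.bbox
        lines.append(
            f"[{el.element_id}] {el.caption} at ({left},{top})-({right},{bottom})"
        )
    return "\n".join(lines)


# Made-up parse result for a toolbar with two buttons and a text label.
parsed = [
    UIElement(0, (10, 10, 42, 42), "Save file", True),
    UIElement(1, (60, 10, 300, 42), "Document title text", False),
    UIElement(2, (320, 10, 352, 42), "Open settings", True),
]
prompt = elements_to_prompt(parsed)
print(prompt)
```

    An LLM given this text can answer with an element ID rather than raw pixel coordinates, which is the general idea behind predicting actions from parsed elements.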

    OmniParser V2 detects even the smallest interactive elements more accurately and runs inference considerably faster, making it an efficient tool for GUI automation. These gains result from training on a larger dataset for interactive element detection and functional icon captions. Decreasing the icon caption model’s image size also cuts latency by 60% compared with the previous version.
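    To put the 60% figure in concrete terms, the short sketch below converts a relative latency reduction into a new latency and an equivalent speedup factor. The 1000 ms baseline is an assumed placeholder, not a measurement from the article.

```python
def latency_after_reduction(baseline_ms: float, reduction: float) -> float:
    """Latency left after cutting `reduction` (a fraction, 0 to 1) from the baseline."""
    return baseline_ms * (1.0 - reduction)


def speedup_factor(reduction: float) -> float:
    """How many times faster the new version is for a given fractional reduction."""
    return 1.0 / (1.0 - reduction)


baseline_ms = 1000.0  # assumed example value, not a reported number
new_ms = latency_after_reduction(baseline_ms, 0.60)
print(f"new latency: {new_ms:.0f} ms, speedup: {speedup_factor(0.60):.1f}x")
```

    In other words, a 60% reduction leaves 40% of the original latency, which corresponds to running about 2.5 times faster.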

    Notably, OmniParser paired with GPT-4o achieved a state-of-the-art average accuracy of 39.6 on ScreenSpot Pro, a recently released grounding benchmark featuring high-resolution screens and tiny target icons. GPT-4o on its own had previously scored 0.8.
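    The article does not say how grounding accuracy is scored. A common convention for grounding benchmarks, assumed here, is that a prediction counts as correct when the predicted click point lands inside the target's bounding box. The sketch below computes accuracy under that assumption, using made-up points and boxes.

```python
Point = tuple[float, float]
Box = tuple[float, float, float, float]  # (left, top, right, bottom)


def point_in_box(point: Point, box: Box) -> bool:
    """True if the point lies inside the box, edges included."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def grounding_accuracy(predictions: list[Point], targets: list[Box]) -> float:
    """Percentage of predicted click points that fall inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, targets))
    return 100.0 * hits / len(targets)


# Illustrative data only: four tiny 16x16 targets, one correct prediction.
boxes = [(100, 100, 116, 116), (400, 50, 416, 66), (900, 700, 916, 716), (20, 20, 36, 36)]
preds = [(108, 108), (300, 58), (950, 708), (50, 50)]
print(grounding_accuracy(preds, boxes))
```

    With targets only a few pixels wide on a high-resolution screen, small localization errors turn into misses, which is why tiny icons make this kind of benchmark demanding.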

    [Figure: ScreenSpot Pro Performance]

    To facilitate rapid experimentation with different agent settings, the team created OmniTool, a Dockerized Windows system equipped with essential tools for agents. Out of the box, OmniParser can be used with various cutting-edge LLMs: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), and Anthropic (Sonnet). This integration brings together screen understanding, grounding, action planning, and execution capabilities.
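    The four capabilities named above (screen understanding, grounding, action planning, and execution) can be pictured as one loop. The sketch below is a minimal illustration with stub functions standing in for the screenshot capture, the parser, the LLM planner, and the executor; every name in it is hypothetical and none of it reflects OmniTool's actual code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str             # "click" or "done" in this toy example
    element_id: int = -1  # which parsed element to act on, if any


def run_agent(
    capture: Callable[[], str],
    parse: Callable[[str], dict[int, str]],
    plan: Callable[[dict[int, str], list[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 5,
) -> list[Action]:
    """Capture -> parse -> plan -> execute until the planner says "done"."""
    history: list[Action] = []
    for _ in range(max_steps):
        screenshot = capture()             # grab the current screen
        elements = parse(screenshot)       # screen understanding and grounding
        action = plan(elements, history)   # action planning (an LLM in practice)
        history.append(action)
        if action.kind == "done":
            break
        execute(action)                    # carry out the chosen action
    return history


# Stubs so the loop can run without a real screen, parser, or LLM.
executed: list[Action] = []

def fake_capture() -> str:
    return "fake-screenshot"

def fake_parse(screenshot: str) -> dict[int, str]:
    return {0: "Save file"}

def fake_plan(elements: dict[int, str], history: list[Action]) -> Action:
    # Click the only element once, then stop.
    return Action("click", 0) if not history else Action("done")

history = run_agent(fake_capture, fake_parse, fake_plan, executed.append)
print([a.kind for a in history])
```

    Passing the planner in as a function is one way to swap between different LLM backends without touching the rest of the loop.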

    Addressing Risks and Promoting Responsible AI

    In alignment with Microsoft’s AI principles and Responsible AI practices, risk mitigation is a priority. The icon caption model is trained using Responsible AI data to minimize the model’s potential to infer sensitive attributes (e.g., race, religion) of individuals present in icon images. Users are also encouraged to apply OmniParser only to screenshots that do not contain harmful content.

    For OmniTool, a threat model analysis was conducted using the Microsoft Threat Modeling Tool. The team provides a sandbox Docker container, safety guidance, and examples in the project’s GitHub repository. Guidance recommends maintaining human oversight to further minimize risks.

    Research Areas

    • Artificial Intelligence
    • Computer Vision