OpenAI Launches New Tools to Boost AI Agent Development
OpenAI has unveiled a suite of new tools to help developers and businesses build AI agents, automated systems that can complete tasks independently. The tools, part of OpenAI’s new Responses API, let companies create agents that perform web searches, scan company databases, and navigate websites, much as OpenAI’s Operator product does.
The Responses API replaces the company’s previous Assistants API, which OpenAI plans to phase out by mid-2026. The change signals OpenAI’s increasing focus on commercializing AI agent technology and encouraging adoption by developers. The term “AI agents” has drawn considerable attention in the tech industry in recent years, even though defining agents and demonstrating their capabilities has proven difficult. This week, Chinese startup Butterfly Effect faced criticism after its AI agent platform, Manus, failed to meet user expectations, raising the pressure on OpenAI to deliver AI agents that are both functional and reliable.
“It’s pretty easy to demo your agent,” said Olivier Godement, OpenAI’s head of API products, in an interview with TechCrunch. “To scale an agent is pretty hard, and to get people to use it often is very hard.”
OpenAI’s existing AI agents include Operator, which aids users in website navigation, and deep research, which compiles research reports. Although those tools have hinted at AI agents’ potential, their autonomy has been limited. The Responses API aims to build on those foundations by allowing developers to create their own versions of Operator and deep research, thus potentially increasing the agents’ autonomy and utility.
The Responses API gives developers access to GPT-4o search and GPT-4o mini search models, which power ChatGPT’s web search feature. These real-time web browsing models provide answers with cited sources. OpenAI claims that these models outperform previous iterations in factual accuracy. For instance, GPT-4o search scored 90% on OpenAI’s SimpleQA benchmark, compared to 88% for GPT-4o mini search and 63% for the larger GPT-4.5 model. The API also provides file search functionality, which allows businesses to quickly scan internal documents.
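As a rough illustration of what a web search request through the Responses API might look like, the Python sketch below builds the request and sends it only when an API key is configured. It assumes the official `openai` Python package and the `web_search_preview` tool type. The helper name `build_web_search_request`, the choice of `gpt-4o` as the model, and the sample question are illustrative assumptions, not details from OpenAI’s announcement.

```python
import os


def build_web_search_request(question: str, model: str = "gpt-4o") -> dict:
    """Assemble keyword arguments for a Responses API call with web search.

    Hypothetical helper for illustration; the model name and tool type are
    assumptions and may differ from what a given account can access.
    """
    return {
        "model": model,
        "tools": [{"type": "web_search_preview"}],  # built-in web search tool
        "input": question,
    }


def main() -> None:
    request = build_web_search_request(
        "What did OpenAI announce for agent developers?"
    )
    # Only contact the API when credentials are configured.
    if os.environ.get("OPENAI_API_KEY"):
        from openai import OpenAI  # requires `pip install openai`

        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        response = client.responses.create(**request)
        print(response.output_text)  # text of the model's answer
    else:
        # Without a key, just show the request that would be sent.
        print(request)


if __name__ == "__main__":
    main()
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.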
OpenAI has assured users that its models will not be trained on customer data. Additionally, developers can now access OpenAI’s Computer-Using Agent (CUA) model, which powers Operator. This model can generate keyboard and mouse actions, thus automating tasks such as data entry and other app workflows. OpenAI offers enterprises the option to run the CUA model locally within their systems, while the consumer version of CUA is restricted to web-based actions.
Despite its advances, the Responses API doesn’t solve every problem with AI agents. Web search models, while more accurate than traditional AI models, still make mistakes: GPT-4o search gets 10% of factual questions wrong. AI-driven search tools also struggle with short, navigational queries like “Lakers score today,” and some reports suggest that ChatGPT’s citations aren’t always reliable. OpenAI acknowledges that its CUA model is not yet highly reliable for automating tasks on operating systems, though the company says it is actively improving the technology.
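The 10% figure follows from the SimpleQA scores quoted earlier, if every answer that is not correct is counted as an error: the error rate is 100 minus the accuracy. A small Python sketch of that arithmetic, using only the three scores reported in this article (the variable and function names are illustrative):

```python
# SimpleQA accuracy, in percent, as reported by OpenAI for each model.
simpleqa_accuracy = {
    "GPT-4o search": 90,
    "GPT-4o mini search": 88,
    "GPT-4.5": 63,
}


def error_rate(accuracy_percent: int) -> int:
    """Share of questions not answered correctly, in percent."""
    return 100 - accuracy_percent


error_rates = {model: error_rate(acc) for model, acc in simpleqa_accuracy.items()}

for model, rate in error_rates.items():
    print(f"{model}: {rate}% of SimpleQA questions not answered correctly")
```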
To further support developers, OpenAI is also releasing the Agents SDK, an open-source toolkit designed to help integrate AI agents with internal systems, monitor performance, and implement safeguards. The toolkit builds on OpenAI’s Swarm framework for multi-agent orchestration, released in late 2024.
OpenAI sees AI agents as the next major evolution in artificial intelligence. CEO Sam Altman has predicted that 2025 will be the year AI agents are integrated into the workforce. Godement echoed Altman’s optimism, observing, “Agents are the most impactful application of AI that will happen.” Through the Responses API, OpenAI is shifting its focus from AI agent demonstrations to real-world tools that enterprises and developers can deploy at scale.