OpenAI introduced a new general-purpose AI agent within ChatGPT, designed to perform a wide range of computer-based tasks for users. The company says this agent can manage calendars, create editable presentations and slideshows, and execute code automatically.
Called ChatGPT agent, the tool merges features from OpenAI’s previous agents—combining Operator’s ability to navigate websites with Deep Research’s skill at synthesizing information from multiple sources into concise reports. Users can interact with the agent simply by using natural language prompts.
Starting Thursday, ChatGPT agent is available to subscribers on OpenAI’s Pro, Plus, and Team plans and can be activated by selecting “agent mode” from ChatGPT’s tool menu.
This launch marks OpenAI’s most ambitious effort to transform ChatGPT into an agentic product that can act on behalf of users and handle tasks, not just answer questions. Other Silicon Valley firms such as Google and Perplexity have introduced AI agents with similar goals, but early versions have struggled with complex tasks and have felt underwhelming as products compared with the vision laid out for AI agents.
OpenAI claims ChatGPT agent is significantly more capable than its predecessors.
The new agent can access ChatGPT connectors, enabling integration with apps like Gmail and GitHub to retrieve relevant information. It also has terminal access and can use APIs to connect with certain applications.
OpenAI suggests users can have ChatGPT agent “plan and buy ingredients to make Japanese breakfast for four,” or “analyze three competitors and create a slide deck.” These tasks require parsing websites, planning steps, and using tools—challenges OpenAI hadn’t tackled in prior agents. According to the company, the model behind ChatGPT agent achieves state-of-the-art results on several benchmarks.
ChatGPT agent scored 41.6% on Humanity’s Last Exam (pass@1), a difficult test of thousands of questions spanning more than 100 subjects. That is roughly double the score of OpenAI’s o3 and o4-mini models.
On FrontierMath, one of the toughest math benchmarks, ChatGPT agent scored 27.4% when given access to tools such as a code-executing terminal. The previous top score was 6.3%, set by o4-mini.
OpenAI developed ChatGPT agent with safety as a priority, recognizing the product’s enhanced capabilities could be risky if misused. The company previously warned that agentic models might have increased potential for harm.
In its safety report, OpenAI labels ChatGPT agent as “high capability” in the biological and chemical weapons domains, a designation it defines as a model capable of “amplify[ing] existing pathways to severe harm.” Although OpenAI says it has no direct evidence of this, it is taking a precautionary approach and activating safeguards to mitigate these risks.
These include a real-time monitor analyzing every prompt to detect biology-related content, followed by a secondary check to flag potential biological threats.
Additionally, OpenAI disabled ChatGPT’s memory feature for this agent to prevent misuse. Normally, memory allows the chatbot to recall previous user chats, but in this context it could let prompt injection attacks extract sensitive data. The company may reconsider enabling memory later.
While ChatGPT agent sounds promising, its effectiveness in real-world scenarios remains unproven, as agent technology has historically been fragile in practice. Nevertheless, OpenAI asserts it has built a more capable model that can deliver on the promise of AI agents.