OpenAI's agent tool may be close to release

OpenAI may be about to release an AI tool that can take control of your PC and perform actions on your behalf.

Tibor Blaho, a software engineer with a reputation for pinpointing upcoming AI products, claims to have uncovered evidence of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agentic” system capable of autonomously handling tasks such as writing code and booking travel.

According to The Information, OpenAI is targeting January as Operator’s release month. The code discovered by Blaho this weekend adds credence to that report.

OpenAI’s ChatGPT client for macOS has gained options, hidden for now, to define shortcuts to “Toggle Operator” and “Force Quit Operator,” according to Blaho. OpenAI has also added references to Operator on its website, Blaho said, though they’re not yet publicly visible.

The OpenAI site already has references to the Operator / OpenAI CUA (Computer Use Agent) – “Operator System Map Table”, “Operator Research Evaluation Table” and “Table of operator rejection rates”

Including comparisons to Claude 3.5 Sonnet Computer use, Google Mariner, etc.

(preview of the tables… pic.twitter.com/OOBgC3ddkU

— Tibor Blaho (@btibor91) January 20, 2025

According to Blaho, OpenAI’s site also contains not-yet-public tables comparing Operator’s performance to that of other computer-using AI systems. The tables may well be placeholders. But if the numbers are accurate, they suggest that Operator is not 100% reliable, depending on the task.

On OSWorld, a benchmark that tries to mimic a real computing environment, “OpenAI Computer Use Agent (CUA)”, possibly the AI model that powers Operator, scores 38.1%, ahead of Anthropic’s computer-controlling model but well short of the 72.4% that humans score. OpenAI CUA surpasses human performance on WebVoyager, which assesses an AI’s ability to navigate and interact with websites. But the model falls short of human-level scores on another web-based benchmark, WebArena, according to the leaked benchmarks.

Operator also struggles with tasks that a human could easily do, if the leak is to be believed. In a test that tasked Operator with signing up with a cloud provider and launching a virtual machine, it succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, it succeeded only 10% of the time.

OpenAI’s upcoming entry into the AI agent space comes as rivals including the aforementioned Anthropic, Google, and others make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to the analytics firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.

Agents today are rather primitive. But some experts have raised concerns about their safety if the technology improves quickly.

One of the leaked charts shows Operator performing well on selected safety evaluations, including tests that try to get the system to perform “illegal activities” and search for “sensitive personal data.” Safety testing is reportedly among the reasons for Operator’s long development cycle. In a recent post on X, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent he claims lacks safety mitigations.

“I can only imagine the negative reactions if OpenAI made a similar version,” Zaremba wrote.

It should be noted that OpenAI has been criticized by AI researchers, including ex-staff, for allegedly de-emphasizing safety work in favor of quickly productizing its technology.
