Cursor’s OpenAI-powered agents built a browser, running for a week without human intervention. Here’s why that matters

If a team of human engineers built a web browser that only half worked, it wouldn’t attract attention. But when Michael Truell, CEO of coding startup Cursor, posted on X last week that a group of AI agents had built a browser that, he wrote, “kind of worked,” running non-stop for a week without any human intervention, the post went viral in the tech world, drawing more than 6 million views.

Why the excitement? There are two big reasons. First, AI’s attention span has historically been short. In the early days of ChatGPT, models could stay on task for only a few seconds. As models improved, that span stretched to minutes and then to hours. Cursor claims its project is the first time an AI system has sustained a complex, open-ended software project for an entire week without human guidance.

Second, a single AI agent is limited to small, focused tasks, and having hundreds of agents coordinate on a big project still seemed like a thing of the future. That’s why Cursor wanted to see whether, by having AI agents work as a team, it could push autonomous coding far enough to handle a project that might take a human team months. Could AI systems be durable enough, and work well enough together, to explore code, break work into parts, debug themselves, and persist for days without straying from the task at hand?

Artificial Intelligence Agent “Orchestra”

The researchers found that the answer was mostly yes. Cursor’s experiment organized hundreds of agents into something resembling a software team, with “planners,” “workers,” and “judges” coordinating across millions of lines of code. It hints at what Cursor and OpenAI say is coming in the near future, when AI will not only assist employees but take on entire projects. That would fundamentally reshape how complex work gets done, first in software development and later in other industries.

AI swarm experiments have been underway for several years. But Cursor says today’s models are smarter and can stay coherent for longer. They can also operate at larger scale, with a custom layer that coordinates hundreds of agents and keeps them from descending into chaos.

Jonas Nelle, an engineer at Cursor who works on long-running AI agents, told Fortune that as AI models continue to improve, engineers and researchers will need to revisit their assumptions about what the models can do every few months. While he admitted that he “wouldn’t download it and delete Chrome today,” the browser project is “certainly better than any previous model has been able to do.”

These long-running agents are an important frontier, added OpenAI engineer Bill Chen, who stress-tests and evaluates the real-world behavior of the company’s models. That the AI system sustained such a long task and completed it autonomously and coherently is “a good indicator of how intelligent and versatile the system is,” he said. The Cursor project is powered by OpenAI’s GPT-5.2 and is “a direct result of us really continuing to push the boundaries of what our models can do.” He said longer tests would be conducted in the future.

The AI agent swarm is not ready for commercial use yet

Still, these are not production-ready systems. Besides being buggy and incomplete, projects that run large numbers of agents for days or weeks are expensive. Although prices have fallen significantly over the past year, the cost of long-running jobs with hundreds of AI agents still adds up.

There are also safety issues. Autonomous systems raise concerns about vulnerabilities, data breaches, and similar risks, and they require many new layers of control and auditing.

But Chen said he expects something similar to be available “for widespread consumption at a modest cost” in the near future. Progress so far has been steady, he explained, with important unlocks at every step. For now, he said, what’s exciting is that this is a real, practical example of the model’s capabilities “compared to how the model performs on academic and public evaluations and benchmarks.”

This shift has surprised even long-time AI observers. In a recent article, independent researcher Simon Willison predicted that by 2029 it would not even be surprising for someone to build an entire web browser using mostly AI. “Launching a new web browser is one of the most complex software projects I can imagine,” he wrote. Cursor may have sped up that timeline. “I’m probably three years behind,” Willison said. “I have to admit, I was very surprised to see such a powerful feature appear so quickly.”

This is consistent with what OpenAI and others have called a “capability overhang”: the most sophisticated AI models can do much more than what has been deployed publicly, but the right combination of tools, product design, and cost reductions can suddenly make those capabilities available at scale. So while tools like the Cursor browser aren’t quite ready for prime time, the trajectory is clear.
