From OpenAI to Nvidia, researchers agree: AI agents still have a long way to go



Welcome to Eye on AI! AI reporter Sharon Goldman here, filling in for Jeremy Kahn, who is on vacation. In this edition…General Services Administration approves OpenAI, Google, Anthropic for federal AI vendor list…The consequences of the AI spending boom for the U.S. economy…Clay AI raises $100 million at a $3.1 billion valuation.

Only in the Bay Area would 2,000 students, researchers, and tech insiders cram into UC Berkeley on a Saturday to talk about AI agents, as if that were a completely normal weekend plan. As I picked up my badge and watched the line snake through the student union hall at the one-day Agentic AI Summit, it felt less like an academic conference and more like a Silicon Valley version of a New York brunch hot spot.

That was surely due in part to the speaker lineup, which was stacked with senior AI researchers and scientists, including OpenAI chief scientist Jakub Pachocki; Ed Chi, vice president of research at Google DeepMind; Nvidia chief scientist Bill Dally; Ion Stoica, a UC Berkeley professor and cofounder of Databricks and Anyscale; and Dawn Song, a pioneering UC Berkeley professor focused on AI security.

The popularity may also be due to the buzzy theme: AI agents, usually defined as AI-powered systems that can use other software tools to accomplish tasks. Think of a system that not only suggests a vacation itinerary but also books the flights and makes the hotel reservations.

As my colleague Jeremy Kahn wrote in a recent article, "This kind of automation is a perennial C-suite fever dream. Over the past decade, companies have embraced 'robotic process automation,' or RPA. This is software that can automate repetitive tasks, such as cutting and pasting between database programs. However, traditional RPA systems are inflexible, cannot handle exceptions, and usually can handle only one narrow task." Agentic AI is supposed to be more flexible and powerful, adapting to business needs.

In a January 2025 blog post, OpenAI CEO Sam Altman wrote: "We believe that, in 2025, we may see the first AI agents 'join the workforce' and materially change the output of companies."

But despite the hype, the overall message at the Agentic AI Summit was cautious and grounded: Agents may be the hottest trend in AI, but the technology still has a long way to go. AI agents, unfortunately, are not always reliable, and they may not remember what happened earlier in a task.

For example, Google DeepMind's Chi highlighted the gap between what agents can do in staged demos and what they still need to do in real-world production environments. Pachocki raised concerns about the safety, security, and trustworthiness of agentic systems, especially when they are integrated into sensitive applications or run autonomously.

"I still think agents haven't really delivered on their promise," said Sherwin Wu, director of engineering for the OpenAI API. "Some more general use cases have worked, but my day-to-day work isn't really any different because of agents."

While today's agents may not live up to the massive hype right now (consider Salesforce CEO Marc Benioff's recent claim that the shift to a digital workforce means he will be "the last CEO of Salesforce who only managed humans"), there was still plenty of optimism among the speakers at the Agentic AI Summit. Databricks' Stoica expressed enthusiasm for infrastructure improvements that make building agentic systems easier. Nvidia's Dally suggested that continued hardware advances will enable more powerful and more efficient agentic behavior. Some pointed to "narrow wins" in specific areas, such as coding.

Today's AI agents may still be going through growing pains, but judging by the packed UC Berkeley ballroom, the industry has its eyes on the prize: AI agents that can operate reliably in the real world. Those in the room believe the payoff will be well worth the wait.

With that, here's more AI news.

Sharon Goldman
sharon.goldman@fortune.com
@sharongoldman

AI in the news

U.S. agency approves OpenAI, Google, Anthropic for federal AI vendor list. Reuters reported today that the General Services Administration, the U.S. government's central procurement arm, added OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude to a list of approved AI vendors to speed up government agencies' use of the technology. The tools will be available to agencies through a platform with contract terms already in place. The approved AI providers are "committed to responsible use and compliance with federal standards," the GSA said.

The boom in AI spending could have a real impact on the U.S. economy. According to the Washington Post, Big Tech's record investment in artificial intelligence ($350 billion this year from Google, Meta, Amazon, and Microsoft) has become a major economic force. While job growth is cooling, this massive AI spending spree is fueling construction of data centers and driving demand for chips, servers, and networking gear, potentially boosting GDP growth by up to 0.7% in 2025. But economists warn that the growing reliance on tech giants to prop up the economy is risky: If the AI boom loses steam, the economic fallout could be significant.

AI sales tool Clay raised $100 million at a valuation of $3.1 billion. The New York Times' DealBook reported that Clay, which helps sales reps and marketers find new prospects and turn them into customers, has raised $100 million at a $3.1 billion valuation. The round was led by CapitalG, the investment arm of Google's parent company Alphabet. Other participants include Meritech Capital Partners and Sequoia Capital. The startup had raised funds at a valuation of $1.25 billion about six months earlier.

Eye on AI research

Google DeepMind's new Genie 3 "world model" creates real-time interactive simulations. Google DeepMind has unveiled Genie 3, a powerful new AI system that generates rich, interactive virtual worlds from simple text prompts and allows real-time navigation of dynamic environments at 24 frames per second. But while it's easy to jump straight to the model's potential for the ultimate gaming experience, it is actually the latest step in the company's long-term push toward "world models," or AI systems that can learn how the world works and simulate real-world environments. These are seen as key to training advanced agents and ultimately achieving artificial general intelligence. Unlike previous video generators, Genie 3 lets users explore AI-generated environments that stay visually consistent for minutes, and it even responds to commands like "make it snow" or "add a character." For now, DeepMind is restricting access to Genie 3 to a small group of researchers and creators while it explores responsible deployment and risks.

Fortune on AI

In the past 12 months – Amanda Gerut

AI is now conducting job interviews, but candidates say they would rather stay unemployed than talk to another bot – Emma Burleigh

These charts show how China is ahead of the United States in the race to power the future of AI – Matt Heimer and Nick Rapp

AI calendar

September 8-10: Fortune Brainstorm Tech, Park City, Utah. Apply to attend here.

October 6-10: World AI Week, Amsterdam

October 21-22: TedAI San Francisco. Apply to attend here.

December 2-7: NeurIPS, San Diego

December 8-9: Fortune Brainstorm AI San Francisco. Apply to attend here.

Brain Food

Can “deep thought” be the key to AI reasoning?

A tiny new AI model is challenging our understanding of how models learn: Researchers at Singapore's Sapient Intelligence recently released the Hierarchical Reasoning Model (HRM), which draws inspiration from the brain's layered thinking process, and the results have the AI community buzzing. Despite being 100 times smaller than ChatGPT and trained on only 1,000 examples (with no internet data or step-by-step guidance), HRM solves difficult logic problems such as Sudoku, maze navigation, and abstract reasoning tasks that stump much larger models. HRM reasons internally: Rather than imitating human language, it works through problems in hidden loops, much like a person thinking silently in their head. Its success suggests a possible shift in artificial intelligence: Depth of thought may matter more than scale.


