
OpenAI, facing increasing competitive pressure from Google and Anthropic, has launched a new AI model, GPT-5.2, which it says beats all existing models by a wide margin across a wide range of tasks.
The new model, released less than a month after OpenAI launched its predecessor GPT-5.1, performed particularly well on benchmarks for complex professional tasks across a range of “knowledge work” from law to accounting to finance, as well as assessments involving coding and mathematical reasoning, according to data released by OpenAI.
Fidji Simo, the former Instacart CEO who is now OpenAI’s CEO of Applications, told reporters that the model should not be seen as a direct response to the Gemini 3 Pro model Google released last month. That release prompted OpenAI CEO Sam Altman to declare a “Code Red,” delaying the launch of several initiatives in order to focus more staff and computing resources on improving the company’s core product, ChatGPT.
“I would say [Code Red] helped with the release of the model, but that’s not why it was released this week — it’s been brewing for a while,” she said.
She said the company has been developing GPT-5.2 “for months.” “We’re not going to turn these models around in just one week. This is the result of a lot of work,” she said. The model is internally codenamed “Garlic,” according to a report in The Information. A day before the model was released, Altman teased its imminent launch by posting a video clip of himself cooking a dish with lots of garlic on social media.
OpenAI executives said the model had been in the hands of “alpha customers” who helped test its performance for “several weeks,” meaning the model was completed before Altman announced “Code Red.”
These testers include legal AI startup Harvey, note-taking app Notion, and document management software company Box, among other companies.
OpenAI said these customers found that GPT-5.2 demonstrated “state-of-the-art” capabilities in using other software tools to complete tasks and was good at writing and debugging code.
Coding has become one of the most competitive use cases for AI model deployment within companies. Although OpenAI leads the field overall, Anthropic’s Claude models have proven particularly popular among enterprises, surpassing OpenAI in market share according to some data. OpenAI is clearly hoping GPT-5.2’s coding performance will convince those customers to return to its models.
Simo said Code Red is helping OpenAI focus on improving ChatGPT. “A code red is really a signal to the company that we want to consolidate resources in a particular area, and it’s a way to really define priorities and define things that can be de-prioritized,” she said. “As a result, we have increased resources focused on ChatGPT overall.”
The company also said the new model is better than its earlier models at providing “safe completions” — defined as giving users helpful answers while avoiding responses that could cause or worsen a mental health crisis.
“On the safety side, as you can see through the benchmarks, we’re improving in almost every aspect of safety, whether it’s self-harm, whether it’s different types of mental health issues, whether it’s emotional dependency,” Simo said. “We are very proud of the work we do here. This is our top priority, and we will only release a model when we are confident that safety protocols have been followed and we are proud of our work.”
The new model was released the same day a new lawsuit was filed against the company, alleging that ChatGPT’s interactions with a user suffering from psychological issues led to a murder-suicide in Connecticut. The company has also faced several other lawsuits alleging that ChatGPT drove people to suicide. OpenAI called the Connecticut murder-suicide “incredibly heartbreaking” and said it is continuing to improve “ChatGPT’s training to recognize and respond to signs of mental or emotional distress, de-escalate conversations and guide people to real-world support.”
GPT-5.2 has shown significant jumps in performance on several benchmarks of interest to enterprise customers. As measured by OpenAI on its GDPval benchmark, which tests a variety of difficult professional tasks, the model matched or exceeded the performance of human experts 70.9% of the time. By comparison, GPT-5, the model OpenAI released in August, scored 38.8%; Anthropic’s Claude Opus 4.5 scored 59.6%; and Google’s Gemini 3 Pro scored 53.3%.
On the software development benchmark SWE-Bench Pro, GPT-5.2 scored 55.6%, nearly 5 percentage points higher than its predecessor GPT-5.1 and more than 12 percentage points higher than Gemini 3 Pro.
Aidan Clark, OpenAI’s vice president of research for training, declined to say exactly which training methods were used to improve GPT-5.2’s performance, though he said the company had made improvements across the board, including in pre-training, the first step in creating an AI model.
When Google released the Gemini 3 Pro model last month, its researchers also said the company had made improvements both before and after training. This surprised some in the industry, who believed that AI companies had largely exhausted their ability to derive substantial improvements from the pre-training stage of model building, and speculated that OpenAI may have been caught off guard by Google’s progress in this area.

