A new AI coding challenge has announced its first champion – and set a new bar for AI-powered software engineering.
On Wednesday at 5pm PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks co-founder Andy Konwinski. The winner is a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the win. But more surprising than the win itself was his final score: he won with correct answers to only 7.5% of the questions on the test.
“We’re glad we built a benchmark that is actually hard,” Konwinski said. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models.”
Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.
Like SWE-Bench, a well-known benchmark that tests models against flagged issues from GitHub, the K Prize measures how well models can solve real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12. The organizers then built the test using only GitHub issues flagged after that date.
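As a rough illustration of how a timed, contamination-free setup like this could gather fresh test problems, the sketch below pulls only GitHub issues opened after a cutoff date using the public GitHub search API. The repository name, the cutoff handling, and the field selection are illustrative assumptions; the K Prize's actual collection pipeline is not described in this article.

```python
# Minimal sketch (not the K Prize's actual pipeline): collect GitHub issues
# created strictly after a cutoff date, so that no model submitted before the
# cutoff could have trained on them.
import requests

CUTOFF = "2025-03-12"  # round-one submission deadline mentioned above

def fetch_fresh_issues(repo: str, cutoff: str = CUTOFF, per_page: int = 50):
    """Return issues in `repo` opened after `cutoff` (YYYY-MM-DD), oldest first."""
    query = f"repo:{repo} is:issue created:>{cutoff}"
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": query, "per_page": per_page, "sort": "created", "order": "asc"},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

if __name__ == "__main__":
    # Hypothetical target repository; the real benchmark's repo set is not public here.
    for issue in fetch_fresh_issues("example-org/example-repo")[:5]:
        print(issue["number"], issue["created_at"], issue["title"])
```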
The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test. Konwinski is still unsure whether the gap comes from contamination on SWE-Bench or simply the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer that question soon.
“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”
It might seem like an odd place to fall short, given the wide variety of AI coding tools already publicly available – but with many benchmarks becoming too easy, critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.
“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell whether the problem is contamination, or even just targeting the leaderboard with a human in the loop.”
For Konwinski, it’s not just a better benchmark but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he said. “If we can’t even get more than 10% on a contamination-free SWE-Bench, that’s the reality check for me.”

