A new AI coding challenge just published its first results, and they aren't pretty

A new AI coding challenge has revealed its first winner and set a new bar for AI-powered software engineers.

On Wednesday at 5pm PST, the Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

"We're glad we built a benchmark that is actually hard," said Konwinski. "Benchmarks should be hard if they're going to matter." Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

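Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a measure of how well they can handle real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a "contamination-free version of SWE-Bench," using a timed entry system to guard against benchmark-specific training. For round one, models had to be submitted by a fixed deadline, and the organizers then built the test using only GitHub issues flagged after that date.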
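The K Prize guards against contamination by building its test only from GitHub issues flagged after the model-submission deadline. A minimal Python sketch of that filtering step might look like the following, assuming each issue record carries the date it was flagged. The `Issue` record, the `build_test_set` function, the example repository name and the deadline date are all hypothetical illustrations, not the K Prize's actual tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for a flagged GitHub issue."""
    repo: str
    number: int
    flagged_on: date


def build_test_set(issues: list[Issue], submission_deadline: date) -> list[Issue]:
    """Keep only issues flagged strictly after the submission deadline.

    Models were frozen before these issues existed, so they cannot
    have been trained on them.
    """
    return [issue for issue in issues if issue.flagged_on > submission_deadline]


if __name__ == "__main__":
    deadline = date(2025, 3, 1)  # hypothetical deadline for illustration
    issues = [
        Issue("example/repo", 1, date(2025, 2, 20)),  # before deadline: excluded
        Issue("example/repo", 2, date(2025, 3, 5)),   # after deadline: kept
        Issue("example/repo", 3, date(2025, 4, 1)),   # after deadline: kept
    ]
    print([issue.number for issue in build_test_set(issues, deadline)])
```

The comparison is strict, so an issue flagged on the deadline day itself is excluded; a real benchmark would also need to decide how to treat time zones and same-day submissions.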

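The 7.5% top score stands in marked contrast to SWE-Bench itself, where top scores are far higher, including on its harder "Full" test. Konwinski still isn't sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.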

"As we get more runs of the thing, we'll have a better sense," he told TechCrunch, "because we expect people to adapt to the dynamics of competing on this every few months."

It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available, but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI's growing evaluation problem.

"I'm quite bullish about building new tests for existing benchmarks," says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. "Without such experiments, we can't actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop."

For Konwinski, it's not just a better benchmark but an open challenge to the rest of the industry. "If you listen to the hype, it's like we should be seeing AI doctors and AI lawyers and AI software engineers, and that's just not true," he says. "If we can't even get more than 10% on a contamination-free SWE-Bench, that's the reality check for me."
