Anthropic used Pokémon to benchmark its newest AI model

Anthropic used Pokémon to benchmark its newest AI model. Yes, really.

In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the screen, allowing it to play Pokémon continuously.

A unique feature of Claude 3.7 Sonnet is its ability to engage in “extended thinking.” Like OpenAI’s o3-mini, the model can “reason” through challenging problems by applying more computing and taking more time.

That came in handy in Pokémon Red, apparently.

Compared to a previous version of Claude, Claude 3.0 Sonnet, which failed to leave Pallet Town, where the story begins, Claude 3.7 Sonnet successfully battled three Pokémon gym leaders.

Anthropic Pokémon Red
Image Credits: Anthropic

Now, it’s not clear how much computing was required for Claude 3.7 Sonnet to reach those milestones, or how long each took. Anthropic said only that the model performed 35,000 actions to reach the last of the three gym leaders, Surge.

Surely it won’t be long before some enterprising developer finds out.

Pokémon Red is more of a toy benchmark than anything. However, there is a long history of games being used for AI benchmarking. In the past few months alone, a number of new apps and platforms have cropped up to test models’ game-playing abilities on titles ranging from Street Fighter to Pictionary.
