DeepSeek claims its reasoning model beats OpenAI's o1 in some benchmarks

Chinese AI lab DeepSeek has released an open source version of DeepSeek-R1, its so-called reasoning model, which it says performs as well as OpenAI's o1 on certain AI benchmarks.

R1 is available from the AI development platform Hugging Face under an MIT license, meaning it can be used commercially without restrictions. According to DeepSeek, R1 beats o1 on the AIME, MATH-500, and SWE-bench Verified benchmarks. AIME draws its problems from the American Invitational Mathematics Examination, a challenging high school math competition, while MATH-500 is a collection of word problems. SWE-bench Verified, meanwhile, focuses on programming tasks.

Being a reasoning model, R1 effectively fact-checks itself, which helps it avoid some of the pitfalls that normally trip up models. Reasoning models take a little longer – usually seconds to minutes longer – to arrive at solutions compared to a typical non-reasoning model. The upside is that they tend to be more reliable in domains such as physics, science, and mathematics.

R1 contains 671 billion parameters, DeepSeek revealed in a technical report. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.

671 billion parameters is massive, but DeepSeek has also released “distilled” versions of R1 that range in size from 1.5 billion parameters to 70 billion parameters. The smallest can run on a laptop. The full R1 requires more robust hardware, but it is available through DeepSeek’s API at prices 90% to 95% cheaper than OpenAI’s o1.

There is a drawback to R1. Being a Chinese model, it is subject to benchmarking by China’s Internet regulator to ensure its responses “embody core socialist values.” R1 does not answer questions about Tiananmen Square, for example, or Taiwan’s autonomy.

R1’s filter in action. **Image credits:** DeepSeek

Many Chinese AI systems, including other reasoning models, decline to respond to topics that might raise the ire of regulators in the country, such as speculation about the Xi Jinping regime.

R1 arrives days after the outgoing Biden administration proposed harsher export rules and restrictions on AI technologies for Chinese companies. Companies in China were already barred from buying advanced AI chips, but if the new rules go into effect as written, companies will face tighter caps on both the semiconductor technology and the models needed to bootstrap sophisticated AI systems.

In a policy document last week, OpenAI urged the US government to support the development of US AI, lest Chinese models match or surpass it in capability. In an interview with The Information, OpenAI’s vice president of policy, Chris Lehane, singled out High Flyer Capital Management, DeepSeek’s corporate parent, as an organization of particular concern.

So far, at least three Chinese labs – DeepSeek, Alibaba, and Kimi, which is owned by Chinese unicorn Moonshot AI – have produced models that they claim rival o1. (Of note, DeepSeek was the first – it announced a preview of R1 at the end of November.) In a post on X, Dean Ball, an AI researcher at George Mason University, said the trend suggests that Chinese AI labs will continue to be “fast followers.”

“The impressive performance of DeepSeek’s distilled models (…) means that very capable reasoners will continue to proliferate widely and be runnable on local hardware,” Ball wrote, “far from the eyes of any top-down control regime.”
