How Chinese AI startup DeepSeek made a model that rivals OpenAI

Today, DeepSeek is one of the only leading AI companies in China that does not rely on funding from tech giants such as Baidu, Alibaba, or ByteDance.

A Young Group of Geniuses Eager to Prove Themselves

According to Liang, when he assembled DeepSeek’s research team, he wasn’t looking for experienced engineers to build a consumer product. Instead, he focused on PhD students from China’s top universities, including Peking University and Tsinghua University, who were eager to prove themselves. Many had published in leading journals and won awards at international academic conferences but lacked industry experience, according to the Chinese technology publication QBitAI.

“Our core technical positions are mostly filled by people who graduated this year or in the last one or two years,” Liang told 36Kr in 2023. The hiring strategy helped create a collaborative company culture where people were free to use ample computing resources to pursue unorthodox research projects. It’s a very different way of operating from established internet companies in China, where teams often compete for resources. (A recent example: ByteDance accused a former intern, a prestigious academic award winner no less, of sabotaging his colleagues’ work in order to hoard more computing resources for his team.)

Liang said students may be better suited for high-investment, low-profit research. “Most people, when they are young, can devote themselves completely to a mission without utilitarian considerations,” he explained. His pitch to prospective hires is that DeepSeek was created to “solve the world’s toughest questions.”

The fact that these young researchers were almost entirely educated in China adds to their drive, experts say. “This younger generation also embodies a sense of patriotism, particularly as they navigate U.S. restrictions and bottlenecks in critical hardware and software technologies,” Zhang explains. “Their determination to overcome these barriers reflects not only personal ambition, but also a broader commitment to advancing China’s position as a global innovation leader.”

Innovation Born Out of a Crisis

In October 2022, the US government began putting together export controls that severely restricted Chinese AI companies’ access to cutting-edge chips like Nvidia’s H100. The move presented a problem for DeepSeek. The company had started with a stockpile of 10,000 A100s, but it needed more to compete with companies like OpenAI and Meta. “The problem we faced was never financing, but export control on advanced chips,” Liang told 36Kr in a second interview in 2024.

DeepSeek had to come up with more efficient methods to train its models. “They optimized their model architecture using a battery of engineering tricks: custom communication schemes between chips, reducing the size of fields to save memory, and the innovative use of the mix-of-models approach,” says Wendy Chang, a software engineer turned policy analyst at the Mercator Institute for China Studies. “Many of these approaches are not new ideas, but combining them successfully to produce a cutting-edge model is a remarkable feat.”

DeepSeek has also made significant progress on Multi-head Latent Attention (MLA) and Mixture-of-Experts, two technical designs that make DeepSeek models more cost-effective by requiring fewer computing resources to train. In fact, DeepSeek’s latest model is so efficient that it required one-tenth the computing power of Meta’s comparable Llama 3.1 model to train, according to the research institution Epoch AI.

DeepSeek’s willingness to share these innovations with the public has earned it considerable goodwill in the global AI research community. For many Chinese AI companies, developing open source models is the only way to play catch-up with their Western counterparts, because it attracts more users and contributors, which in turn helps the models grow. “They have now shown that state-of-the-art models can be built with less, but still a lot, of money and that the current standards of model building leave a lot of room for optimization,” says Chang. “We’re sure to see many more attempts in this direction going forward.”

The news could spell trouble for the current US export controls, which focus on creating computing resource bottlenecks. “Existing estimates of how much AI computing power China has, and what they can achieve with it, could be upended,” Chang says.
