Meet DeepSeek: the Chinese start-up that is changing how AI models are trained
Chinese start-up DeepSeek has emerged as "the biggest dark horse" in the open-source large language model (LLM) arena in 2025, just days after the firm made waves in the global artificial intelligence (AI) community with its latest release.
That assessment came from Jim Fan, a senior research scientist at Nvidia and lead of its AI Agents Initiative, in a New Year's Day post on social-media platform X, following the Hangzhou-based start-up's release last week of its namesake LLM, DeepSeek V3.
"[The new AI model] shows that resource constraints force you to reinvent yourself in spectacular ways," Fan wrote, referring to how DeepSeek developed the product at a fraction of the capital outlay that other tech companies invest in building LLMs.
DeepSeek V3 comes with 671 billion parameters and was trained in around two months at a cost of US$5.58 million, using significantly fewer computing resources than models developed by bigger tech firms such as Facebook parent Meta Platforms and ChatGPT creator OpenAI.
LLM refers to the technology underpinning generative AI services such as ChatGPT. In AI, a high parameter count is pivotal in enabling an LLM to adapt to more complex data patterns and make more precise predictions. Open source gives the public access to a software program's source code, allowing third-party developers to modify or share its design, fix bugs or scale up its capabilities.
Jim Fan, a senior research scientist at semiconductor design giant Nvidia, says he has been closely following developments at artificial intelligence start-up DeepSeek. Photo: SCMP
DeepSeek's development of a powerful LLM, at a fraction of what bigger companies spend, shows how far Chinese AI firms have progressed despite US sanctions that have largely blocked their access to the advanced semiconductors used for training models.
Leveraging new architecture designed to achieve cost-effective training, DeepSeek required just 2.78 million GPU hours - the cumulative amount of time that graphics processing units are used to train an LLM - for its V3 model. DeepSeek's training process used Nvidia's China-tailored H800 GPUs, according to the start-up's technical report posted on December 26, when V3 was released.
That is substantially less than the 30.8 million GPU hours Meta needed to train its Llama 3.1 model on Nvidia's more advanced H100 chips, which are not allowed to be exported to China.
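The two figures can be checked against each other. As a minimal back-of-the-envelope sketch, assuming a rental rate of roughly US$2 per H800 GPU hour (the hourly rate is an assumption for illustration, not a figure from this article), the reported GPU hours line up with the reported training cost:

```python
# Back-of-the-envelope check of the reported DeepSeek V3 figures.
# ASSUMPTION: H800 rental priced at about US$2 per GPU hour.

H800_GPU_HOURS = 2.78e6   # DeepSeek V3 training, per its technical report
H100_GPU_HOURS = 30.8e6   # Meta's Llama 3.1, for comparison
RENTAL_RATE_USD = 2.0     # assumed cost per GPU hour (illustrative)

estimated_cost = H800_GPU_HOURS * RENTAL_RATE_USD
print(f"Estimated V3 training cost: ${estimated_cost / 1e6:.2f} million")
# -> about $5.56 million, in line with the reported US$5.58 million

ratio = H100_GPU_HOURS / H800_GPU_HOURS
print(f"Llama 3.1 used about {ratio:.0f}x more GPU hours")
# -> about 11x
```

At that assumed rate, 2.78 million GPU hours works out to roughly US$5.6 million, consistent with the reported US$5.58 million figure, while Meta's Llama 3.1 run consumed about 11 times as many GPU hours.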
"DeepSeek V3 looks to be a stronger model at only 2.8 million GPU hours," computer scientist Andrej Karpathy - a founding team member at OpenAI - said in his X post on December 27.
Karpathy's observation prompted Fan to respond on the same day in a post on X: "Resource constraints are a beautiful thing. Survival instinct in a cutthroat AI competitive land is a prime driver for breakthroughs."
"I've been following DeepSeek for a long time. They had one of the best open coding models last year," Fan wrote. "Superior OSS [open-source software] models put huge pressure on commercial, frontier LLM companies to move faster."
Hangzhou-based DeepSeek was spun off from hedge-fund manager High-Flyer Quant. Photo: Shutterstock
The founder of cloud computing start-up Lepton AI, Jia Yangqing, echoed Fan's perspective in an X post on December 27. "It is simple intelligence and pragmatism at work: given a limit of computation and manpower present, produce the best outcome with smart research," wrote Jia, who previously served as a vice-president at Alibaba Group Holding, owner of the South China Morning Post.
DeepSeek did not immediately respond to a request for comment.
The start-up was reportedly spun off in 2023 from hedge-fund manager High-Flyer Quant. The person behind DeepSeek is High-Flyer Quant founder Liang Wenfeng, who studied AI at Zhejiang University.
In an interview with Chinese online media outlet 36Kr in May 2023, Liang said High-Flyer Quant had already bought more than 10,000 GPUs before the US government imposed AI chip restrictions on China. That investment laid the foundation for DeepSeek to operate as an LLM developer. Liang said DeepSeek also receives funding support from High-Flyer Quant.
Most developers at DeepSeek are either fresh graduates or people early in their AI careers, reflecting the company's preference for ability over experience when recruiting new employees.
A screenshot of a response by DeepSeek's V3 model, which mistakenly identified itself as OpenAI's ChatGPT. Photo: X
DeepSeek's V3 model, however, has also stirred some controversy because it has mistakenly identified itself as OpenAI's ChatGPT on certain occasions.
Lucas Beyer, a researcher at Microsoft-backed OpenAI, said in an X post last Friday that DeepSeek V3's misidentification was prompted by this simple question: "What model are you?"
Still, V3 is not the first AI model to be struck by identity confusion. Machine-learning expert Aakash Kumar Nain wrote in a post on X that it was a common mistake across various AI models because "a lot of data available on the internet has already been GPT-contaminated".
A group of researchers from Shandong University in China, and Drexel University and Northeastern University in the US, echoed Nain's view. Of the 27 AI models the researchers tested, a quarter exhibited identity confusion, which "primarily stems from hallucinations rather than reuse or replication".
As of Tuesday, DeepSeek's V3 LLM was still ranked as the most popular AI model on Hugging Face, the world's largest online machine-learning and open-source AI community.
This article originally appeared in the South China Morning Post (SCMP), the most authoritative voice reporting on China and Asia for more than a century. Copyright © 2025 South China Morning Post Publishers Ltd. All rights reserved.