Monday, January 27, 2025

DeepSeek - new AI model from China

  • Silicon Valley Is Raving About a Made-in-China AI Model: As of Saturday, DeepSeek models R1 and V3 were ranked in the top 10 on Chatbot Arena, a platform hosted by University of California, Berkeley, researchers that rates chatbot performance. A Google Gemini model was in the top spot, while DeepSeek bested Anthropic’s Claude and Grok from Elon Musk’s xAI.

    DeepSeek said training one of its latest models cost $5.6 million, compared with the $100 million to $1 billion range cited last year by Anthropic.


  • How China’s new AI model DeepSeek is threatening U.S. dominance: “To see the DeepSeek new model, it’s super impressive in terms of both how they have really effectively done an open-source model that does this inference-time compute, and is super-compute efficient,” Microsoft CEO Satya Nadella said at the World Economic Forum in Davos. “We should take the developments out of China very, very seriously.”

    DeepSeek also had to navigate the strict semiconductor restrictions that the U.S. government has imposed on China, cutting the country off from access to the most powerful chips, like Nvidia’s H100s. The latest advancements suggest DeepSeek either found a way to work around the rules, or that the export controls were not the chokehold Washington intended.

    “They can take a really good, big model and use a process called distillation,” said Benchmark General Partner Chetan Puttagunta. “Basically you use a very large model to help your small model get smart at the thing you want it to get smart at. That’s actually very cost-efficient.”
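
    The distillation process Puttagunta describes can be sketched in a few lines. This is a minimal toy illustration of the standard distillation objective (matching a student model's output distribution to a teacher's temperature-softened one), not DeepSeek's actual training code; the function names, logits, and temperature value are illustrative.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature softens the distribution."""
    z = logits / temperature
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened outputs ("soft labels")
    and the student's. The student is trained to minimise this, so a small
    model inherits the large model's behaviour on the target task."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Toy logits over three classes: the student roughly tracks the teacher,
# so the loss is small but nonzero.
teacher = np.array([4.0, 1.0, 0.2])
student = np.array([3.5, 1.2, 0.3])
print(distillation_loss(teacher, student))
```

    A perfectly matched student drives this loss to zero; in practice the distillation term is mixed with an ordinary cross-entropy loss on ground-truth labels.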


  • DeepSeek R1 Explained to your grandma:



  • How small Chinese AI start-up DeepSeek shocked Silicon Valley: DeepSeek’s R1 release sparked a frenzied debate in Silicon Valley about whether better resourced US AI companies, including Meta and Anthropic, can defend their technical edge.

    Industry insiders say DeepSeek’s singular focus on research makes it a dangerous competitor because it is willing to share its breakthroughs rather than protect them for commercial gains. DeepSeek has not raised money from outside funds or made significant moves to monetise its models.

    DeepSeek claimed it used just 2,048 Nvidia H800s and $5.6mn to train a model with 671bn parameters, a fraction of what OpenAI and Google spent to train comparably sized models.

    Ritwik Gupta, AI policy researcher at the University of California, Berkeley, said DeepSeek’s recent model releases demonstrate that “there is no moat when it comes to AI capabilities”. “The first person to train models has to expend lots of resources to get there,” he said. “But the second mover can get there cheaper and more quickly.”

    Gupta added that China has a much larger talent pool of systems engineers than the US — engineers who understand how to make the best use of computing resources to train and run models more cheaply.


  • Twist in the tale: Nvidia Stock May Fall As DeepSeek’s ‘Amazing’ AI Model Disrupts OpenAI. America’s policy of restricting Chinese access to Nvidia’s most advanced AI chips has unintentionally helped a Chinese AI developer leapfrog U.S. rivals who have full access to the company’s latest chips. This illustrates a basic reason why startups are often more successful than large companies: scarcity spawns innovation.


  • The Empire Strikes Back: China Prepares One Trillion Yuan AI Plan to Rival $500 Billion US Stargate Project.

