DeepSeek released its V4 preview on Friday, April 24, 2026, with two open-weight variants tuned to run on Huawei Ascend chips instead of Nvidia silicon. The same day, a US State Department cable went to embassies worldwide warning of Chinese distillation of American AI models, naming DeepSeek, Moonshot AI and MiniMax. The pairing crystallized a hardware split that has been building since Washington banned H20 sales to China a year ago.
The Hangzhou lab posted the weights to its DeepSeek V4 Pro repository on Hugging Face under the MIT license. Hours later, Huawei held a livestream to confirm that its Atlas supernodes already run the model end to end. Cambricon Technologies posted Day 0 adaptation code to GitHub. Nvidia was not in the announcement deck.
Why Beijing’s AI Champion Skipped Nvidia This Round
The delay everyone blamed on engineering trouble was a hardware switch. DeepSeek spent the first quarter of 2026 rewriting its training and inference kernels for Huawei’s Compute Architecture for Neural Networks (CANN), the company’s CUDA equivalent, after founder Liang Wenfeng pushed the team to drop foreign accelerators. State media account Yuyuantantian called the result a move “beyond basic compatibility toward hardware-specific tuning,” language that downplays how unusual it is for a frontier lab to retarget its software stack mid-cycle.

Inside DeepSeek V4 Pro and Flash
The preview ships in two flavors with starkly different operating profiles. V4 Pro carries 1.6 trillion total parameters with 49 billion active per token, while V4 Flash runs 284 billion total with 13 billion active. Both support a one million token context window, up from 128,000 in last year’s V3.
| Model | Total Params | Active Params | Cache-Hit Input Price | Output Price |
|---|---|---|---|---|
| V4 Pro | 1.6 trillion | 49 billion | 1 yuan / M tokens | 24 yuan / M tokens |
| V4 Flash | 284 billion | 13 billion | 0.2 yuan / M tokens | 2 yuan / M tokens |
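The two ratios implied by the table, the share of parameters active per token and the cost of a single request, can be worked out directly. The following is a minimal Python sketch that uses only the table’s figures. The function names and the sample workload are illustrative, and the cost function assumes every input token is a cache hit because cache-miss pricing is not listed, so it gives a lower bound rather than a bill.

```python
# Figures copied from the table above; names are illustrative, not DeepSeek's.
MODELS = {
    "V4 Pro": {"total_b": 1600, "active_b": 49, "cache_hit_in": 1.0, "out": 24.0},
    "V4 Flash": {"total_b": 284, "active_b": 13, "cache_hit_in": 0.2, "out": 2.0},
}


def active_fraction(model: str) -> float:
    """Share of total parameters that are active for each token."""
    m = MODELS[model]
    return m["active_b"] / m["total_b"]


def request_cost_yuan(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in yuan, assuming every input token is a cache hit.

    Cache-miss input pricing is not in the table, so this is a lower bound.
    """
    m = MODELS[model]
    return (input_tokens * m["cache_hit_in"] + output_tokens * m["out"]) / 1_000_000


for name in MODELS:
    print(f"{name}: {active_fraction(name):.1%} of parameters active per token")

# Hypothetical workload: a full 1M-token context plus a 2,000-token answer.
for name in MODELS:
    print(f"{name}: {request_cost_yuan(name, 1_000_000, 2_000):.3f} yuan")
```

Both variants activate less than 5% of their weights per token, which is what lets a 1.6 trillion parameter model be priced like a much smaller one.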
The Million Token Window Nobody Was Asking For
The roughly eight-fold context jump matters less for chat than for code, legal discovery and long-form research. DeepSeek’s technical report claims V4 Pro scores 88.4% on MMLU and 92.1% on the new Humanities-X reasoning benchmark, figures the company says match or narrowly beat what GPT-5 and Anthropic’s Claude 4 Opus posted earlier this quarter.
The Cambricon Detail Buried in the Launch
The bigger signal sits in the part of the press cycle most outlets skipped. Cambricon’s Day 0 adaptation means a second Chinese silicon vendor, not just Huawei, can serve V4 inference from launch day. That redundancy is what a real domestic supply chain looks like, not a single-vendor dependency dressed up as one.
The Friday Cable That Followed the Launch
Within hours of the V4 livestream, the State Department issued a global cable instructing diplomatic posts to raise “concerns over adversaries’ extraction and distillation of U.S. A.I. models.” Reuters first reported the contents. The cable, obtained by reporters and dated Friday, named DeepSeek, Moonshot AI and MiniMax as firms of concern.
The document warns of foreign labs releasing models that “appear to perform comparably on select benchmarks at a fraction of the cost but do not replicate the full performance of the original system.” It then accuses the same campaigns of stripping safety rails: removing mechanisms that keep models “ideologically neutral and truth-seeking.”
The cable’s stated purpose, in its own words, is to “lay the groundwork for potential follow-up and outreach by the U.S. government,” diplomatic shorthand for sanctions, entity listings, or both. Foreign ministries in Tokyo, Seoul, Berlin and New Delhi received the cable over the weekend.
OpenAI’s hand is visible in the policy push. The lab told the US House Select Committee on China in February 2025 that DeepSeek had used distillation as part of an effort to “free-ride on the capabilities developed by OpenAI and other U.S. frontier labs.” Anthropic joined that line in February 2026.
The Chinese embassy in Washington called the cable’s claims “groundless” and accused Washington of using AI policy to slow Chinese technological progress. DeepSeek itself has not formally responded.
How Distillation Actually Works in a Lab
Distillation is a textbook AI technique, not a back-door hack. A smaller “student” model learns to copy the input-output behavior of a larger “teacher” by training on the teacher’s responses, often through a public API. Anthropic conceded in its February statement that AI labs “routinely distill their own models to create smaller, cheaper versions.”
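The student-teacher loop described above can be shown at toy scale. The Python sketch below is an illustration under heavy assumptions, not any lab’s pipeline: the “teacher” is a hypothetical two-input rule standing in for a large model behind an API, and the “student” is a three-parameter logistic regression. The student never sees the teacher’s internals, only its answers, which is the essential property of response-based distillation.

```python
import math
import random


def teacher(x: float, y: float) -> int:
    """Stand-in for a large model behind an API: only its answers are visible."""
    return 1 if 2.0 * x - 1.0 * y + 0.5 > 0 else 0


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def distill(num_queries: int = 2000, epochs: int = 20, lr: float = 0.5, seed: int = 0):
    """Train a tiny logistic-regression student on teacher responses only."""
    rng = random.Random(seed)
    # Step 1: query the teacher and record (input, response) pairs.
    prompts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(num_queries)]
    dataset = [(x, y, teacher(x, y)) for x, y in prompts]
    # Step 2: fit the student to the recorded responses with plain SGD.
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for x, y, label in dataset:
            err = sigmoid(w1 * x + w2 * y + b) - label  # gradient of the log loss
            w1 -= lr * err * x
            w2 -= lr * err * y
            b -= lr * err
    return w1, w2, b


def agreement(params, num_checks: int = 1000, seed: int = 1) -> float:
    """Fraction of fresh inputs on which student and teacher give the same answer."""
    w1, w2, b = params
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_checks):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        student_answer = 1 if w1 * x + w2 * y + b > 0 else 0
        hits += student_answer == teacher(x, y)
    return hits / num_checks


student = distill()
print(f"student/teacher agreement: {agreement(student):.1%}")
```

Distilling a language model follows the same two steps, collecting teacher responses and then fitting a student to them, but with generated text as the training target and models many orders of magnitude larger.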
The legal fight is about consent and terms of service, not the math. Sam Altman, in a January 31, 2025 Reddit comment after the original DeepSeek shock, wrote that OpenAI had “been on the wrong side of history” on open weights and needed a different strategy. That admission has not aged out of the debate.
The Hardware Bottleneck Behind the Delay
The reason V4 slipped from a March release to late April is silicon, not code. Huawei’s Ascend 910C chips are fabricated by SMIC at a 7nm-equivalent node, and monthly output is rationed even for prioritized customers. Training a 1.6 trillion parameter mixture-of-experts model demands tens of thousands of accelerators running for months without interruption.
Software was the second drag. CANN still has documented gaps in collective communication primitives that Nvidia’s NCCL handles natively, and DeepSeek engineers spent Q1 2026 patching kernel-level issues. The team’s reward for that effort is a model that boots on domestic hardware.
The next chip generation is what the announcement was really pointing toward. Huawei’s Ascend 950 roadmap calls for the 950PR to ship in the first quarter of 2026 at 1.56 PFLOPS in FP4 with 112 GB of in-house HBM, and the 950DT later this year with 144 GB of memory and 4 TB/s bandwidth.
The Atlas 950 SuperPoD, detailed at Huawei Connect 2025 by rotating chair Eric Xu, scales to 8,192 Ascend 950DT chips, delivering 8 EFLOPS in FP8, 16 EFLOPS in FP4, and 16 PB/s of optical interconnect bandwidth across 160 cabinets in a 1,000 square meter footprint. It is scheduled for the fourth quarter of 2026.
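Two back-of-envelope figures follow from the numbers quoted above: the per-chip throughput implied by the SuperPoD totals, and the smallest number of chips whose combined memory could hold V4 Pro’s weights. The Python sketch below treats the bytes-per-weight values and the 1 GB = 10^9 bytes convention as assumptions that do not come from the article, and it counts weights only, ignoring KV cache, activations and any replication a real deployment would need.

```python
import math

# Figures quoted in the article.
SUPERPOD_CHIPS = 8_192
SUPERPOD_FP8_EFLOPS = 8
SUPERPOD_FP4_EFLOPS = 16
HBM_GB = {"950PR": 112, "950DT": 144}
V4_PRO_TOTAL_PARAMS = 1.6e12

# Assumptions, not from the article: storage per weight, and 1 GB = 1e9 bytes.
BYTES_PER_PARAM = {"FP8": 1.0, "FP4": 0.5}


def per_chip_pflops(cluster_eflops: float, chips: int = SUPERPOD_CHIPS) -> float:
    """Cluster-level EFLOPS spread evenly across chips, in PFLOPS per chip."""
    return cluster_eflops * 1_000 / chips


def min_chips_for_weights(chip: str, precision: str) -> int:
    """Fewest chips whose combined HBM holds the V4 Pro weights alone."""
    weight_bytes = V4_PRO_TOTAL_PARAMS * BYTES_PER_PARAM[precision]
    return math.ceil(weight_bytes / (HBM_GB[chip] * 1e9))


print(f"FP8 per chip: {per_chip_pflops(SUPERPOD_FP8_EFLOPS):.2f} PFLOPS")
print(f"FP4 per chip: {per_chip_pflops(SUPERPOD_FP4_EFLOPS):.2f} PFLOPS")
for chip in HBM_GB:
    for precision in BYTES_PER_PARAM:
        n = min_chips_for_weights(chip, precision)
        print(f"{chip} at {precision}: at least {n} chips for weights only")
```

These are floor values for inference. They say nothing about the training fleet, which depends on token counts and utilization that the article does not report.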
“The release of V4 explicitly mentions compatibility with domestic chips, and we expect significant improvement in the capabilities of domestic graphics cards and their widespread adoption this year.” – Huatai Securities research note, April 25, 2026
That timing is the real story. DeepSeek priced V4 Pro at 24 yuan per million output tokens, roughly $3.30, against the $25 per million that Anthropic charges for Claude 4 Opus. The Huatai note projects another sharp drop once 950 supernodes hit volume in the second half. Domestic supply, domestic prices, domestic users.
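The size of that price gap is simple arithmetic. The short Python sketch below uses only the three prices quoted in this article. The variable names and the 500-million-token workload are illustrative, and the exchange rate is the one implied by the article’s own conversion rather than a market quote.

```python
V4_PRO_OUTPUT_YUAN = 24.0  # per million output tokens
V4_PRO_OUTPUT_USD = 3.30   # the article's dollar conversion of 24 yuan
CLAUDE_OUTPUT_USD = 25.0   # per million output tokens

# Exchange rate implied by the article's conversion, not a market quote.
implied_yuan_per_usd = V4_PRO_OUTPUT_YUAN / V4_PRO_OUTPUT_USD

# How many times more Claude costs per output token.
price_ratio = CLAUDE_OUTPUT_USD / V4_PRO_OUTPUT_USD


def output_cost_usd(price_per_million_usd: float, output_tokens: int) -> float:
    """Dollar cost of generating the given number of output tokens."""
    return price_per_million_usd * output_tokens / 1_000_000


print(f"implied rate: {implied_yuan_per_usd:.2f} yuan per dollar")
print(f"Claude / V4 Pro output price: {price_ratio:.1f}x")

# Hypothetical workload: 500 million output tokens a month.
tokens = 500_000_000
print(f"V4 Pro: ${output_cost_usd(V4_PRO_OUTPUT_USD, tokens):,.0f}")
print(f"Claude: ${output_cost_usd(CLAUDE_OUTPUT_USD, tokens):,.0f}")
```

On these figures Claude’s output price is between seven and eight times V4 Pro’s.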
What Two AI Stacks Mean for Nvidia, AMD and TSMC
Nvidia took a $4.5 billion charge in its first fiscal quarter of 2026 tied to the H20 ban and warned of an $8 billion revenue hit in the second. AMD said the same restrictions would cost it about $1.5 billion in calendar 2026. Both companies struck an unusual deal with the Trump administration to resume some sales to China in exchange for a 15% revenue share to the US Treasury.
The DeepSeek V4 launch reframes that bargain. If a frontier-class Chinese model now runs production workloads on Ascend silicon at roughly one-seventh to one-eighth the output price of US rivals, the addressable market for downgraded export-compliant Nvidia parts shrinks. TSMC, which still fabricates the most advanced Nvidia and AMD parts, sits one regulatory step away from being asked to choose sides on a more aggressive Entity List.
Two stacks, two prices, two safety regimes. That is the architecture the cable is trying to slow and the launch is trying to accelerate.
Frequently Asked Questions
What is DeepSeek V4 and when was it released?
DeepSeek V4 is a Chinese open-weight large language model released in preview on Friday, April 24, 2026, by the Hangzhou-based lab founded by Liang Wenfeng. It comes in two sizes, V4 Pro at 1.6 trillion parameters and V4 Flash at 284 billion, both with a one million token context window and posted under the MIT license.
Why is DeepSeek V4 a problem for Nvidia?
V4 is the first frontier-class Chinese model with explicit Day 0 support for Huawei Ascend chips and Cambricon accelerators rather than Nvidia GPUs. It signals that Chinese labs can now train and serve a top-tier model on domestic silicon, shrinking the market for export-compliant Nvidia parts and tightening the case for further US chip controls.
What did the US State Department cable actually say?
The cable, dated April 24, 2026, instructs diplomats to raise concerns about Chinese firms extracting and distilling US AI models, naming DeepSeek, Moonshot AI and MiniMax. It warns that the resulting models match benchmarks at a fraction of the cost but strip safety guardrails, and asks foreign governments to weigh those risks before deploying Chinese models on sensitive systems.
Is AI distillation illegal?
Distillation itself is a standard machine-learning technique used inside every major AI lab. The legal question is consent: training a competing model on another lab’s outputs, collected through a paid API, can violate terms of service and, in some jurisdictions, copyright or computer-misuse statutes. No US court has ruled on cross-border distillation yet.
How much cheaper is DeepSeek V4 than Claude or GPT-5?
DeepSeek charges 24 yuan, roughly $3.30, per million output tokens for V4 Pro and 2 yuan for V4 Flash. Anthropic’s Claude 4 Opus is priced at $25 per million output tokens, and OpenAI’s GPT-5 sits in a comparable range. At those rates Claude costs about 7.6 times as much as V4 Pro on output, before accounting for cache-hit input discounts.
When will the Huawei Ascend 950 chips ship in volume?
Huawei’s roadmap calls for the Ascend 950PR in the first quarter of 2026 and the 950DT in the fourth quarter. The Atlas 950 SuperPoD, scaling to 8,192 chips and 16 EFLOPS in FP4, is also slated for the fourth quarter of 2026. Mass-market V4 pricing on Chinese cloud platforms is expected to drop materially once those clusters reach volume.
By Monday morning in Shenzhen, V4 Pro had already racked up more than three hundred thousand downloads on Hugging Face mirrors, and the Hangzhou lab had stopped answering press queries about Nvidia. The next chapter will not be written in benchmarks. It will be written in whichever embassy in Tokyo, Berlin or Brasilia decides first whether the cable from Foggy Bottom is a warning to act on or a memo to file.


