Part 2: What DeepSeek V4 means for Huawei and Nvidia
The beginning of the tech stack shift?
And we continue…
In Part 1, we argued that DeepSeek V4 is best understood as a form of shared R&D for China’s open-source AI ecosystem. The Huawei adaptation is one of the most strategically important examples of that. Making V4 run on Huawei’s stack is not just about one model. It is about creating a reference workload for China’s domestic AI hardware and software stack.
The easy headline is that DeepSeek V4 runs on Huawei chips, so China is reducing reliance on Nvidia. That is directionally true, but it could be too simplistic. V4 by no means proves China has solved AI compute, and it does not prove Huawei can replace Nvidia for frontier training.
Going further, it does not mean CANN has matched CUDA, and it does not mean Nvidia’s global training franchise is suddenly impaired. However, it does give Huawei something it badly needed: a serious workload around which the domestic stack can organize, and a group of developers it can work with to make CANN better.
That is why this matters. A chip ecosystem becomes real when there are workloads, developers, kernels, compilers, cloud deployments, enterprise customers, benchmarks, support contracts, and a reason for people to optimize around it.
TLDR, my main points:
1/ V4 gives Huawei and CANN their clearest frontier-class reference workload so far.
2/ The implication is strongest on inference, not training. We should not overclaim that China can now train frontier models end-to-end on domestic silicon.
3/ The three-month release delay suggests DeepSeek optimized for ecosystem strategy, not just launch speed.
4/ Jensen Huang’s export-control argument looks more rational after V4. Export controls may restrict Nvidia’s business in China, but they also create a stronger incentive for China to build a parallel stack and really push for self-reliance.
5/ The Nvidia bear case should not be overstated. Losing China's inference share is not the same as losing global training leadership.
Huawei benefits from the open-source story
The Huawei story is not separate from the open-source story. It is the open-source story moving into hardware.
DeepSeek V4 was adapted for Huawei’s Ascend AI chips. DeepSeek granted early access to domestic companies such as Huawei rather than sharing the model with US chipmakers for performance tuning, and Huawei said V4 is fully supported on its Ascend 950-based supernode clusters. Huawei also said its chips were used for part of V4-Flash training, while DeepSeek did not disclose whether V4 itself was trained on Nvidia or Huawei chips.
The conservative consensus read is that Huawei has not replaced Nvidia for frontier training, but that a credible inference workload has moved onto Huawei’s stack, and the domestic stack can now improve against it.
Serving a model and training a model are different problems. Training is where Nvidia’s highest-end systems, interconnect, software stack, and cluster-level reliability matter most. That is still the hardest part to replace, and it doesn’t look like China’s domestic options are near that. Inference is where the model gets deployed into real products, enterprise workflows, consumer apps, and cloud APIs. It is where latency, throughput, utilization, memory, cost-per-task, and reliability matter every day.
So if Huawei Ascend can serve leading Chinese open models at acceptable latency, throughput, and cost, then Chinese hyperscalers, SOEs, and private companies have a real domestic deployment path. What this could mean is that the Chinese inference market can become less Nvidia-dependent. And that itself is already a big deal.
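To make the "acceptable latency, throughput, and cost" test concrete, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the `ServingProfile` fields, the `good_enough` thresholds, and all of the numbers are hypothetical placeholders, not measurements of any Nvidia or Huawei system.

```python
from dataclasses import dataclass


@dataclass
class ServingProfile:
    """Hypothetical serving characteristics for one accelerator type."""
    name: str
    tokens_per_second: float  # sustained output throughput per accelerator
    hourly_cost: float        # all-in cost per accelerator-hour (any currency)
    utilization: float        # fraction of each hour doing paid work, 0..1
    p95_latency_ms: float     # 95th-percentile response latency


def cost_per_million_tokens(p: ServingProfile) -> float:
    """Cost to serve one million tokens at the given throughput and utilization."""
    effective_tokens_per_hour = p.tokens_per_second * 3600 * p.utilization
    return p.hourly_cost / effective_tokens_per_hour * 1_000_000


def good_enough(p: ServingProfile, max_cost: float, max_p95_ms: float) -> bool:
    """True if the profile clears the workload's own bar, regardless of rivals."""
    return cost_per_million_tokens(p) <= max_cost and p.p95_latency_ms <= max_p95_ms


# Placeholder numbers only: an incumbent stack and a slower domestic stack.
incumbent = ServingProfile("incumbent", 2000, 4.0, 0.6, 300)
domestic = ServingProfile("domestic", 1000, 2.5, 0.6, 500)

for profile in (incumbent, domestic):
    print(
        profile.name,
        round(cost_per_million_tokens(profile), 2),
        good_enough(profile, max_cost=1.5, max_p95_ms=800),
    )
```

With these placeholder inputs the domestic profile is more expensive per token than the incumbent, yet both clear the workload's threshold. That is the shape of the argument here: a deployment decision only needs the domestic option to pass the workload's bar, not to beat the incumbent.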
The strategic implication is not that Huawei has caught Nvidia across the board. The strategic implication is that Huawei may now have a realistic path to becoming the domestic Chinese inference standard. Narrower claim, but still very important.
V4’s design is part of the hardware strategy
The hardware case for Ascend is not only about political will. It is also tied to model design. If China cannot get enough frontier Nvidia chips, there are two ways to respond. One is to build better domestic chips. The other is to design models that are easier to run on the chips China can get.
This is why inference efficiency matters so much for Huawei and DeepSeek. If DeepSeek can make frontier-ish models lighter to serve, Huawei does not need to match Nvidia perfectly chip-for-chip in order to become viable for a large part of domestic inference. It needs to be good enough for the workloads Chinese companies actually run. Which could, in theory, be a more realistic path forward.
This is also where the open-source layer matters. If DeepSeek figures out how to make the model more hardware-friendly, that work can become a reference point for everyone else. Other labs can study the architecture. Huawei can optimize around the workload. Chinese cloud providers can tune deployment. The model layer starts reshaping the hardware layer. The five-layer cake is moving onshore, one layer at a time.
This brings us back to the Dwarkesh Patel interview. Jensen’s argument was that the US should care not only about leading at the model layer, but also about keeping (Chinese) AI developers on the American technology stack. He warned that forcing Nvidia out of China could create two ecosystems and push Chinese AI developers toward internal architectures (and cost Nvidia its leverage as an American firm).
You can believe export controls are necessary for national-security reasons and still accept that they create incentives for China to build a domestic stack. Those are not contradictory views.
V4 is basically the kind of event Jensen has been warning about (he probably knew). Not because Huawei has caught Nvidia. It has not. Not because CANN has matched CUDA. It has not. But because the learning loop is starting.
DeepSeek optimizes for Ascend. Huawei gets a serious workload. CANN gets more developer attention. Cloud providers get more deployment experience. Enterprises get more comfort. The next model can become more Ascend-friendly than the last one. That is how a parallel stack begins.
The point is not that it works perfectly today. The point is that the incentive structure is now very clear. If Nvidia is not available, Chinese labs will optimize the model around the chips they can use. And if DeepSeek open-sources all that work, the whole system can, in theory, move faster. This isn’t some new epiphany either; it is a logic that one of the labs shared with me earlier in the year.
What V4 means for Nvidia
As we wrote in Part 1, the market read is that DeepSeek essentially took one for the team and delayed the V4 release by almost three months to re-engineer inference so it could run properly on Huawei’s stack before launch.
DeepSeek absorbing that Huawei adaptation burden is meaningful. It took on the wait and the cost. Running on Nvidia is just easier. The tooling is better, the developer familiarity is better, and the software ecosystem is more mature. Most other Chinese labs do not really have the luxury to spend that kind of time on Huawei adaptation when they need to launch, monetize, raise money, and keep customers engaged.
Now, it’s important to note that CUDA’s moat was not built overnight. It took more than a decade of developer adoption, libraries, tooling, optimization, documentation, community familiarity and customer deployment. CUDA became the default because developers built on it, optimized around it, debugged on it, and trusted it. That kind of ecosystem does not get replaced just because a domestic alternative exists.
CANN is not CUDA, and no one is pretending it is even close. Developers still prefer Nvidia when they can get it because CUDA is easier. But every alternative stack needs a first serious workload before people treat it as a proper alternative.
Before V4, the case for CANN was mostly strategic. The government wanted it. Huawei wanted it. Chinese customers knew they needed an alternative. But developers do not optimize for strategy in the abstract. They optimize around real workloads.
V4 changes the practical conversation. Now, Huawei can say: here is a leading Chinese open model, adapted to our stack, with DeepSeek behind it, and with real customer demand forming around it. So what V4 gives CANN is a proper workload. It gives cloud providers a reason to optimize. It gives enterprise customers a reason to test. It gives other Chinese labs a reference point. And once that loop starts, the stack can improve. Developers find bottlenecks, kernels improve, compilers improve, cloud deployment improves, and model architecture becomes more Ascend-friendly. The next model becomes easier to run than the last one.
This is maybe what Jensen Huang has been talking about: the Ecosystem moat, with a capital E. Ecosystems can be built, not overnight, but through repeated workload-driven optimization.
This is why the Huawei support matters, but the claim should stay narrow. The strongest evidence is around V4-compatible inference and partial V4-Flash training support; it is definitely not proof that Huawei has replaced Nvidia for frontier training. DeepSeek has not disclosed whether the core V4 / V4-Pro training run used Nvidia or Huawei chips. So I would keep the Huawei implication focused on deployment and inference, not frontier training replacement.
But inference alone is enough to start an ecosystem loop. Once V4 runs on Ascend, Chinese cloud providers have a reason to test it. Enterprises have a reason to deploy it. Huawei has a reason to improve kernels, compilers, and tooling around a real model. DeepSeek and other labs have a reference point for making the next model more Ascend-friendly.
This is also where Jensen’s argument looks more rational. His point has always been that the US should care not only about who has the best chip, but also about which technology stack AI developers build on. If export controls push Chinese labs away from Nvidia, then over time, they will optimize around the chips they can actually use.
The important business implication for Nvidia is therefore not that V4 is a technical death blow, but that it is an ecosystem warning. Nvidia can remain technically ahead while China increasingly routes more deployment demand toward domestic alternatives. If Chinese models become good enough, open enough, and optimized enough for domestic chips, then the question in China slowly shifts from “is Ascend better than Nvidia?” to “is Ascend good enough for this workload?” And that could be a much lower bar.
Good enough under export controls. Good enough under domestic procurement pressure. Good enough for Chinese cloud companies and SOEs. Good enough once models are designed around it. Good enough once the software improves through real usage.
This is the part Nvidia should care about. CUDA’s moat is still very real globally, but CANN finally has a serious workload to improve against. And once a domestic stack has workload, demand, and policy support at the same time, the gap can narrow faster than people expect.
V4 is not a technical death blow, but it is a market-access and ecosystem warning for Nvidia. If China keeps building models that are good enough, open enough, and optimized enough for domestic chips, then Nvidia can remain technically ahead while still losing more of the Chinese deployment market.
The US open-source debate
There has been one dominant US narrative around DeepSeek’s open releases: if US frontier labs do not build open models, US enterprises may end up running Chinese ones. That framing is being used to push US labs toward open-sourcing as a defensive, geopolitically motivated move.
I do not think the logic fully holds. If the “China threat” did not exist, would US enterprises not still want frontier-scale open models? Of course they would.
The real driver of enterprise open-source adoption is not geopolitics. It is economics. Enterprises want performance, cost control, customization, privacy, deployment flexibility, reliability, and less vendor lock-in. Geopolitics can shape procurement at the margin, especially in sensitive sectors, but the baseline reason enterprises want open models is economic and operational.
China did not invent that demand. The argument against open-sourcing is geopolitically charged. The argument for open-sourcing is also becoming geopolitically charged. But open source should not be reduced to a geopolitical weapon. Its core value has always been that it spreads capability, reduces duplication, and accelerates innovation.
If you think about it, last year, V3 shocked the world by showing how much China could do under constraints. V4 may be the quieter but more important release, because it shows how the whole Chinese AI ecosystem can reorganize around that constraint and almost unite under it. Lower model costs push the ecosystem to be more collaborative, lean into shared R&D, and pursue greater specialization across labs. They also put more pressure on standalone model companies and push everyone harder toward domestic inference infrastructure.




100% concur. But there is a missing link - a gaping hole! - in China's hardware. And without it, the Edge cannot be properly 'tamed' by Chinese AI. China needs a Mac Mini — a consumer‑scale, unified‑memory, $799 desktop that runs DeepSeek V4‑Flash (and other Chinese LLMs though - after Qwen - DS is closest to being 'Mini Mac' ready) offline, with one click. Until that exists, the Hardware Edge will belong to Apple. Of the Chinese LLMs, only Qwen3.6-35B-A3B is compatible with Mac Mini (along with Nemotron 3 and Gemma 4.) Of course - by definition - no closed weight models are compatible with the Mac Mini.
Hi Grace, thank you for laying out the contrasting mindset between the U.S. Frontier strategy vs the DeepSeek V4 Inference focusing on the economic dimension of the Chinese AI ecosystem. In the long run, the Chinese will reap the harvest with a much shorter payback period, and use such gains to re-invest more into their home-grown Frontier model and propel the ongoing inference strength into a credible and defensible MOAT.