Part II: How to understand China's LLMs as a business
subscription, distribution, the competition and the never big enough pie
Now let’s move on to the second half of this.
If Part 1 is about how Chinese labs can close the capability gap, Part 2 is about the more brutal question: what happens after you close it.
In theory, being “close to the frontier” should make monetization easier. In practice, China has a habit of compressing business models faster than it compresses technology. The same forces that make diffusion fast also make margins thin: competition that races to zero, big platforms that bundle and subsidize, and an ecosystem where distribution matters as much as the model itself.
One way to put it is that the LLM economy isn’t just a model economy. It’s a packaging economy. We’ve written a lot about distribution, ecosystem, and all that jazz, but what I realized is that it’s also about how the products are bundled. Who owns the user relationship? Who can serve demand without crashing? Who can provide reliability and workflow? Who can turn tokens into a product people subscribe to?
So in Part 2, we move down my ‘dumb question list’ and look deeper at just how the labs are making money, especially as they’ve gone public and now need to prove themselves to shareholders. The competition now moves to distribution, pricing power, compute constraints, and whether anyone can sell outcomes rather than compute.
The business of Chinese LLMs
One way of thinking that seemed to be on everyone’s mind is that the pie can always be bigger in China. Big tech companies keep trying to create a bigger pie and then eat the whole thing instead of recognizing stable vertical boundaries. This is fundamentally quite different from how big tech in the U.S. thinks. For example, in the U.S., Google and Meta kinda understand they’re not going to go eat Amazon’s dinner, and even when companies expand their business verticals, they find synergy, for example, Uber going into delivery.
But in China, ByteDance got big and wanted commerce; they went after Alibaba’s lunch. When Meituan got big, they also expanded into JD’s domain. You see what I mean? That instinct creates a contradiction. It’s why many startups retreat into verticals. It’s also why platform giants keep wanting to capture more users and create an ecosystem; however, that also means flattening the economics (like making services free).
The second point that came up repeatedly is that, long term, a subscription model makes the most sense for LLMs. It’s a Spotify-like model. Instead of paying per token, users pay a flat fee; most people use only a small amount, just as they listen to only a small subset of songs. That aligns incentives and creates the potential for durable margins. So let’s look at the business side of things in more detail…
American SaaS didn’t succeed in China, and American AI won’t be able to
This is a strong claim, so I’ll define it more precisely. I don’t think American AI companies can treat China as a straightforward growth market in the mainstream sense, meaning mass consumer distribution and broad enterprise workflow capture. Not because the products are not good enough, but because the ecosystem pushes pricing toward zero and distribution toward local incumbents, on top of regulatory constraints that are obvious enough that I won’t go into further detail.
From big tech product managers to tech investors, the consensus was “中国太卷了” (China is simply too competitive). ByteDance and Alibaba will build Slack-like products such as Lark/Feishu and DingTalk/Dingding, and ship them for free. Tencent Meeting, a Zoom-level product, was also largely free until recently and remains very cheap. If quality is similar and price is zero, it’s not a fair fight, really, for newcomers or outsiders, because local firms compete relentlessly on cost.
During the Internet era, SaaS businesses struggled to sell in China, in part because the market is brutally competitive. And even with the improved sophistication of the digital infrastructure, they still couldn’t penetrate China’s enterprise market. It actually didn’t come down to capabilities, functionalities or infrastructure.
People call it 卷, and “involutionary” is closer than “competitive.” It is a race to zero, and only unusually capable companies can charge. And now we’re seeing that play out in the AI space again. Labs are forced to offer their products for free to grow domestically.
People told me you can use Doubao for free, and even access tools like Seedance and Seedream. In that environment, selling AI software in China is hard, and some people put it bluntly: basically no chance, and it might even get blocked. So how does a company like Meitu or Midjourney then compete in that space? Well, I guess they can aim to go for the premium layer. People will use Cursor or Midjourney because they are the best of the best, even if they pay out of pocket and through a VPN. But the line between prosumers and consumers is blurring, especially since Doubao lets users access Seedance directly in the app.
But that’s really not the main point. I guess the point is that Chinese AI business models may not look the same as the American ones, just as the software economics didn’t.
Z.ai vs. MiniMax
That structure has produced two business playbooks for labs like Z.ai and MiniMax.
The first playbook is to go after a niche market that values stability, service, and predictability. Government deals are the core. Z.ai (Zhipu) has largely gone after this in the Belt and Road context. It isn’t a massive market, and it likely won’t be courted by OpenAI or Google, but it’s big enough.
The second playbook is the hamster wheel, the frontier-of-frontier race, battling for dominance and enterprise reach. MiniMax previously said they had the leading multimodal model, and they seem to want to fight for that. The open question now is how they compete with Doubao, now that ByteDance is doubling down on AI.
Selling API sounds like the obvious monetization path until you hit the supply constraint. API sales are capped by supply, because once demand rises, you need enough compute to support it. If you don’t have enough compute, inference becomes the ceiling on revenue. The real bottleneck isn’t training. It’s inference capacity.
One off-the-record explanation I heard for why ByteDance has not opened its API is that internal inference resources are already overloaded and cannot support external demand. In the same vein, I heard ByteDance’s revenue target for its large-model business this year is 10 billion RMB, so that means they’ll need to grab any chip they can get their hands on, which we’ll go into in further detail below.
And then, it’s brand and distribution. They matter more than most people admit. In one set of conversations about model capability comparisons, I heard that Kimi 2.5’s release created real pressure for Zhipu. Zhipu expected Kimi to launch a separate multimodal model. Instead, Kimi integrated multimodal capability directly into the flagship model. At that time, I was told Zhipu did not have that multimodal capability. Their response was not purely technical. It was also positioning. They came up with Pony Alpha as a novel PR move. The anonymity actually then boosted brand recognition and gave a little bump to the adoption of GLM-5.
Zhipu’s positioning, as it was described to me, is now very focused on coding and agentic capability, and they believe they remain in the domestic first tier in that area. I also heard an internal comparison: even though MiniMax’s 2.5 release put heavy marketing pressure on Zhipu, with KOL reposts on X and articles reaching five million views, people at Zhipu still believed MiniMax lagged them in agentic and coding capability. They claimed the gap is noticeable in real usage, not in charts.
When the conversation moved from capability to monetization, the segmentation got clearer. Domestically, I heard that Zhipu focuses on large key accounts, many of which are government projects.
I was told these orders require someone like Z.ai’s Chairman Tang Jie, whom everyone calls “Tang Laoshi” (Teacher Tang), someone with a particular identity and standing, to take them on with more ownership. He is nearly 50 years old and taught many in the field, such as Yang Zhilin (Moonshot), so he holds a certain je ne sais quoi, or you may say authority and command. In that view, it’s hard to imagine teams like Kimi or MiniMax taking on those projects. This was described as a unique competitive advantage.
However, talent unity and leadership also play a huge role in it all. While Alibaba and ByteDance have a top-down corporate mandate to go all-in on AI, it’s still people like DeepSeek’s Liang Wenfeng, Z.ai’s Tang Jie, Moonshot’s Yang Zhilin, and MiniMax’s Yan Junjie who set the direction, unity, and ambitions of the R&D team. Sometimes you need a ‘spiritual leader’ to make people follow you in taking the unthinkable and unproven path.
On “going abroad,” I repeatedly heard sovereign AI as the framing, with the Middle East as the common example. The logic is straightforward: even countries that appear closely allied with the U.S. do not want to be 100% dependent on America for AI. People told me Zhipu believes roughly 20% to 30% of market share will be left open for Chinese vendors. Government AI, long term, was described as sticky once you get in and solve real problems, because government customers are not necessarily optimizing for the newest or cheapest. They care about stability and problem-solving. On the other end, I heard the developer motion described as PLG and B2B2C, with less focus on overseas large enterprise key accounts, possibly for regulatory or other strategic reasons. I would be interested to hear what Kevin Xu may think of this.
Distributors vs. vendors
One of the more useful business analogies I heard last week was about distribution. It made more sense why it could be in Google’s interest not to kill Lovart AI. Google sells Gemini to Lovart, and Lovart is the distributor and also the user. Lovart has reach, a niche, and scale, and it services a crowd that Google doesn’t want to serve itself. Serving that new clientele takes attention and resources away from frontier research, and multiple labs described that as a real dilemma.
Lovart came up again in a concrete example that also explains why distributors survive. Lovart appears to be a reseller of Nano Banana credits. But to users, it’s packaging the experience. It handles concurrency. It simplifies workflows. It makes tasks like watermark removal one-click. It handles marketing and distribution through Chinese KOLs, tasks the upstream vendor won’t handle.
The economics, as described to me, are simple. Lovart buys Nano Banana credits at a steep discount, around 20% to 30% of face value. It resells at roughly 70% on average. That spread becomes a business, especially at volume.
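To make that spread concrete, here is a rough back-of-the-envelope sketch using only the percentages quoted above. The function name `reseller_spread` and the 100-unit face value are my own illustrative assumptions, and the calculation ignores Lovart's own costs (support, marketing, concurrency handling), which I don't have numbers for.

```python
def reseller_spread(face_value, buy_rate, sell_rate):
    """Return (spread, gross_margin) for credits bought and resold
    at the given fractions of face value.

    Hypothetical helper for illustration only; it ignores the
    reseller's operating costs.
    """
    cost = face_value * buy_rate        # what the reseller pays upstream
    revenue = face_value * sell_rate    # what the end user pays
    spread = revenue - cost
    return spread, spread / revenue


# Buy at 20% or 30% of face value, resell at roughly 70% (figures from the text).
for buy_rate in (0.20, 0.30):
    spread, margin = reseller_spread(100.0, buy_rate, 0.70)
    print(f"buy at {buy_rate:.0%}: spread {spread:.0f} per 100 of face value, "
          f"gross margin {margin:.0%}")
```

On those figures, every 100 of face value leaves a spread of roughly 40 to 50 before operating costs, which is why volume matters so much.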
It’s like they’re the 4S store to the model labs. In China’s auto market, people call a standard authorized dealership a “4S store”. The four S’s refer to the dealer handling sales, spare parts, after-sales service, and customer feedback. It’s not just a showroom. It holds inventory risk, provides repairs and maintenance, runs local customer acquisition, and absorbs the messy support layer that automakers do not want to build themselves.
When Mercedes-Benz sells in China, it does not actually want to hire all the salespeople in every “mid-size city,” which can have a population of 10 million or more. So the point of the analogy is that dealerships exist because the manufacturer wants reach without becoming a nationwide service organization. Lovart plays a similar role for models and agent products.
This also connects to the “wrapper” tension that came up in my conversations. I heard an example using Manus. When Manus first launched, the base model behind it was GPT-4o (or something similar). Much of Manus’s value came from the agent architecture, the wrapper layer. But if the base model gets much stronger, say GPT-4.5 or 4.6, some of the wrapper’s value can get absorbed by the base model itself. The wrapper gets thinner. And yet, people still told me that Cursor, despite constant claims it’s “done,” has strong stickiness and renewals. Workflow and habit (and compliance clearance) are hard to dislodge.
This is also why some labs prefer to partner rather than build every product layer themselves. Zhipu, for example, is partnering with Kingsoft (China’s “Office suite” provider), with the relationship described as similar to the one between Microsoft Office and OpenAI.
One reason is operational: building a Cowork-like product eats into inference and creates a long-term maintenance burden. That makes the labs question themselves again: are they the infrastructure providers, the service providers, or do they want to be both?
Many of the more resource-constrained labs would rather focus on R&D and let the platform partner handle the product surface area, user management, and even the compute load.
Balancing supply and demand
The most universal constraint I heard last week is that inference demand exceeds supply, and the gap is driven by chips.
We’re also seeing situations where a company like Baidu or ByteDance might have ten to twenty times the chip capacity of a startup, but its demand is also far higher. So the proportional constraint is even greater.
So the operational response can look counterintuitive. Sometimes they reduce demand to avoid crashing. Many users are free. Demand exceeds supply, so companies throttle new-user growth to protect performance.
Scarcity forces efficiency. And that’s the first breakthrough we saw driven by DeepSeek’s ingenuity. People described inference-side optimizations that sound unusually mature because they have had to be.
Tidal compute is one example: run inference during the day, then shift compute toward training when demand drops at night. Queue-based concurrency is another: in products like OpenCloud, users do not always need immediate completion and can wait one or two hours.
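Here is a minimal sketch of the tidal idea as I understood it: size inference to the hour's demand, and hand whatever is left to training. Everything in it is an assumption for illustration, including the function name `split_cluster`, the 10% headroom, the 1,000-GPU cluster, and the hourly demand numbers. No lab described its scheduler to me at this level of detail.

```python
def split_cluster(total_gpus, inference_demand_gpus, headroom_pct=10):
    """Return (inference_gpus, training_gpus) for one hour.

    Inference gets its demand plus some headroom (rounded up),
    capped at the cluster size; training gets the remainder.
    Hypothetical policy, for illustration only.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = (inference_demand_gpus * (100 + headroom_pct) + 99) // 100
    inference_gpus = min(total_gpus, needed)
    return inference_gpus, total_gpus - inference_gpus


# Made-up demand profile: quiet overnight, busy in the afternoon, peak in the evening.
HOURLY_DEMAND = {3: 200, 14: 800, 21: 1000}

for hour, demand in sorted(HOURLY_DEMAND.items()):
    inference_gpus, training_gpus = split_cluster(1000, demand)
    print(f"{hour:02d}:00  inference={inference_gpus}  training={training_gpus}")
```

In this toy version, training gets most of the cluster in the small hours and nothing at peak, which is also the moment when throttling or queueing users becomes the only lever left.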
Chip scarcity also hardens competitive behavior. I heard stories of ByteDance aggressively sweeping up GPUs, even paying breach penalties to lock compute that had been promised elsewhere. The downstream impact, as described to me, is that companies like the four tigers then end up under severe compute pressure, sometimes even with GPU availability declining.
This is why so many conversations looped back to DeepSeek. People hope DeepSeek’s next-generation model can be more optimized to relieve inference pressure. I also heard a claim that DeepSeek is far ahead on infrastructure and engineering compared to other Chinese labs. The same claim holds that DeepSeek is very profitable because it provides model architecture and weights early to cloud vendors like Tencent Cloud so they can optimize ahead of time, and that Tencent Cloud pays at the 10亿级 (1 billion) level. Even if DeepSeek open-sources its papers, people told me the inference performance is difficult to match because of many detailed optimizations.
There were even jokes that DeepSeek is single-handedly carrying the Chinese AI industry on its back.
The leaderboard economy sits on top of this scarcity in a way that distorts perception. People told me OpenRouter rankings have a lot of “water” (水), meaning they are inflated because many apps are free. Zhipu GLM, MiniMax, and Moonshot Kimi on KiloCode were free for the first week, then started charging. Platforms like KiloCode encourage models to compete down to free to climb rankings, but the free usage creates inference pressure. Zhipu started charging because the load became unsustainable.
This is where the subscription thesis again makes more sense. By explicitly using the Spotify model, LLM companies could then build a more durable subscription habit.
Fire Horse: freedom and energy, intensity and chaos
If Part 1 is about why the gap can close, Part 2 is about why closing the gap doesn’t automatically create a business.
China’s software market tends to drive prices toward zero, especially when platform giants can bundle substitutes for free. That dynamic shows up again in AI: free access expands demand, and demand immediately runs into compute scarcity. Labs end up rationing usage, optimizing inference like a power grid, and competing on distribution as much as model quality. In this environment, “selling API” is not a clean growth lever. It’s a supply chain problem.
That’s why the durable layer looks downstream. Wrappers and distributors persist because they do the last mile: concurrency, support, acquisition, and workflow packaging. Lovart is the dealership model. It exists because upstream vendors don’t want to become a nationwide service organization.
The way out is not another leaderboard win. It’s a business model that makes metering invisible. Subscription is the clearest candidate, and “sell outcomes, not tokens” is the direction people keep circling. The winners in China won’t just be the labs that can catch up. They’ll be the ones that can package AI into a service, survive involutionary competition, and turn scarce compute into a durable margin.
And for now, let us all wait for DeepSeek’s next release, and watch how quickly labs translate it into usable products, and watch whether anyone can turn scarce compute into durable subscription revenue rather than an endless free-for-all.
And for Part 3…
P.S. Recently, I met with some cool cats: Breakneck’s Dan Wang, when he was visiting HK; Usman, who’s building in biotech/AI; BOSS’s product manager, who told me about how they were using AI internally; and Vincent at the SCMP, who covers China AI like no other. I also received a shoutout from Kyle Chan when he went on Sinica. These encounters always make me happy, so please do reach out, get connected, and we can chat about AI, tech, and new media :)