Token-minimizing? Notes from a whirlwind week at SuperAI in Singapore
enterprises are rationing tokens, rushing to China’s open-weight models.
Hi all!
What a whirlwind of a week. I recorded an episode with Bloomberg Odd Lots on China AI on Monday, flew to Singapore on Tuesday for three days of back-to-back moderating and meetings across the ecosystem during Singapore AI Week, then came back to Hong Kong on Friday to be with bean 1 and bean 2. I’m only now recovering enough to think it all through.
Before anything else, I want to thank the SuperAI team, Lightspeed, Liminal, Vertex, HubSpot, and the various local friends who extended invitations and hospitality. As usual, I’ve tried to comb through everything rattling around my head all at once in one long-form piece.
Token-Minimizing: Notes From a Whirlwind Week at SuperAI
So, you know what’s actually happening?
People are reversing out of token-maxxing and into token-minimizing — or at least rationing. Qianer at The Information made the point well in her newsletter: “While Silicon Valley is moving away from tokenmaxxing—the deliberate act of consuming tokens to demonstrate AI nativeness—in Southeast Asia, home to nearly a dozen developing countries, developers and users never went through that stage and have always been cost-conscious.”
Her two takeaways from SuperAI rhymed with mine: one, it’s getting harder for startups to wow users because AI has already shown it can do so much; and two, even with token costs falling, it’s still not cheap enough to scale.

Let’s sit with that for a second. From Stripe’s customers to the largest enterprises — think Uber, Microsoft, Tencent, and ByteDance (all have publicly declared some reprioritizing token spend) — companies are, one by one, starting to ration the tokens they burn. Token-minimizing is really just this: once tokens are a managed cost rather than a free good, traffic flows to whatever model is cheapest for the job. For some that means a hybrid setup, but for another group, a growing number, it means routing to China’s efficient open-weight models.
So then you ask: 1) where does value accrue once the models themselves are commoditizing? Benedict Evans pressed exactly this from the main stage, and nearly every panel circled back to it. 2) the one MIT’s Max Tegmark reminded us all of: who are we even building AI for? How do we build it safely, and what does “pro-human AI” actually mean?
Let me unpack this step by step, starting with the SuperAI stage.
The visual AI stack
I moderated the visual AI stack panel, which was a treat because it spanned the whole stack. Alibaba Cloud held down the infrastructure layer, leaning into its sprawling ecosystem and courting developers with open-source models and tools — the Qwen models, the Wan video model, and its video-generation tool Happy Horse. Meitu sat much further up, in applications; it runs a proprietary R&D team focused on facial recognition, but it largely builds on top of other people’s models.

What we kept coming back to is that the ecosystem is vibrant, but the real use case for visual models still hasn’t scaled. Today it clusters in gaming, marketing, and prosumer work and e-commerce vendors making product ads, indie filmmakers, that crowd. Which creates a funny tension: the market feels small even though it’s big, if that makes sense. And for startups like Video Rebirth, their vision is not just to build frontier video models but push that technical capability into world models, leveraging their know-how in 3D and spatial intelligence.
And the main bottleneck faced by everyone?
It’s the thing nobody fully controls: data. Data is both the edge and the ceiling here. You also can’t think about training image and visual models the way you’d think about training an LLM; a visual is so subjective, so hard to pin down and keep consistent, that the whole exercise is different.
On the business model, my instinct has always been: why not build for enterprise? That’s where the money is, no?
But raw APIs carry almost no margin. Wrap that same capability into a consumer product, and margins can reach ~73%. Usage is lower, and it’s harder to scale, sure, but the value per generation — per token — is far higher, as one investor put it to me.
And for visuals specifically, so much of the moat is taste. How you tune color and effect isn’t a black-and-white question of fact; it’s a matter of taste. For companies like Meitu, though niche, they are made for the aesthetics; that is where they lead.
That’s also why I thought maybe they should build a Stripe for enterprise visual tools, like a clean plug-in layer. But when I sat down with Meitu’s IR, they pushed back: it doesn’t work like that, because enterprises want their visual tools deeply personalized. It’s nothing like the streamlined simplicity of payments.
Which is the perfect segue to my next thought…
Stripe: when the cost of building collapses
Abhi Tiwari, Stripe’s head of global products, made the week’s core theme concrete: the cost to build and the cost to go global are both falling away. His headline number — a $200/month coding subscription can now replace $2 million of up-front spend. And Stripe sees it in its own data: agents are increasingly the ones reading its docs, and by the end of this year more agents than humans will be doing so.
The result is a flood of builders. iOS app releases went from declining month over month to up roughly 24% since agentic coding showed up, and more businesses are adopting it, too. They’re also monetizing faster since Stripe baked payments into tools like Lovable, Replit, and Bolt; recent Stripe Atlas cohorts have gone from incorporation to first payment in about six weeks.
His point is that the cost of intelligence is no longer the bottleneck. What separates winners now is knowing what to build and when, managing tech debt, and taste plus distribution and reach.
But here’s the tension Stripe sets up. If intelligence is no longer the bottleneck, then taste, timing, and distribution become the game — but so does the bill. Cheap per token still adds up fast at scale. And that’s where the heart of my week really begins.
Token-minimizing: rationing the thing that used to be scarce
So let’s actually sit with token-minimizing for a beat.
If intelligence is getting cheap, the flip side is that companies get deliberate about how much they consume. There’s even a tidy Chinese corporate phrase for it: 降本增效, “cut costs, raise efficiency,” which is essentially what every company wants to do to maximize profit… and right now it’s being aimed squarely at the AI bill.
And it’s not just the American names. I’m told Tencent has been nudging employees toward something like RMB 1,000 of token spend a month (a significant drop from like ~5000 USD last month), and ByteDance is finding its own ways to cap internal use. (I got into how the BAT are reorganizing and spending around tokens in my 2025 wrap.) It cuts both ways — some firms cap usage to control cost, others push staff to burn more tokens to prove they’re being productive, but what is clear is that token spend is now a managed line item, and no longer an afterthought or a free good.
Which brings up something I’ve written about before: cheaper tokens don’t shrink demand; they multiply use cases, the Jevons paradox, alive and well in AI infrastructure. So rationing and an explosion of usage are happening at the same time. That sharpens the real question: which model, or which layer, captures all that new demand? Although, honestly, I’m starting to ask a harder one: does demand actually multiply if no one’s proven the ROI, while the costs just keep climbing?
For now, the answer is that traffic flows to whatever’s most efficient per token, and guess what? That is increasingly looking like Chinese open-source models, a posture I’ve argued is an ecosystem strategy, not just a release choice. The open-source strategy was always a business decision first, a philosophical choice second. To bring people into its ecosystem, building trust was the first step; monetization can always happen, maybe in less per sale, but that doesn’t stop a sustainable business.
So anyway, this is a good moment to talk about the labs themselves.
Where China’s models sit now
First, the fun part: it was awesome to catch up with so many of these labs offline, and a real pleasure to introduce a bunch of them around SuperAI.
So where do China’s model labs actually sit today? I had a decent vantage point this week, given that I was able to catch up with reps from StepFun, Tencent, Alibaba, Z.ai, and many more on the panels and on the sidelines.
Ultimately, it isn’t about “winning a race.” It’s that, for once, the structure of the moment runs in their favor. Token-minimizing is a tailwind that plays straight to their strength — efficiency per token — and the models keep leaping. GLM-5.2 is genuinely good (Bernstein puts it close to Opus 4.8), and Kimi’s K2.7 Code is pushing hard on reasoning efficiency. For the first time, you could argue Chinese open-weights models hold the rhetorical high ground, too.
It’s not a clean run, though. China AI still faces two hard constraints: a compute shortage for inference, and an API price war that grinds margins down even as hardware costs — memory especially — keep climbing. New anti-distillation efforts and export controls pile on more catch-up hurdles, and no one’s hiding that. The timing is delicate, too: this is the same moment Anthropic is working through its Fable and Mythos debacle and has doubled its API pricing on Fable 5 (vs. Opus 4.8) — which only widens the gap the Chinese labs are there to exploit.
One tell from the sidelines: the labs wouldn’t talk business with me at all. StepFun and Z.ai are both midway through major capital-markets activity (StepFun filed for an IPO on the HKEX, Z.ai for a dual listing on the Shanghai STAR board), so they kept the conversation focused on tech. It’s now or never, with everyone rushing to gold, and my take is that the HKEX helped set a fair valuation, and that turning back to A-shares helps bring in more liquidity and cash riding that AI fever.
So the picture is two-sided. On the one hand, token-minimizing keeps pushing traffic toward the most efficient models, including Chinese open-source models. On the other, those models keep leaping — GLM-5.2, Kimi’s K2.7 — and narrowing the gap with the frontier.
Which raises its own questions: will it continue? And when is a model’s capability simply “enough”?
All of which makes model routing a much bigger conversation. Can a cheaper Chinese model be the default, with a hybrid setup that escalates to a frontier model only when the task genuinely requires it? More and more, that’s the architecture people are reaching for.
Bernstein put the momentum well in a recent note (Global Internet: Never interrupt your opponent while he is making a mistake):
“AI never sleeps. It probably wasn’t an accident that Kimi and Z.ai, two AI labs we consider frontier in China, announced new models over the weekend. K2.7 Code promised better reasoning efficiency and reasoning capability upgrades. Z.ai’s GLM-5.2 meanwhile positions itself against Claude Opus 4.8; both online developer feedback and our own locally-hosted benchmark tests returned encouraging initial results. Real counterarguments remain, but on the margin these releases support the idea that China’s leading labs can continue to keep pace with global peers. The idea that Chinese open-weights models might now occupy the rhetorical high ground too is a novel development.”
Where does the value go?
That keep-pace question is really a value question, and in some ways it is the one Benedict Evans built his keynote around. I got to meet him briefly, tell him about AI Proem (fan-girl), and his talk is the one I keep chewing on (here’s the full keynote; it expands on his 70-page deck, AI Eats the World, and he goes deeper in this a16z conversation).
He won’t dismiss AI as a fad, but he won’t buy the story of an immediate, sweeping economic transformation either. He opened with: “AI is whatever machines can’t do yet — the moment they can do it well, we stop calling it AI and start calling it software.”
Databases once had superhuman memory; image recognition once felt like magic. Each was “AI” right up until it dissolved into being just software. LLMs may well go the same way. So the real debate isn’t whether the technology is real, but where value gets captured as the models commoditize. Is generative AI the next platform shift after the PC, the web and the smartphone, or does it become another layer folded into everything until it’s invisible?
He even pushed back on the Sam Altman “AI as a metered utility, like electricity” framing, pointing to telecom: mobile data volumes exploded for a decade, and the value never followed. Volume and valuation don’t travel together in a commodity. Evans left the ending genuinely open, which felt more honest than most takes I heard all week.
So where is AI actually diffusing? It’s the thing we’ve been writing about for a while: didn’t internet innovation take decades to play out? When Cisco was founded, did anyone imagine that Uber would become an internet company? Will the models capture all the value this time — or does it pool somewhere else entirely?
Think about it,
Businesses without network effects may lack a scalable model.
How do commoditizing LLMs become sustainable businesses — or do they stay very expensive infrastructure?
A capacity gap and a usage gap are not the same thing.
New workflows take longer to build than anyone expects; for now we’re mostly using new tools for old tasks, not inventing new ways of working.
The frontier models panel turned all of this into theater, with Minimax and Z.ai taking turns telling each other what huge fans they are. Cute. So much for the involution, the cutthroat one-upmanship you’d expect.
Underneath the niceties, though, there was real substance. Mistral pitched itself as the neutral third option and drew the panel’s sharpest line, a potentially specialized tool versus the Swiss Army knife and Minimax with its M3 model pushing toward a one-million-token context window. It argues that distribution, not raw capability, determines how AI evolves, and without GPU distribution, that capacity remains bottlenecked.
Z.ai went back to their favorite car analogies; they’re selling a Mercedes, and Opus is the Rolls-Royce. How many people will need Rolls-Royce or even be able to afford one?
Max Tegmark: the pro-human path
Which is exactly when Max Tegmark widened the lens — from “who captures the value” to “who is any of this even for.” MIT’s Tegmark gave what I thought was the most balanced keynote of the event: neither anti-AI nor accelerationist-at-all-costs, but “pro-human AI,” as he put it. His frame is a fork in the road between a “race to replace” path and a pro-human one. We’ve been on the replacement path a while, he argued, and he even quoted Elon Musk’s own line that, in a likely scenario, none of us will have a job, and AI ends up in charge. But the tide is turning, and he pointed to a recent bipartisan set of 33 principles for pro-human AI built around keeping humans in charge and preserving agency.
We didn’t stop developing fire, he said; we just stayed mindful that it can grill a great barbecue or burn the house down. The point of pro-human AI isn’t to slow the fire. It’s to decide what we cook with it and not burn people with it…
A small proem to what’s next
And yet the most telling signal of my whole week didn’t come from any stage. It came from my WeChat moments, where someone posted that traffic in Yuhang, a quaint suburb of Hangzhou, was completely gridlocked.
Trivial on its face, but maybe the truest tell that China AI is genuinely on. Yuhang, specifically the Hangzhou Future Sci-Tech City and the China (Hangzhou) Artificial Intelligence Town, is the epicenter of the country’s AI buildout and home to global giants and open-source leaders, from Alibaba and Ant to the “Six Little Dragons.” Think Unitree, BrainCo, ManyCore and more. And guess what: I’m headed there in two weeks to visit a few of them.
True to this newsletter’s name — proem means preface — the whole week felt like a preface to the next chapter rather than any conclusion.
Three things I’ll be watching: whether token-rationing hardens into a real procurement discipline (a budget line with an owner, not a vibe); whether model routing will shift to cheaper Chinese options; whether frontier-on-escalation becomes the standard enterprise architecture.
Some shameless self-promo and announcements
It was a pleasure to be back on BBC’s Artificial Human podcast to talk about robotics and China’s positioning in it all. The last time we spoke was a year ago, about DeepSeek and the rise of Chinese open-source AI. And of course, please check out my Odd Lots interview, which should be out in the next week.
In the meantime, summer is kicking off, and the kids are out of school soon. A reminder that we get something like 75% of our time with our babies before age 10, so do slow down, make some lemonade, sit in the sun, kick a ball, and read a book with them. :)
I, for one, will be taking the next two weeks slower. Thus, apologies, I’m posting one more podcast ep this week and then taking a breather for a bit. I appreciate your understanding!
Btw, a few quick news updates
A few things broke while I was sitting on this piece — worth keeping on your radar:
DeepSeek’s fundraise closed, and honestly the structure is the interesting part. Read Jing Yang of The Information on how it’s actually put together.
Alibaba shipped a world model: the Qwen-Robot series. Which loops right back to the visual-stack conversation up top: everyone’s racing from flat generation toward world models.
Lin Junyang, the former head of Qwen, closed the raise for his new AI lab — a world-models shop, reportedly at a ~$2 billion valuation.



Whether token-rationing hardens into a real procurement discipline" — that's the question we kept coming back to too. The wrinkle in Asia is that the policy layer may answer it differently than the market does. In China, tokens are already being framed as an economic signal, not just a cost line to manage. Which makes the procurement discipline question harder, not easier. We looked at what that means for operators.
Never believed ‘token-maxing’ was actually a thing. Rather a phantom dreamed up by bored commentators.