China’s Genius Pipeline, Moonshot’s Kimi K2.5, and the Lab Financials Question
A walk down memory lane. The talent and the exams. The agents who are now working for us.
** Correction on Yao Class Alum
Hello, a lot to unpack today. We will go through the Genius program, talent schemes, AI talent, who counts among the Yao Class alumni, Moonshot’s new Kimi K2.5, and conclude with the lab financials through MiniMax’s and Z.ai’s IPOs.
As always, thank you for your support of AI Proem and Differentiated Understanding, the podcast available on Apple, Spotify, YouTube, and, of course, Substack.
A little shameless self-promo here as well: I just joined Kyle Chan’s podcast High Capacity, named after his widely popular and insightful newsletter of the same name. We talked about big tech strategy, Manus, agentic AI, and Singapore’s role in all of this. See the link here.
Part I: China’s Genius Program and Who Are the 姚班?
The idea of genius classes is nothing new, and maybe that’s why I didn’t even think it was a story when I first saw the FT headlines. It was just… normalized. Accepted.
However, as I read the article by Ziling Wu, it was the personal touch that drew me in, and the explanation of the program’s magnitude that put things into perspective. The feedback from the internet also showed that what was normalized in China was largely unknown or misunderstood in the West: once again, a reminder of the need to bridge that gap. [Definitely worth a read, and it provides some context for what I’ll write about.]
As I wrote in my piece about the brains behind Manus, Peak Ji, the face of Manus, was a student at the High School Affiliated to Peking University. He said he wasn’t a good student. But if he weren’t a genius-level student, he wouldn’t have been given access to computer labs and special passes to skip classes that were simply “not in his interest.”
偏科 (quite literally “lopsided course”) is being lopsided in your talents. At the genius level, it’s celebrated and encouraged. At the average level, it’s something tiger moms force you to fix so you can achieve “good grades” across the board. This was widely accepted by society, especially if you leaned towards STEM. Those in 实验班 (the experimental classes) were allowed to be who they were. Because in the academic ranking system, they’d earned it.
But these genius classes weren’t just quick boot camps for Gaokao. They started as early as junior high.
I remember going to junior high in Beijing at 人大附中 (the High School Affiliated to Renmin University, or RDFZ). While everyone was told you weren’t supposed to date at that age, many just did it secretly, holding hands on walks in the school fields or on the bus ride home. But the two literally smartest kids? They would make out outside the teacher’s office. Well, one had already been offered a full ride to an Ivy League school by grade 10. The other had something similar, which I can’t recall exactly.
The joke was: even if you scored last in one of those 实验班 Experimental Classes, you’d still go to Renmin University 人民大学. And ugh, that’s only a top-ten university in China… just not the top two (Tsinghua and Peking). But you get the point.
The FT’s recent China Genius Program article, written by Ziling Wu, has shed light on these programs (which are honestly not a secret in China at all) and provided historical context on how they came about. And on the obsession with Math Olympiads that she pointed out: I do vaguely remember portraits of winners who graduated from RDFZ hanging in the high school building’s main lobby.
But here’s the thing: in my high school in Canada, there was also what was called a “mini school” program. About 15-20 students out of 200 attended core classes separately from the rest of us because they scored the highest and applied to the program, which required intensive examinations in everything from English to PE to sciences. So, as much as this “genius program” is making headlines, I don’t think the concept itself is unique to China. It’s the scale that’s impactful.
And it’s not just China that has these programs; another country that has optimized talent schemes to the max is Singapore, which I wrote about here. Kishore Mahbubani benefited from one such scheme, rising from poverty to become one of the most influential diplomats.
And in the background, to boost talent, especially of Chinese ethnicity, the Singapore government has literally since the 1980s-90s gone to each Chinese province and city and asked: who is the highest-ranked student?
They’d offer them deals too good to refuse to leave their hometowns and families and study at prestigious institutions like Raffles Institution. Tuition would be paid, and stipends would be offered. In an era with no WeChat and not even affordable long-distance calling, it meant kids were away for at least a year at a time, until school holidays. And if you wanted to stay at Singaporean universities for post-secondary education and your academic achievements were good enough, you’d get a fast track to permanent residence.
Similarly, the top high schools in Beijing and Shanghai would scour the earth for the brightest students and offer them spots at prestigious schools, spots that even the kids of high officials or famous businessmen couldn’t automatically get through the notorious guanxi (relationships). There was always an openness to finding, spotting, and grooming the best of the best.
The idea of 伯乐 (Bole) and 千里马 (the thousand-mile horse) dates back over 2,500 years to the Spring and Autumn period (春秋时期). It doesn’t matter how obscure your talent is; you just need the right person to spot it. This comes from an old Chinese story: Bole, an expert judge of horses, once saw a thin, tired horse struggling to pull a heavy cart. While others overlooked it, he recognized it was actually a horse capable of running a thousand miles a day, a true rare treasure. The horse neighed as if to thank him for understanding its true worth.
The moral: even extraordinary talent needs someone who can recognize and nurture it. These schools and institutions acted as Boles. The horses were the students who ached to be found. And because of the universal test system, those academically gifted were found, especially those who really excelled in STEM.
It was with that kind of thinking, that an academic foundation is core to learning how to think and how to build discipline, that my parents moved my brother and me to Beijing after 10 years of Canadian education. And to my shock, while I was still learning basic division and multiplication in grade 6 in Canada, I was thrust into Chinese public education, where students were already studying variables.
I stayed in Beijing until grade 10, after completing the 中考 (the junior high to high school entrance exam), where you’re ranked by your grades across the whole city, and it really didn’t feel like torture. In fact, I’m grateful I did it: it showed me that discipline and hard work could drive results. As someone who could barely write complete sentences in Chinese when I first got to Beijing, I was able to rank somewhere around the top ~10% in the city (I might be remembering this number incorrectly, but I was proud). And because of those two years of hard work, during which friends encouraged each other, cheered each other on, and tutored each other based on their strengths, my honestly pretty average scores in math and calculus in China were sufficient for me to cruise through Canadian high school and university calculus.
For a very non-quant person, that was enough foundational education. That was a success, my parents thought, I guess.
The Yao Class
The FT beat me to this, but a few weeks ago, I was thinking of writing about the “Yao Class.” While I was at Tsinghua, the Yao Class was this revered group, a low-key, if-you-know-you-know crowd.
As the FT put it, one of the most prominent college-level genius programs in China is Tsinghua University’s Special Pilot Class for Computer Science. It’s better known as “Yao Class” (姚班), named after famed Chinese computer scientist Andrew Yao. Yao, who trained at Harvard and taught at Princeton, is famous for pioneering work in quantum computing and cryptography. He’s the sole Chinese winner of the Turing Award.
“I need to go back now, permanently.”
This was a line from his farewell speech to his students at Princeton, and a decision he made at 57 years old.
In a Chinese-language profile piece, the media wrote: 没有思虑太久,姚期智便坚定地表示,能为祖国培养人才,对于他来说非常有意义。 (“Without deliberating for long, Yao Qizhi stated firmly that being able to cultivate talent for his motherland was deeply meaningful to him.”)
He pushed them academically but also gave them freedom to explore. He believed that to teach geniuses, you need to use the genius way. And as they matured into their own, there was an understanding amongst them all: “要么上货架,要么上书架”, get your research either onto store shelves or onto bookshelves.
Fundamentally, it meant your work is not meant to be kept a secret. That deeper philosophical teaching shows in their embrace of open source.
And being part of the Yao class is a proper signal within the Chinese AI community. I remember when someone at Tencent first told me about Yao Shunyu joining them, the first thing they said wasn’t “he’s from OpenAI.” It was: “姚班的” — as in, one of the Yao alums.
So, who are the notable Yao Class alums?
Some are stargazers. Some are extreme pragmatists. Some have no intention of following the rules set by the previous generation of giants. Yet they’ve all converged at the gates of AGI at this time, it seems. They have the grit to go head-to-head with overseas AI giants — and they’ve also known the embarrassment of being strapped for compute. The waves are huge, the helmsmen are young. But that contrast is precisely the truest cross-section of China AI in 2026.
Yao Shunyu, Chief AI Scientist at Tencent, former Researcher at OpenAI
Yang Zhilin, Founder of Moonshot
Ma Tengyu, CS Professor at Stanford
Lou Tiancheng, Founder of Pony.ai
Qi Yin, Wenbin Tang, Mu Yang, Co-founders of Megvii
and, of course, many more who have become leaders and shapers in the industry…
And the comments below a WeChat article about the Yao Class capture how ordinary people see it: “都是天选之子,普通人进不了的赛道” — “They’re all chosen ones. Ordinary people can’t even enter this track.”
And just like that, let’s zoom in to one of those 天选之子 — Yang Zhilin of Moonshot — who just dropped K2.5.
Part II: Scaling Agentic AI — Moonshot’s Kimi K2.5 and the Agent Wars
If you’ve been anywhere near AI Twitter the past few weeks, you’ve seen the “clawdbot” frenzy: lobsters are everywhere. What started as a solo-built agent spawned an entire ecosystem overnight: agent social networks, infrastructure plays, and suddenly everyone was talking about “agents that just work.”
There’s real money behind this shift. Anthropic announced in November that Claude Code had reached $1 billion in annualized recurring revenue (ARR). Wired reported that by year’s end it had added another $100 million. Coding tools aren’t just features anymore; they’re becoming revenue drivers.
The timing is instructive. Because while the West was having its clawdbot moment, Moonshot quietly dropped Kimi K2.5 — and the through-line is the same: the shift from chatbots to agents that execute.
To most observers, Moonshot feels like the most secretive of its peers. While MiniMax and Z.ai went public earlier this year on the Hong Kong Stock Exchange, it chose to fundraise privately, again.
On January 21st at Davos, Moonshot president Zhang Yutong announced, “Kimi will release a new model very soon.” Six days later, on January 27th, Moonshot released and open-sourced Kimi K2.5.
This is Kimi’s most capability-dense update yet: vision understanding, code, multimodal input, thinking and non-thinking modes, agent capabilities, and agent swarm functionality — all packaged into a single all-in-one model. Trained on 15 trillion tokens, K2.5 isn’t just catching up — it’s competitive at the frontier. According to TechCrunch, K2.5 beats Gemini 3 Pro on SWE-Bench Verified, outperforms both GPT 5.2 and Gemini 3 Pro on SWE-Bench Multilingual, and surpasses GPT 5.2 and Claude Opus 4.5 on VideoMMMU.
In the release video, founder Yang Zhilin personally introduced K2.5.
What’s K2.5 Actually Like?
Those who’ve tested Kimi K2.5 say the intuitive feeling is that it’s entirely focused on “productivity”: the core is pivoting to coding, office work, and complex task collaboration, not scattering across features.
It’s using specialized agents, many, many of them, that are experts at specific tasks, rather than one general super “know-everything” agent. This seems to be the trend.
This sentiment is becoming more and more mainstream. Mistral’s CEO, Arthur Mensch, said on The Big Technology Podcast that it’s not possible to have one system that solves every problem, and that there is, in fact, no such thing as AGI. In real life, enterprises are complicated systems. For a specific task, there will be corresponding models. In this case, for specific tasks, there will be designated agents.
K2.5’s orientation reflects a traceable strategic shift in where Moonshot sits on the large-model capability spectrum: its technical label has migrated from 2024’s “long context” to 2025’s complex reasoning, thought processes, and agent task coordination.
According to a WeChat article from 深网, someone close to the company noted that this iteration’s value isn’t in leaderboard rankings but in its engineering orientation. The model is designed backwards from agent tasks. The core objective is to decompose tasks stably, invoke tools, and maintain consistency in long-chain reasoning.
This is the same bet Anthropic made with Claude Code: not “smartest model” but “most reliable agent infrastructure.” Moonshot’s Kimi Code, launched alongside K2.5, is their direct answer to Claude Code and Google’s Gemini CLI: a coding agent that can autonomously write, test, and iterate on code. Unlike Claude Code, Kimi Code is open source, and it already integrates with VS Code, Cursor, and Zed. The similarities aren’t accidental. Both companies are betting that the next moat isn’t benchmark scores but agent reliability.
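To make the “decompose, then invoke tools” idea concrete, here is a toy sketch of a task being split into steps and routed to specialist agents. Everything in it is hypothetical: the `decompose` and `run` functions, the `SPECIALISTS` table, and the `role: instruction` task format are made up for illustration, and a plain string split stands in for the planning a real model would do. It is not how Kimi K2.5 or Kimi Code is actually built.

```python
from typing import Callable, Dict, List, Tuple

# A "specialist" here is just a function from an instruction to a result string.
Specialist = Callable[[str], str]


def decompose(task: str) -> List[Tuple[str, str]]:
    """Hypothetical planner: split 'role: instruction; role: instruction' into steps."""
    steps = []
    for part in task.split(";"):
        role, _, instruction = part.partition(":")
        steps.append((role.strip(), instruction.strip()))
    return steps


def run(task: str, specialists: Dict[str, Specialist]) -> List[str]:
    """Route each step to its specialist; record a skip when none is registered."""
    results = []
    for role, instruction in decompose(task):
        handler = specialists.get(role)
        if handler is None:
            results.append(f"[{role}] skipped: no specialist registered")
            continue
        results.append(f"[{role}] {handler(instruction)}")
    return results


# Stand-in specialists (assumptions, not real tools).
SPECIALISTS: Dict[str, Specialist] = {
    "research": lambda text: f"collected notes on {text}",
    "code": lambda text: f"wrote a script to {text}",
    "slides": lambda text: f"drafted a deck about {text}",
}

for line in run("research: GPU pricing; code: chart the prices; slides: the findings", SPECIALISTS):
    print(line)
```

The point of the sketch is the shape: reliability comes from each step being small, routed explicitly, and failing visibly (the “skipped” line) rather than silently.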
This orientation echoes Moonshot’s efficiency-first approach in recent years. The company has stated in multiple contexts that it doesn’t have the capacity to infinitely scale compute, so it emphasizes efficiency improvements at the algorithmic and system levels rather than simply scaling up training.
In K2 series training, Moonshot used an improved Muon optimizer to achieve roughly 2x token efficiency improvement, and enhanced large-scale training stability through mechanisms like QK-Clip. On the inference side, they proposed Kimi Linear — a linear attention mechanism that improves long-context processing speed while maintaining effectiveness.
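For readers curious what a mechanism like QK-Clip might look like, here is a minimal pure-Python sketch. It assumes the idea is to cap the largest attention logit for a head at a threshold by shrinking the query and key projection weights equally whenever the cap is exceeded. The function names, the tiny matrices, and the threshold `tau` are all illustrative assumptions; the sketch ignores causal masking and the usual scaling by head dimension, and it is not Moonshot’s implementation.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def max_attention_logit(w_q, w_k, tokens):
    """Largest query-key dot product over all token pairs for one head."""
    queries = [matvec(w_q, t) for t in tokens]
    keys = [matvec(w_k, t) for t in tokens]
    return max(sum(a * b for a, b in zip(q, k)) for q in queries for k in keys)


def qk_clip(w_q, w_k, tokens, tau):
    """If the max logit exceeds tau, scale both projections by sqrt(tau / max).

    Because a logit is a product of a query and a key, scaling each side by
    sqrt(gamma) scales every logit by gamma, bringing the max down to tau.
    """
    s_max = max_attention_logit(w_q, w_k, tokens)
    if s_max <= tau:
        return w_q, w_k  # already within the cap, leave weights untouched
    scale = math.sqrt(tau / s_max)
    shrink = lambda m: [[scale * w for w in row] for row in m]
    return shrink(w_q), shrink(w_k)


# Toy example: two 2-d tokens, identical query and key projections.
W_Q = [[2.0, 0.0], [0.0, 2.0]]
W_K = [[2.0, 0.0], [0.0, 2.0]]
TOKENS = [[3.0, 0.0], [0.0, 1.0]]

print("max logit before:", max_attention_logit(W_Q, W_K, TOKENS))
new_q, new_k = qk_clip(W_Q, W_K, TOKENS, tau=9.0)
print("max logit after: ", max_attention_logit(new_q, new_k, TOKENS))
```

The intuition for why this helps stability: runaway attention logits are one way large training runs blow up, and rescaling the weights directly addresses the cause rather than clamping the logits after the fact.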
Zhang Yutong summarized this strategy at Davos: completing K2 and K2 Thinking training using only about 1% of the resources of top US labs. This also means Moonshot’s validation path for model capability is shifting toward an engineering and systems orientation.
This change is also reflected at the product level. Since May 2025, Kimi has launched agent features in quick succession, including Researcher, PPT, and Kimi Code. In September, they further launched OK Computer, which can invoke tools in a virtual computer to handle development, data analysis, multimodal content generation, and presentation creation. Agents have been positioned between model capability and commercialization, gradually becoming a critical middle layer.
Meanwhile, Moonshot has deliberately placed some capability-validation scenarios overseas. Public data indicate that after the K2 series release, the models held a certain share on model invocation platforms such as OpenRouter.
There seems to have been a shift in what constitutes “landing on the moon” now. The market generally believes Moonshot is no longer pursuing “big and comprehensive” showmanship. Instead, they’re betting on a differentiated direction: using engineering capability to solve real problems. Whether this strategy succeeds depends on whether their agent swarm and core functions can withstand large-scale validation in real, complex business scenarios and remain stable and reliable.
At the same time, when facing equally strong competitors like DeepSeek, balancing top-tier performance, usage costs, and commercialization speed will be Moonshot’s core challenge going forward.
The Dual Squeeze: Resource Wars and Shifting Evaluation Standards
In the earlier phase, Moonshot was one of the first Chinese LLM companies to focus on consumer-facing (2C) general assistants.
At its founding, thanks to Yang Zhilin’s academic background, the company carried high expectations and was viewed by some as “China’s OpenAI.” Moonshot chose early on to showcase model capability through a product.
In August 2023, during the late stages of training their first model, Moonshot initiated the Kimi AI assistant project. At the time, the company had about 50 people, and Kimi was more like a showcase window for model capability.
After the product launch, Kimi’s monthly active users maintained high growth. Through advertising, they completed cold start and entered the top tier of general conversational products. QuestMobile data shows that by the end of 2024, Kimi MAU exceeded 20 million, second only to Doubao (ByteDance). Many in the industry view that period as Moonshot’s most glorious phase in 2C.
But this growth quickly encountered growing pains. After entering 2025, as ByteDance’s Doubao, Tencent’s Yuanbao, and Alibaba’s product lines advanced simultaneously, the track evolved into a competition highly dependent on resource investment. For big companies with platform entry points and distribution systems, advertising costs can be internally absorbed. For independent startups, continuous ad spend adds up to an expense that is difficult to sustain long-term.
But as we all know, in early 2025, the rise of models like DeepSeek reshaped the industry’s selection criteria: almost zero large-scale promotion, purely technology-reputation-driven growth. This phenomenon sent shockwaves through the industry.
Even though we wrote extensively about his quirkiness and obsession with AGI, under the pressure of DeepSeek and the sharing of the limelight, Yang Zhilin is more inclined to view scale as a phased result rather than a priority objective. Before model capability forms a stable gap, prematurely scaling users might actually amplify resource consumption and path-misjudgment risks.
So we saw that Moonshot began noticeably contracting its 2C business. At the product level, they gradually stopped large-scale advertising, contracted entertainment directions, and paused or slowed multiple 2C product lines, including Ohai and Noisee. At the technical level, resources were reconcentrated on foundation model training and reasoning capability. At the market level, focus shifted from domestic user-scale competition to overseas developer ecosystem and professional user scenarios. At the strategic level, the company shifted from closed-source to open-source, placing product and commercialization growth primarily overseas.
Part III: Lab Financials
The strategic contraction and focus have also led Moonshot to follow a capital-raising rhythm starkly different from that of its peers.
At founding, Moonshot demonstrated exceptionally strong fundraising ability. Just three months after establishment, the company completed an angel round exceeding $200M, with a post-money valuation of about $300M.
The Pre-A round closed in July of the same year. 2024 was the fundraising peak: in February they closed an A+ round exceeding $1B at a $2.5B valuation; in August a B round exceeding $300M pushed valuation to $3.3B.
By the end of 2025, Moonshot completed an oversubscribed $500M C round at a post-money valuation of about $4.3B. And according to Bloomberg, Moonshot is now seeking a $5 billion valuation for its next round, a bet that staying private longer can command an even higher premium.
Yet in sharp contrast with this strong fundraising performance, Moonshot is “standing pat” on an IPO. As multiple peer companies have successively initiated IPO processes, Moonshot has not advanced in parallel. For a company that already meets the conditions for listing and sits in a closely watched sector, this choice is unusual.
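A quick back-of-the-envelope on those numbers, as a small sketch: it computes the valuation step-up between consecutive rounds using only the figures quoted above. Two assumptions to flag: the round labels are my shorthand, and I’m treating all five figures as comparable valuations even though only some are explicitly described as post-money.

```python
# Valuations in USD billions, as quoted in the text above.
# Assumption: all figures are treated as comparable (post-money style) values.
ROUNDS = [
    ("Angel", 0.3),
    ("A+ (Feb 2024)", 2.5),
    ("B (Aug 2024)", 3.3),
    ("C (end of 2025)", 4.3),
    ("Next round (sought)", 5.0),
]


def step_ups(rounds):
    """Return (from_label, to_label, multiple) for each consecutive pair of rounds."""
    return [
        (prev_label, label, value / prev_value)
        for (prev_label, prev_value), (label, value) in zip(rounds, rounds[1:])
    ]


for prev_label, label, multiple in step_ups(ROUNDS):
    print(f"{prev_label} -> {label}: {multiple:.2f}x")
```

The pattern is the interesting part: one big jump of roughly 8x from the angel round to the A+, then step-ups of about 1.2x to 1.3x per round afterwards.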
What the MiniMax and Zhipu IPOs Reveal
According to Bernstein’s January 28, 2026 research note, the MiniMax and Z.ai IPOs have given public investors the first detailed glimpse globally of AI model lab economics. Since listing on January 8 and 9, both stocks have surged: Z.ai up 101%, MiniMax up 196%. Gross margins on key segments sit in the 60-70% range, though profitability remains distant with R&D expenses still multiples of revenue.
Moonshot’s Different Path: Staying “Pre-Rubric”
Entering late 2025, the industry landscape diverged further: in mid-December, Zhipu and MiniMax both passed their Hong Kong Stock Exchange listing hearings and began share offerings. Just days later, on December 31st, a rare internal letter from Yang Zhilin surfaced, disclosing that the company still holds about 10 billion RMB in cash and stating plainly that they’re “not in a rush to go public.”
Beyond the shared anxiety about external technology competition, Moonshot also faces pressure from historical “old accounts.” Some observers believe that equity and arbitration issues related to Moonshot’s early spin-off haven’t fully settled. Rushing to initiate an IPO in this state carries inherent complexity in compliance and information disclosure. By comparison, staying in the primary market to continue technology and product evolution has lower operational costs.
The company can continuously release models and continuously complete large fundraising rounds, indicating strong persuasiveness in technical capability and capital markets. But Moonshot still hasn’t clearly answered more fundamental questions for the outside world: how will the product form be established, how will commercialization unfold, and where will stable users come from? (It seems to be pivoting to outside China, as revenue from outside China has now surpassed revenue from the Chinese market.)
Moonshot is more like a company whose technical capability has been validated, but whose corporate form is still taking shape. This state of technology-first, with commercial contours not yet fully emerged, also reflects to some degree the overall stage of China’s domestic LLM industry.
The Yao Class Advantage in the Long Game
There’s a through-line here that connects the Yao Class (姚班) alumni, Moonshot’s strategic patience, and the IPO-versus-fundraise divergence.
The Yao Class graduates, as some commentators have noted, aren’t where they are because they made the right choice every time. They’re there because they never left the table during long stretches of uncertainty, and because they have a resource that others don’t: the vouching and exclusivity of their network.
The clawdbot moment crystallized something: we’re entering an era where “agents that do things” matters more than “chatbots that answer questions.” Moonshot, with Kimi Code and its agent swarm architecture, is positioning for that world. Whether they’re early or right remains to be seen. Others will continue to innovate their own ways, leaning into their ecosystems, as we’ve written extensively on AI Proem.
The 天选之子 ‘chosen ones’ are at the table and it’s their world we’re all in now.
Key sources:
FT Magazine: China’s genius plan to win the AI race is already paying off (Ziling Wu)
Bernstein Research “China Internet: One small step for two AI labs… thoughts on strategy, competition, and the path to profits” (January 28, 2026)
TechCrunch: China’s Moonshot releases a new open source model Kimi K2.5 and a coding agent
Bloomberg: China’s Moonshot Unveils AI Model Ahead of DeepSeek Release
Alex Kantrowitz’s Big Technology Podcast





Interesting, the Yao Class is such a good idea. There was a dystopian novel by John Hersey my mother made me read, The Child Buyer. Summary: An imaginary, utterly absorbing record of the investigations of the Committee on Education, Welfare, and Public Morality of an unnamed state senate into the activities of Mr. Wissey Jones, who has come to the town of Pequot on what he says is urgent defense business.
It's a question for any country or community, how do you find the best and brightest and direct them to where they have the most leverage.
Thanks for a thought provoking post, I appreciate it.
The talent pipeline is real. Kimi K2.5's benchmarks speak for themselves. What concerns me is the consumer side though. Moonshot's privacy policy for kimi.com explicitly says they use your prompts to train models, and the Singapore-Beijing jurisdictional setup adds another layer of uncertainty. Dug into the actual policy language here: https://generativeai.pub/kimi-k2-5-is-brilliant-but-think-twice-about-using-kimi-com-157cbb26f9a3