Discussion about this post

Devansh

This is really well done. I have one question, though.

We often see Chinese labs introduce interesting architectural innovations, for example Kimi’s MuonClip optimizer or DeepSeek’s hyper-connections approach. In these cases, the improvements don’t primarily come from pre-training scale or post-training techniques. They’re more algorithmic or architectural breakthroughs.

How do you think these kinds of innovations factor into the overall trajectory? Are they central drivers of progress, or more like exceptions to the broader trend?

Denis Kalinin

Grace, awesome article, as always!

The only question I’m still puzzled about: if Chinese labs just wait for the new DeepSeek models to come out, why do GLM and Kimi perform consistently better than DeepSeek on benchmarks?

https://artificialanalysis.ai/#intelligence#artificial-analysis-intelligence-index

Is it because post-training gives them an additional boost in performance?
