ByteDance’s AI Legacy and Strategy: Doubao, Volcano Engine and Beyond

Hybrid model, Video model prowess, Quiet product-led Approach

Jun 24, 2025

Hi all,

This has been a long time in the making. It is part of a series of in-depth examinations of Chinese big tech firms and their AI strategies, beginning with Alibaba, followed by Huawei (parts 1 and 2), and a few shorter pieces on Tencent. Without further ado, our protagonist today is ByteDance.

ByteDance's Global AI Ambition Runs Into Obstacles — The Information

In January, we wrote that ByteDance’s AI application Doubao showed that inference demand could indeed go up 1 billion times. Since the release of DeepSeek’s R1 model, the global spotlight has been on the startup lab, and ByteDance has seemingly taken a quieter approach than its earlier, aggressive attempts at consumer conversion. First-time AI users swarmed to Tencent’s WeChat as it integrated DeepSeek and its own Yuanbao LLM into its ubiquitous social media platform. The functional adjacency and an existing base of 1+ billion users quickly propelled Tencent’s platform into the lead in consumer usage. But that lead is confined to text models. With its video and multimodal AI offerings, ByteDance is quickly catching up in consumer usage.

However, ByteDance is playing a long-term game, and its edge stems from a decade of recommendation-system expertise, now channelled into a fast-iterating LLM family (Doubao) and an enterprise cloud layer (Volcano Engine).

ByteDance’s legacy of algorithmic advantage

Before Doubao, ByteDance had already widely adopted AI. But it’s not what you think. ByteDance’s short-video apps TikTok and Douyin, as well as its news aggregator Jinritoutiao, leverage a sophisticated recommendation algorithm. At its core, it uses machine learning to personalize the "For You" page (FYP) for each user: it analyzes user behavior and content characteristics to predict and deliver the content most likely to engage. This algorithm is the secret sauce behind its success, helping it outcompete rival products such as Shorts and Threads, along with the many others that tried to emulate it.
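To make the idea of "user behavior plus content characteristics" a bit more concrete, here is a deliberately tiny sketch of a content-based ranker. It is not ByteDance's algorithm, which is not public in this form. The feature names, the watch-completion weighting, and the helper names (`Video`, `build_user_profile`, `rank_feed`) are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Video:
    video_id: str
    # Hypothetical content characteristics, e.g. topic weights between 0 and 1.
    features: dict[str, float]


def build_user_profile(watch_history: list[tuple[Video, float]]) -> dict[str, float]:
    """Average the features of watched videos, weighted by watch-completion ratio."""
    totals: dict[str, float] = {}
    weight_sum = 0.0
    for video, completion in watch_history:
        for name, value in video.features.items():
            totals[name] = totals.get(name, 0.0) + completion * value
        weight_sum += completion
    if weight_sum == 0:
        return {}
    return {name: value / weight_sum for name, value in totals.items()}


def engagement_score(profile: dict[str, float], video: Video) -> float:
    """Dot product of the user profile and the video's features."""
    return sum(profile.get(name, 0.0) * value for name, value in video.features.items())


def rank_feed(profile: dict[str, float], candidates: list[Video]) -> list[Video]:
    """Order candidates from highest to lowest predicted engagement."""
    return sorted(candidates, key=lambda v: engagement_score(profile, v), reverse=True)


# Toy behavior log: the user finished most of a cooking clip and skipped a sports clip.
history = [
    (Video("w1", {"cooking": 1.0}), 0.9),
    (Video("w2", {"sports": 1.0}), 0.1),
]
candidates = [
    Video("c1", {"sports": 1.0}),
    Video("c2", {"cooking": 0.8, "travel": 0.2}),
    Video("c3", {"travel": 1.0}),
]
profile = build_user_profile(history)
feed = rank_feed(profile, candidates)
print([v.video_id for v in feed])
```

A production system would learn these weights from billions of interactions and many more signals. The sketch only shows the shape of the loop: observe behavior, build a profile, score candidates, rank.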

And because of this special edge, the company has reportedly stated that it would rather shut down the app in the U.S. than sell it, even under intense scrutiny and pressure from the U.S. government over the past few years.

What's so special about TikTok's algorithm? | REUTERS

Because it has remained privately held (not entirely by choice), it has enjoyed a degree of anonymity, or at least avoided the pressure to disclose certain business and financial details. That comes with a perk: it can build in stealth.

The Evolution of ByteDance’s AI Research

While most Chinese research labs began seriously working on LLMs between the late 2010s and the early 2020s, ByteDance reportedly didn’t prioritize this “business line” until the ChatGPT explosion, and until CEO Liang Rubo held a big meeting with company management at which he expressed disappointment in the leaders’ lack of vision and innovation. With Alibaba releasing its Qwen models on open-source platforms and Tencent integrating DeepSeek into its existing platforms, ByteDance seemed somewhat sluggish from the outside. However, it had actually been quietly building a formidable team.

Zhang Yiming Architect of TikTok and Bytedance || BusinessStorytime - YouTube — Founder of ByteDance, Zhang Yiming (who mostly resides in Singapore now)

Many say that ByteDance founder Zhang Yiming’s involvement in AI research at ByteDance is similar to the role Sergey Brin played at Google. Despite having kept a low profile since 2021 (the Chinese internet crackdown) and having handed the reins to the current CEO, Liang Rubo, Zhang has remained active in actual business decisions, much like Jack Ma and Joe Tsai at Alibaba. As at Alibaba, where the top leadership baton was passed to long-term, trusted loyalists, Liang is a co-founder and Zhang’s former university roommate. The founders, whom many would call the real visionaries, are the anchors steering the internet behemoths through AI disruption and new innovation.

Doubao, a Side Project Turned Flag-Bearer

As mentioned, the AI lab was launched well before Doubao, with records of it dating back to the mid-2010s, and it has undergone a few name changes and priority shifts. It started off mostly focused on natural language processing, and LLM research wasn't truly prioritized until ChatGPT’s 2022 moment. Soon after the release of OpenAI’s consumer application, Chinese big tech and the original four tigers released their own chatbots one by one.

In 2023, Doubao was launched, and a team focused on large language models quickly rose in importance internally. The LLM and the consumer app are both named Doubao, which means Bean-Paste Bun (a sweet bao). It now has over 60 million monthly active users; though it lags behind DeepSeek, it far exceeds most domestic rivals.

According to recent data, Doubao’s paid token traffic rose 22× in twelve months while pricing fell 60%, which obviously is not welcome to all users. The flagship model, Doubao 1.6, was recently released and now supports 256k-token context windows and multimodal reasoning. Its video model Seedance overtook Google’s Veo on the Artificial Analysis leaderboard, which was a bit of a shocker for many.

Talent-first approach

How did Doubao come about? Let’s start by looking at Seed. This team was created in early 2023 as an umbrella research group for frontier AI research, covering NLP, CV, and the Speech Lab, and it is spread across the Beijing headquarters as well as Singapore and Palo Alto. It recruits heavily from PhD programs through what is internally referred to as the “Top Seed” program. Though it mirrors efforts by Alibaba and Tencent, it is less discussed externally, and the program is said to include probably a couple of thousand people, which is more sizable than many expect.

The strategy behind Seed is to build a strong team that can pivot when needed and that focuses on the most frontier model research. The team lead joined Seed after 17 years at Google (although news literally came out today that he has potentially been demoted or let go for personal reasons, reportedly a workplace-misconduct scandal; this is not confirmed yet). It’s ByteDance’s answer to Alibaba’s Damo Academy and Tencent’s Tencent Research Institute.

Actually, before Seed, it was said that an AI lab was started in the mid-2010s, and I found Tony Peng’s article here very helpful in providing more context. As he wrote, by mid-2023, ByteDance had released its first LLM, called Skylark (which was later rebranded to Doubao, the same name as its consumer app). Still, its GPT-like bot Doubao was received with little interest, reaching only 1.3 million DAUs by November, compared to ChatGPT’s 300 million weekly users already by then.

https://x.com/deedydas/status/1936990542669459552

Doubao: The Flagship AI Product

The Doubao model was first launched as Skylark in 2023 and was designed for tasks like text generation, image creation, speech synthesis, and reasoning. It has since evolved from a single language model into a comprehensive model family covering multimodal and multi-scenario applications, and has become one of the most successful cases of large-model commercialization in China. For example, the technology enables e-commerce sellers to create studio-quality promos in minutes, significantly reducing both time and cost. The technological iterations and market expansion of the Doubao model family demonstrate ByteDance's "technology-driven product" development logic and rapid market responsiveness.

In terms of technological evolution, the Doubao large model has undergone several major upgrades.

In May 2024, ByteDance first released the Doubao model family. Subsequently, in third-party FlagEval evaluations, the Doubao-Pro-4k model ranked second in the "objective evaluation" of closed-source large models with a comprehensive score of 75.96, trailing only GPT-4 and becoming the highest-rated domestic large model; it also secured second place in "subjective evaluations."

By August 2024, Doubao had significantly upgraded its visual and speech capabilities, with its text-to-image model gaining a better grasp of "Chinese aesthetics" and its TTS (text-to-speech) achieving precise emotional expression. By 2025, the Doubao model had iterated to version 1.6, featuring enhanced reasoning, multimodal understanding, GUI operation, and front-end programming capabilities.

The technical architecture of the Doubao model family is mainly characterized by three key aspects: 1) full-modality support, 2) vertical scenario optimization, and 3) so-called Chinese cultural reinforcement.

For full-modality support, Doubao covers not only basic modalities like text, images, and speech but also extends to complex ones such as video and 3D.

For instance, Doubao's video generation model supports multi-shot narrative and multi-task video generation, producing smooth, detail-rich, and cinematic-quality 1080p HD videos.

For vertical scenario optimization, Doubao offers specialized models tailored to different applications, such as the UI-TARS model for graphical interface interaction, vector retrieval models, and role-playing creative models.

In terms of Chinese cultural reinforcement, Doubao's text-to-image model demonstrates precise comprehension of Chinese elements (including figures, objects, dynasties, geography, cuisine, and festivals), enabling high-quality generation of culturally authentic images.

The main publicly available models include:

Doubao 1.6: The flagship next-gen model with comprehensive upgrades, excelling in knowledge, coding, and reasoning.
Doubao 1.5 Lite: A lightweight language model matching or surpassing GPT-4 Omni and Claude 3.5 Haiku in benchmarks like MMLU_pro (comprehensive), BBH (reasoning), MATH, and GPQA (professional knowledge).
Doubao VideoGen (Seedance 1.0): A multi-shot, multi-task video generation model.
Doubao Vision: A multimodal model with advanced visual understanding and reasoning.
Doubao Real-Time TTS: Generates ultra-natural, high-fidelity, personalized speech by predicting text emotion and tone.
Doubao Voice Clone: Achieves 1:1 voice replication in 5 seconds, supporting cross-language migration.
Doubao Text-to-Image/Image-to-Image: Specializes in Chinese cultural elements, offering 50+ style transformations and creative extensions like outpainting/repainting.

Zooming in on the three models that were recently released in June 2025:

(1) Doubao-seed-1.6 is an all-in-one model that supports deep thinking, multimodal capabilities, and a 256k context. Developers can use automatic thinking-mode selection to save tokens.

(2) Doubao-seed-1.6-thinking supports a 256k context and deep thinking in coding, mathematics, and logic.

(3) Doubao-seed-1.6-flash supports deep thinking, multimodal capabilities, and a 256k context with low latency. Its visual capabilities are comparable to those of global peers.

Volcano Engine: The Enterprise Platform

Volcano Engine is ByteDance's cloud computing and AI service platform, launched in 2021 to provide enterprise clients with advanced technologies like recommendation algorithms, data analytics, and artificial intelligence solutions. It focuses on enabling businesses with tools for intelligent applications, visual and data intelligence systems, and multimedia technologies, drawing on ByteDance’s expertise from apps like TikTok and Douyin. Unlike traditional enterprise solutions (e.g., CRM or ERP), Volcano Engine emphasizes AI-driven capabilities, including large language models, cloud infrastructure, and multimodal AI services, to support industries such as e-commerce, automotive, and entertainment.

It serves as the primary platform through which Doubao’s capabilities are commercialized and delivered to external enterprise clients. Doubao models, including versions like Doubao Pro and Doubao Lite, are integrated into Volcano Engine’s ecosystem, enabling businesses to leverage these LLMs for applications such as AI chatbots, content creation, and smart cockpits. It also supports Doubao’s infrastructure, offering model fine-tuning, inference, and evaluation through its Volcano Ark platform, and has initiatives like providing trillions of free tokens for academic research. This synergy allows Doubao’s AI advancements to scale across industries via Volcano Engine’s cloud services.

Commercialization and Market Impact

As we’ve written before, Alibaba has taken on a roadmap more similar to American big tech, offering infrastructure support and AI as SaaS, while Tencent is taking a full-on consumer-facing approach through its existing 1+ billion daily active users across its applications.

According to the latest QuestMobile data, Doubao reached 110M MAUs in Q1 2025 (a pretty drastic improvement from last year), ranking second among AI apps (after DeepSeek's 190M).

By April 2025, Doubao had surpassed DeepSeek to claim second place on Apple's App Store free chart (behind ByteDance's short-drama app Hongguo), reflecting strong user growth momentum. That is partly due to traffic funneled from ByteDance’s super-apps like Douyin, which integrated Doubao’s AI features and promoted its contact entry in users' message lists.

For commercial applications, Doubao has penetrated diverse industries. Data from Volcano Engine shows that average enterprise token usage has grown 22x since the May 2024 launch, highlighting rapid adoption. Key use cases include:

Social/Entertainment: Role-playing, interactive storytelling, chat assistance.
Education: Personalized tutoring, Q&A.
Customer Service/Sales: AI agents, sales pitch generation.
AI Search: Semantic understanding, precise answers.
Marketing: Ad copywriting, content creation.
Hardware Assistants: Device interaction, voice control.

Anyway, as mentioned, daily usage of Doubao has exploded since May 2024, even though it temporarily felt a bit irrelevant when DeepSeek was rolled out and Moonshot’s Kimi had its 15 minutes of fame. The platform now serves more than 16 trillion tokens a day, roughly 137 times its year-ago throughput. ByteDance cut prices just as aggressively, dropping input and output rates to RMB 0.8 and RMB 8 per million tokens, respectively. That makes the current 0–32k-context tier about 63 percent cheaper than both Doubao 1.5 and DeepSeek R1, signalling a deliberate “price down, volume up” land-grab.
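For readers who want to sanity-check those numbers, here is a small back-of-the-envelope sketch. It uses only the quoted rates (RMB 0.8 input, RMB 8 output per million tokens) and the quoted 16 trillion tokens a day at 137 times year-ago throughput. The example request size (20,000 tokens in, 2,000 out) is a made-up assumption chosen to fit inside the 0–32k tier, and real billing may include details not covered here.

```python
INPUT_RMB_PER_MILLION = 0.8   # quoted input rate, 0-32k context tier
OUTPUT_RMB_PER_MILLION = 8.0  # quoted output rate, 0-32k context tier


def request_cost_rmb(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in RMB at the quoted per-million-token rates."""
    return (
        input_tokens * INPUT_RMB_PER_MILLION
        + output_tokens * OUTPUT_RMB_PER_MILLION
    ) / 1_000_000


def implied_year_ago_daily_tokens(current_daily_tokens: float, growth_multiple: float) -> float:
    """Back out last year's daily throughput from today's figure and the growth multiple."""
    return current_daily_tokens / growth_multiple


# Hypothetical request: 20,000 input tokens and 2,000 output tokens.
cost = request_cost_rmb(20_000, 2_000)
print(f"Cost per request: RMB {cost:.3f}")  # RMB 0.032

# Quoted figures: 16 trillion tokens a day, about 137x the year-ago level.
year_ago = implied_year_ago_daily_tokens(16e12, 137)
print(f"Implied year-ago throughput: {year_ago / 1e9:.0f} billion tokens a day")  # 117
```

Note how output tokens dominate: at a 10x price gap, 2,000 output tokens cost as much as 20,000 input tokens, so long reasoning traces are where the bill adds up.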

The product itself has evolved in tandem. Doubao’s new DeepResearch workflow allows the model to think, search, and stitch together answers autonomously, turning work that once took analysts days into tasks that take 5 to 30 minutes.

A multimodal backbone expands the addressable market: e-commerce sellers can now verify merchandise images in real time, autonomous-driving teams can label sensor data at scale, and security customers gain faster anomaly detection. The front end is equally ambitious: a graphical interface orchestrates agents, web browsers, and off-the-shelf software so smoothly that booking a hotel room becomes a one-step process.

On the video side, Seedance 1.0 Pro raises the bar for generative quality. Its seamless multi-camera narrative, blended actions, and natural camera movement put it at the top of the third-party Artificial Analysis leaderboard. People are also highlighting that the pricing is disruptive. At RMB 0.015 per thousand tokens, a five-second 1080p clip costs roughly RMB 3.7. A marketing team with an RMB 10,000 budget can crank out about 2,700 high-definition videos on Pro, or close to 9,700 clips on the lighter Seedance 1.0 Lite. And it’s no secret that ByteDance is already pitching the model to e-commerce studios, film-and-TV producers, and game developers (whether that is good for copyright issues is another discussion).
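The video math above is easy to reproduce. The sketch below takes the quoted figures at face value (RMB 0.015 per thousand tokens, about RMB 3.7 per five-second 1080p clip, an RMB 10,000 budget, about 9,700 Lite clips) and backs out two numbers the text implies but does not state: the token count per Pro clip and the per-clip cost on Lite. Both are derived estimates, not published prices.

```python
PRICE_RMB_PER_1K_TOKENS = 0.015  # quoted Seedance 1.0 Pro rate
PRO_CLIP_COST_RMB = 3.7          # quoted cost of a five-second 1080p clip
BUDGET_RMB = 10_000              # quoted example budget
LITE_CLIPS_IN_BUDGET = 9_700     # quoted clip count on Seedance 1.0 Lite

# Tokens consumed by one Pro clip, implied by the clip cost and the token rate.
tokens_per_pro_clip = PRO_CLIP_COST_RMB / PRICE_RMB_PER_1K_TOKENS * 1_000

# Number of Pro clips the budget buys.
pro_clips_in_budget = BUDGET_RMB / PRO_CLIP_COST_RMB

# Per-clip cost on Lite, implied by the budget and the quoted clip count.
implied_lite_clip_cost_rmb = BUDGET_RMB / LITE_CLIPS_IN_BUDGET

print(f"Tokens per Pro clip: about {tokens_per_pro_clip:,.0f}")
print(f"Pro clips for RMB {BUDGET_RMB:,}: about {pro_clips_in_budget:,.0f}")
print(f"Implied Lite cost per clip: about RMB {implied_lite_clip_cost_rmb:.2f}")
```

Run as written, this gives roughly 246,667 tokens per Pro clip, about 2,703 Pro clips for the budget (matching the "about 2,700" above), and an implied Lite cost of about RMB 1.03 per clip.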

Audio is moving just as quickly. Real-time speech models enable live singing, voice impersonation, and performance effects, while the new podcast engine can generate entire shows from a prompt, a web page, or even a Word document, complete with natural back-and-forth dialogue between virtual hosts.

All of this sits on top of a growing toolkit for third-party developers. ByteDance now offers a dozen “AI-native” services that span the build pipeline: an upgraded IDE (TRAE), Model Context Protocol (MCP) services, PromptPilot for rapid prototyping, the open-source reinforcement-learning framework veRL, and an internal knowledge-management layer. Big-data specialists can use multimodal labeling tools and automated data agents; infrastructure teams can leverage AgentKit, TrainingKit, and ServingKit. The company says it has also built in security, including encrypted computing modules and an AI-aware firewall.

The common thread is ByteDance’s conviction that enterprise users will soon rely on agents to complete complex, end-to-end workflows, and that those agents must be cheap, multimodal, and natively integrated with the wider cloud stack to be adopted.

Competitive Landscape and Hybrid Approach

From Tencent’s WeChat to Baidu, from Meituan to Xiaohongshu, major Chinese internet apps have integrated DeepSeek one after another. Yet Doubao remains committed to its self-developed approach. Doubao’s efforts drew little outside attention, but until DeepSeek arrived it was, in fact, the most-used AI app in China, and by the latest estimates the company spent over 1 billion USD (80 billion RMB) on marketing to convert users.

But looking at ByteDance’s AI strategy holistically, we can see that it is taking a hybrid approach. As in the internet era, only one or two of its apps are widely associated with the company, and many highly popular applications were never marketed as ByteDance products, such as CapCut for video editing, Lark, the would-be Slack of China, or now Coze, the no-code agent builder. It’s all a product-led strategy.

Ultimately, it’s the Doubao large model family that serves as the core product vehicle of ByteDance's AI strategy. Interest and expectations for ByteDance are high. We could see that at last week’s product launch, where ByteDance introduced its upgraded Doubao LLM 1.6: it was absolutely packed. According to this WeChat 虎嗅 (Tiger Sniff) account, people lined up well outside the conference halls. ByteDance’s strength is in its high-quality, engaging products.

To sum it up, you can think about it like this. Seed supplies the science, Doubao supplies the models and user scale, and Volcano Engine supplies the monetization and enterprise credibility. With the new video generation and deep-thinking agents, ByteDance is pushing to extend its leadership in Chinese AI.

Even looking at Doubao, its success has primarily been rooted in ByteDance’s powerful ecosystem. Don’t forget how Meta and Google leverage their access to walled-garden data. ByteDance has something similar. It’s not just about the quantity of data but more about the quality of data, especially as we start shifting focus to video-generation models. It’s about the labeling of data, of where it was shot, how it was shot, what equipment was used to shoot the video, and, of course, a detailed description of the object being shot. All of these quality data points enhance the data used to train video models. And who has more video data than ByteDance?

AI Proem
