[SF Part II] AI Value Captured by the Token Refineries
As intelligence gets cheaper, value moves downstream to refinement.
This piece builds on my recent SF trip, during which I had roughly 23 meetings with VCs, big tech strategy teams, AI founders, private equity investors, scholars covering China’s digital economy, and hedge fund investors. It synthesizes frameworks from Ben Thompson (Stratechery) and Benedict Evans alongside my ongoing coverage of the China AI ecosystem. One great analogy that really stuck with me was from a VC: the token refinery framework, which I’ll borrow here. Let’s dive in.
The Token Refinery Value Capture
I was sitting outside La Boulangerie in Hayes Valley two Fridays ago, by the park benches, after a long week of back-to-back meetings, demos, and chats about AI-everything, and the semi-stressful life of walking around downtown and looking over my shoulder before sunrise.
After six days of intense AI-ing with no regard for jet lag, I was finally relaxed and looking around. Everyone was sunbathing and seemingly enjoying one of those perfect California afternoons: clear sky, warm sun, the kind of day that finally makes you understand why people still put up with the rent and chaos.
Just before that, I was having coffee with a VC who has been investing in ‘national defense’ tech, a hot sector that seemingly popped up in recent years amid the back-and-forth of the trade war and the rise of China’s soft influence globally. The conversation kept circling back to one word: winning. “So who’s going to win?” “Do you think they’ll win?” “We must win.”
Who’s winning the model race? Whether Anthropic is winning against OpenAI. Whether open-source is winning. Who’s winning at inference cost? Is China winning? Who’s winning in China? Will China win? In what way and why will they win (or not)?
The word kept coming up, and it threw me off a little. Not because it was entirely wrong, but because it felt a bit narrow. On a business level, the AI market has moved past the point where “who has the smartest model” is the most useful question. The more interesting question now is where value is actually being captured, and where it is already being competed away. That is what brought me back to an analogy I heard earlier that week from another VC investor: AI is starting to look a lot like the oil business. And the ‘four-year Albertan’ in me could not resist that one.
Tokens are crude oil
Models produce raw intelligence. They generate tokens. But tokens by themselves are not the end product that most customers actually want. What customers pay for is legal work completed, code shipped, claims processed, research synthesized, and decisions supported. They pay for refined output.
A lot of investor attention has naturally gone to the infrastructure layer. But once the model is understood as an intermediate good rather than the final product, the analytical center of gravity shifts. The question is no longer just who can produce intelligence, but who can turn it into something usable, trusted, repeatable, and economically legible. In other words, who can refine it?
At the bottom of the chain are the token producers: the frontier labs and open-weight model builders – think OpenAI, Anthropic, Google DeepMind, Meta, DeepSeek, Qwen. They produce the raw capability. This layer is expensive to build, technically impressive, and still moving fast. But the model is not the final product, any more than crude oil is gasoline. And what most enterprises/consumers are paying for is gasoline.
Of course, the first form of refinement occurs within the labs themselves, where raw model capabilities are turned into products like ChatGPT, Claude, and Gemini. This is already a real business. It may not yet fully fund the drilling, but it is clearly one viable business model. The labs are packaging intelligence into interfaces, workflows, and, increasingly, work products that users consume directly. This is why the “labs will be commoditized immediately” argument has always felt too simplistic. The best labs are not only producing crude; they are trying to capture the downstream markup themselves.
External, third-party refinement occurs when AI is embedded in specific workflows, industries, and customer relationships. Harvey is doing that in legal. We see the same with AI inside defense software, clinical workflows, compliance, finance back offices, customer support, and vertical SaaS. This layer takes cheap model output and turns it into something more valuable by leveraging better context, process, service, trust, industry know-how, and distribution. That’s why software cannot all simply be “vibe coded”.
This model also most closely resembles where software value was historically captured. AWS built many important first-party products, but the cloud era’s great value creators were often third-party companies, such as Datadog, ServiceNow, and Workday, that sat atop shared infrastructure and added enough domain value to justify an independent, premium-priced existence. That pattern now looks increasingly relevant to AI as well.
A model can answer a question. A product can route the answer, compare it with prior cases, attach it to a record, push it to the right person, maintain an audit trail, and make it fit into an institution’s existing way of working. In the coming year, the focus for businesses, at least, is shifting away from AGI and toward the conversion of raw intelligence into finished economic output, which is where much of the real business value lies.
The spread is the business
The easiest way to understand any AI company is to ask one question: how wide is the spread between its token input cost and the value of the output it delivers?
In much of today's knowledge work, service professionals still charge based on labor scarcity, credentialing, and process complexity. AI changes the cost structure underneath that. If a model can perform meaningful parts of that work at very low marginal cost, the battle shifts to who captures the markup between cheap intelligence and high-value delivered outcome.
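To make the spread concrete, here is a minimal back-of-the-envelope sketch. Every number and name in it (the `Refinery` class, the token counts, the prices) is a hypothetical assumption for illustration, not data from any real company, and it deliberately ignores every cost other than tokens (labor, sales, compliance, and so on).

```python
from dataclasses import dataclass


@dataclass
class Refinery:
    """Hypothetical AI product that turns tokens into a finished unit of work."""
    name: str
    tokens_per_task: int             # tokens consumed per finished task (assumed)
    price_per_million_tokens: float  # USD paid to the model provider (assumed)
    price_per_task: float            # USD charged to the customer (assumed)

    def token_cost(self) -> float:
        # Input cost of the "crude" for one finished task.
        return self.tokens_per_task / 1_000_000 * self.price_per_million_tokens

    def spread(self) -> float:
        # Gap between what the customer pays and what the tokens cost.
        return self.price_per_task - self.token_cost()

    def spread_ratio(self) -> float:
        # Share of the selling price that is not token cost.
        return self.spread() / self.price_per_task


# Illustrative numbers only.
contract_review = Refinery(
    name="contract review",
    tokens_per_task=200_000,
    price_per_million_tokens=5.0,
    price_per_task=50.0,
)

print(f"token cost: ${contract_review.token_cost():.2f}")
print(f"spread:     ${contract_review.spread():.2f}")
print(f"ratio:      {contract_review.spread_ratio():.0%}")
```

With these made-up inputs the token bill is a small fraction of the selling price, which is the point of the analogy: the interesting question is what keeps that gap from being competed away.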
But what protects the spread from collapsing? I think a few factors help separate genuinely durable businesses from products that merely look exciting in the current moment.
The first is workflow ownership. If the AI product is deeply embedded in a complex, vertical-specific, high-consequence workflow, it becomes much harder to replace with a generic model plus a thin wrapper. Customers don't just buy the output; they buy the reliability that it appears in the right place, in the right format, connected to the right systems, with the right approvals and audit trails. That reliability is the extra moat a product needs. Once a product becomes part of the enterprise's operating system or process, the competitive question changes. It is no longer simply whether another model is cheaper or smarter. It is whether the entire surrounding workflow can be rewired without cost, risk, or disruption, and usually it cannot.
The second is accumulated context. Clever prompting gets absorbed into the base model's capabilities surprisingly quickly (they suck that data in like Kirby). Real accumulated context is different: customer records, prior work product, compliance history, integration into adjacent systems, operational memory. The more useful the product becomes because it sits inside a growing body of context, the less interchangeable it becomes. From closely following the Chinese AI market, one pattern is clear: anything engineering-related (tooling, prompts, harnesses) is usually not sustainable for very long. Non-engineering advantages like domain expertise, user data, distribution, and user habits are more durable. The things people can demo most easily are often the things that travel fastest across the market. The things that stay sticky are usually quieter. They sit in behavior, habit, data exhaust, and institutional memory.
The third is captive customers and limited competition. This is why the viral X post is relevant. If 50 startups are all refining tokens into generic customer support chatbots, the spread collapses quickly. If you are one of a small number of companies trusted to operate in medicine, national defense, or other regulated environments, the spread can hold for longer because access itself is scarce. The opportunity to arbitrage the gap between low token input cost and high output value is simply too attractive, which is why so many refineries are gravitating toward the high-value sectors first. Over time, competition narrows the gap, but where there is structural scarcity of competitors (regulatory barriers, security clearances, deep institutional trust), the spread persists. In those cases, what is scarce is not intelligence itself but permission, trust, and workflow access.
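The effect of crowding on the spread can be sketched with a toy rule. The rule below, in which the markup over token cost is simply divided by the number of comparable competitors, is my own simplifying assumption and not an established pricing model. The function name and all numbers are hypothetical.

```python
def compressed_price(token_cost: float, customer_value: float, competitors: int) -> float:
    """Toy rule (an assumption, not a real pricing model): the markup a refinery
    can charge over its token cost shrinks in proportion to how many comparable
    competitors the customer can switch to."""
    if competitors < 1:
        raise ValueError("competitors must be at least 1")
    max_markup = customer_value - token_cost
    return token_cost + max_markup / competitors


# Illustrative numbers only: $1 of tokens, $51 of value to the customer.
token_cost, customer_value = 1.0, 51.0

for n in (1, 3, 50):
    price = compressed_price(token_cost, customer_value, n)
    print(f"{n:>2} competitors -> price ${price:.2f}, spread ${price - token_cost:.2f}")
```

Under this assumption a sole trusted vendor keeps the whole gap, while one of 50 interchangeable chatbots keeps almost none of it. Real markets are messier, but the direction is the one described above.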
This is why the word “wrapper” has become too dismissive to be analytically useful. There is a massive difference between a generic chatbot product and a company embedded in a regulated workflow with years of context and trust built into the product. This is also why the language has started to shift from “wrapper” to “harness.” The market is already trying to distinguish between a thin interface and something more deeply embedded.
The next phase of the market will probably focus on distinguishing thin refinement from deep refinement. The first is easier to copy because it is mostly a presentation layer. The second is harder to copy because it is intertwined with the base infrastructure and processes.
China already shows what happens when crude oil gets cheap
In the U.S., the value chain still appears to have two visibly distinct profit models. There is one at the model layer because frontier labs still command attention, capital, and, in many cases, a real commercial premium. And then there is one at the application layer, where companies try to build products on top of that intelligence, whether it is in the form of a wrapper or a harness.
In China, that picture already looks slightly different. Strong open-weight models have become widely available, and the economics of access have moved lower much faster. That makes it harder to build the entire investment case around proprietary intelligence alone; if anything, the model layer is where competition is most intense. This does not mean the model layer disappears, nor that model quality ceases to matter. It means the center of gravity shifts. More of the competition, more of the product differentiation, and more of the monetization pressure have moved downstream.
My point is that China is not just another market. It may be an early preview of what happens when model access becomes cheaper, more abundant, and less differentiating. The logic of competition becomes easier to see there because the layers have already compressed. To borrow from my earlier writing on LLM business models, I once argued that the dish is only as good as the quality of the fish. That may still be true for sashimi, maybe ceviche too. But here I have to counter my own analogy a bit: sometimes the quality of the cooking matters more than the fish’s rarity.
This in turn forces companies to differentiate elsewhere: distribution, product design, speed of iteration, vertical depth, ecosystem leverage, and the ability to build a product that feels native to how users actually behave. It forces them, in other words, to refine better. We’ve written much about this; see AI Proem’s previous coverage.
And one thing that strikes me, each time I speak to investors in the Bay Area, is how poorly understood this still is. Not because people are unintelligent. Far from it. But because too much of the U.S. discourse still frames Chinese AI through a geopolitical or ideological lens rather than through market structure and technological development. So you get questions about whether open-source adoption in China is mainly state-led, or whether local edge deployment is mostly driven by paranoia about cloud privacy. Those questions themselves show how far the framing can be from the reality on the ground. China started embracing open-weight AI for the same reasons markets adopt cheaper, workable inputs everywhere: it is effective, it reduces costs, and fierce downstream competition forces product companies to move fast. As we have written repeatedly here at AI Proem, the shift started as an economic one.
Why it matters for global AI
Here are three observations from the Chinese market that I think are worth internalizing for investors:
First, commoditized crude does not kill the refinery business. If anything, it can expand it. Lower input costs and lower barriers to entry create more experimentation, more verticalization, and more attempts to package intelligence into different end uses.
Second, the winning refineries increasingly differentiate on distribution, product sense, and contextual fit rather than on raw model intelligence. That is not some uniquely Chinese curiosity. It is a likely preview of what happens in any market once the question “who has the smartest model?” becomes less decisive than the question “who has the most economically useful product?”
Third, spreads become thinner. Intense competition compresses the gap between token input cost and final selling price. That means Chinese refineries often have to run leaner, move faster, and iterate with more urgency. If open-source continues closing the gap globally, more of the AI world may converge toward that structure. The standalone profit pool at the model layer may not disappear, but it does get pressured. And once that happens, the central investment question shifts quite dramatically. One stops asking primarily who owns the best model and starts asking who owns the workflow in which the model becomes economically indispensable.
Some (especially public) investors I spoke to in San Francisco were not unaware of this risk. But many of them felt oddly complacent about it, as if the direction of travel was obvious but somehow still dismissible. The defenses all sounded familiar: the frontier labs will maintain a lead; the market is still so early that commoditization does not matter yet; if the major labs eventually monetize at a huge scale, who cares what the end state looks like; the U.S. government will ultimately step in and slow distillation. Perhaps some of those things will prove partly true?
In commodity markets, analysts obsess over the marginal producer because that is what sets the price. Something similar is happening here. China deserves close study, not as an exotic side case, but because it may be showing what happens when the marginal cost of intelligence falls faster than the narrative can comfortably absorb.
Connecting Thompson and Evans
This also connects, at least loosely, to frameworks Ben Thompson and Benedict Evans (my OGs) have been writing about recently. Seen this way, the current debate among AI commentators looks less contradictory than incomplete.
In Stratechery, Ben Thompson has been right to emphasize that the model makers may be better positioned than many expected because they increasingly own first-party products, not just raw model access. If the labs can package intelligence directly into useful products, then more value can flow back to them than the “all labs become commodity utilities” camp assumes. That is directionally consistent with his recent writing on software and AI, even if the exact end state remains open.
The framework captures why first-party refinement matters, but not fully why third-party refinement can still become enormously valuable. The history of technology stacks suggests that foundational platforms often push downward and upward at the same time, yet still leave large spaces in which third-party companies create durable value by owning particular workflows, categories, and customer relationships. The existence of a powerful platform does not eliminate the need for downstream specialists; it merely raises the bar for which specialists/verticals get to survive.
Benedict Evans, by contrast, has made one of the cleanest bear cases on OpenAI specifically. His argument is that labs like OpenAI do not clearly have a unique technology or product moat, that user engagement appears shallower and more fragile than current perception suggests, and that incumbents have broadly matched capabilities while bringing stronger product and distribution advantages to the fight. And we’ve kind of seen that disruption happen so quickly with Claude.
This relates to my point that the internal-lab-refinery may not have enough context, workflow depth, or defensibility to justify the strategic centrality people have projected onto it.
Thus, putting the two analyses together, I was inspired. Thompson is right that the model-maker advantage is real, especially in first-party refinement. Evans is right that this does not automatically create an impregnable moat at the product level. The synthesis, then, is that model advantage matters, but only up to a point. Beyond that point, what determines value capture is whether the product in question has sufficient workflow, context, and distribution to sustain the spread. So it goes back to my point about third-party refineries.
The frontier labs remain the hardest call. They are clearly extraordinary businesses in many respects, and their pace of productization has already surpassed many skeptics' expectations. But the question at very large valuations is not whether they can continue on this impressive ARR growth trajectory.
The question is whether the economics they enjoy today can remain durable as model capabilities diffuse further and downstream competition matures. To underwrite those valuations, one has to believe not only that crude production remains meaningfully profitable, but that first-party refinement becomes sufficiently central to capture a large and lasting share of the downstream economics. Perhaps that happens. Perhaps the strongest labs evolve into something more like full-stack platforms, fostering ecosystems atop their intelligence while also capturing direct end-user demand. That would likely be the most durable version of the story. But API access alone will likely be insufficient, and first-party chat products may be insufficient as well if the rest of the market becomes more efficient at refinement than current enthusiasm assumes.
One broader thought underlies all of this: the pattern of upstream capability commoditizing while downstream refinement captures value is not unique to AI. One sees versions of it across many waves of technology. In the cloud, raw infrastructure became foundational, but much of the most durable value was captured by the software companies that turned that infrastructure into category-defining systems of work. In mobile, the platforms and devices mattered enormously, but so did the companies that transformed mobile distribution into new end markets and behaviors.
In AI, tokens are the crude. But the more enduring question is the same as it has always been in these technology shifts: where does the markup actually live once the raw input becomes cheaper, and what keeps that markup from being competed away?
So, let me ask you again, ‘who’s winning?’ or ‘who will win?’





