AI Arms Race Far From Over: Chips Are Only Half the Game, Infrastructure Is the Other
Energy is the main bottleneck for the US
For the more polished op-ed that was published in Fortune, see here. Below is the unedited (ramble) edition.
With all the big tech firms racing to ramp up mega-scale AI data centers, it's clear they still have a long way to go to make those data centers a reality: securing large tracts of land, obtaining the relevant permits, hiring labor quickly, locking in access to large amounts of reliable power, and ultimately securing enough chips and related data center equipment to build a large compute cluster.
Relative to China, the US is miles ahead in advanced chips. But building up physical infrastructure will take the US considerable time, and likely at higher cost given more expensive land, labor, and power. China, in contrast, has the advantage in every ingredient but the chips.
Meta’s Mark Zuckerberg said during the most recent earnings call that continued large infrastructure investments are necessary to support the AI demand of the future. While acknowledging concerns about excessive spending, Zuckerberg argues that the risk of underinvestment is far greater: failing to invest adequately today could mean missing out on, or lagging behind in, what he considers "the most critical technology for the next decade and a half."
But what particularly caught my ear is Zuckerberg’s repeated mention of the “long lead time for spinning up physical [data center] infrastructure”.
Power Shortage Will Become the Real Bottleneck
So, what is causing the long lead time in the US? Nvidia chips don’t seem to have a long lead time anymore. The answer is physical infrastructure like land and power. (@semianalysis has brought this up too)
Let’s zoom in on power. Lead times on power transformers now run 2 to 4 years, and lead times on power-generation gas turbines can stretch to 3 to 4 years, driven by surging demand and historical underinvestment in the supply chain. The challenges don’t stop there. Even if a company can secure the power equipment it needs, it still faces a lengthy grid-interconnection process. Northern Virginia, for instance, has the highest concentration of data centers in the US today, and according to Dominion Energy, data centers over 100 MW will now wait an extra 1 to 3 years to connect, on top of today’s 3-to-4-year wait, because of a surge in new interconnection requests.
In fact, Elon Musk called this out at the Bosch Connected World conference a while back: “My not-that-funny joke is that you need transformers to run transformers... Then, the next shortage will be electricity. They won't be able to find enough electricity to run all the chips. I think next year, you'll see they just can't find enough electricity to run all the chips.”
To make matters more pressing, data centers are only getting bigger. North Dakota’s Josh Teigen recently said that two companies are looking to develop AI data centers in the state. The projects would start at between 500 MW and 1 GW, but could eventually scale up to 5-10 GW facilities. For context, North Dakota had a total generation capacity of 9.4 GW as of the summer of 2022. No wonder tech leaders like Zuckerberg and Musk are making repeated pleas for the country to ramp up infrastructure investment urgently.
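To see how striking those numbers are, here is a quick back-of-envelope sketch using only the figures above (the 9.4 GW state capacity and the 5-10 GW eventual project scale; the function name and constants are mine, for illustration only):

```python
# Back-of-envelope: proposed AI data center load vs. North Dakota's
# total generation capacity (~9.4 GW as of summer 2022, per the text).
ND_CAPACITY_GW = 9.4

def share_of_state_capacity(load_gw, capacity_gw=ND_CAPACITY_GW):
    """Fraction of the state's generation capacity a given load would consume."""
    return load_gw / capacity_gw

low = share_of_state_capacity(5.0)    # low end of the eventual scale
high = share_of_state_capacity(10.0)  # high end of the eventual scale

print(f"A 5 GW campus would draw ~{low:.0%} of North Dakota's capacity")
print(f"A 10 GW campus would draw ~{high:.0%} -- more than the entire state generates")
```

In other words, the high end of a single proposed campus exceeds everything the state currently generates, for all uses combined.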
As Eric Flaningam has written, we’re seeing a buildout scaling at a speed similar to the electric grid a hundred years ago, because electricity is the fundamental input.
Meanwhile, China’s central command model means that ramping up physical infrastructure can be done swiftly and, more importantly, cheaply. If China can make certain breakthroughs in chips in the coming years, maybe it can catch up after all. In the end, it is not just a competition over GPUs, but over energy, land, and the ability to marshal resources all at once.
China’s Command Economy
China’s top-down directives can often drive efficiency in a way the US cannot match. Since Guizhou was officially designated the National Big Data (Guizhou) Comprehensive Pilot Zone in 2014, the province has pursued ambitious goals to create a national-level cluster for the big data industry. Nearly a decade later, in 2022, the province reportedly had 23 key data centers in operation or under construction, including eight ultra-large ones, making it one of the most concentrated regions of ultra-large data centers in the world.
[For a more geopolitically focused argument, see the Fortune article here.]
The province hosts data for the likes of Huawei, JD.com, Alibaba, Tencent, and Foxconn, as well as leading international companies such as Apple, Microsoft, Hewlett-Packard, and NIIT. Who is to say another Guizhou cannot be established in a nearby province with ample land and little urbanization? Such state-encouraged efforts would likely be welcomed by local governments: they would mean job creation and tax revenue, stimulus for traditionally impoverished regions, better use of land that is rarely fit for agriculture or much else anyway, and, not to be overlooked, sought-after attention and favor from the top.
Chinese big tech companies have made their strategic need for AI clear and have been assertive in mapping out the full ecosystem through their investments. Alibaba is a prime example. The e-commerce juggernaut, once the unquestioned market leader and now facing fierce competition from newcomer Pinduoduo, has seized this moment to invest heavily in large models, optimize its cloud computing business, and leverage its existing infrastructure, including its data center clusters in Guizhou.
When it comes to land, power, and power equipment, however, China has a huge advantage and may already be on its way to outcompeting the US. Just look at how the country ramped up massive data centers in Guizhou for cloud computing, likely at much greater speed and lower cost than its US peers could manage at home.
And this thinking has been echoed by experts in the industry. Wang Jian, the Alibaba Cloud founder and one of China’s leading AI experts, said at a recent conference that China holds energy advantages over the US for AI development. Meanwhile, Zhang Ping'an, CEO of Huawei Cloud, emphasized China’s strengths in bandwidth, networking, and energy, suggesting China leverage these for cloud-based AI computing, and that architectural innovation, rather than reliance on advanced AI chips alone, could be the path to coming out on top.

Source: World Economic Forum - China Takes the Lead on Renewables
A potential route I think we’ll see in China is the one the country took with renewable energy. The command economy delivered, exceeding expectations in efficiently rallying clean energy technologies, their supply chains, and production. Only a decade ago, China had minimal renewable energy. Notably, in the early 2000s, China's leadership weighed labor costs, climate impact, and energy needs, then swiftly allocated resources and invested strategically in renewable technologies and supply chains through successive five-year plans, pushing renewables across both the public and private sectors. Now China adds more than 200 GW of new renewable energy capacity annually, while the US adds only about 40 GW.
Thus, the AI arms race seems far from over. GPUs are only half the equation; I believe the other critical half lies in the ability to build up infrastructure and generate power. Chips may be at the center of attention today, but the overlooked infrastructure piece could well be the major factor determining the longevity and sustainability of AI development.
Thank you for your support. If you enjoyed my content, please subscribe to Proem.