The Bottlenecks for Embodied AI Development & Talent Outflow
Despite Excitement, Embodied AI is NOT READY YET
Hi All,
We’ve covered embodied AI extensively in recent weeks. Despite the confidence garnered by flashy robot demos and talks from the likes of Unitree, Galbot, Tesla, and Nvidia, there are still real challenges before embodied AI becomes a reality for any of us anytime soon.
The lack of physical data remains the biggest bottleneck for embodied AI: these models require diverse datasets to generalize to new, unseen tasks. LLMs could draw on the text, images, and video already abundant on the internet, but for robotics models, companies and research labs must create new 3D data from scratch. The issue is that the necessary sensor and actuation data do not yet exist at a scale that can train generalizable AI models, a point echoed by UBtech’s CBO, Michael Tam, at the Beyond Expo.
Given where technical development stands today, LLMs can only go so far in the physical world. Robots’ comprehension of language-directed instructions remains limited, and the nuanced movements required even to lift a pen, such as the angle of approach, the grip force, the temperature of the object, and the height of the lift, are still too complex for robots to master. A humanoid robot designed for full-body tasks contains over 50 precision actuators, and that is the kind we see today, which still appears somewhat bulky; humans, by comparison, have about 360 joints throughout our bodies. Now imagine how hard it is to ask a robot to move “as swift as a coursing river, with all the force of a great typhoon” (yes, I’m quoting Mulan).
3 Primary Ways to Train a Robot
So, how are companies overcoming these challenges or at least trying to improve the training of these robots? We’re seeing that there are three primary approaches taken by companies in training physical AI models, according to Morgan Stanley’s research.
Teleoperation: This is the most traditional method of training robots: a human operator uses a remote control to guide the robot through a series of movements to complete a specific task. While the robot is being manipulated, its sensors capture the visual and physical data of those movements, and that dataset is then used to train the neural networks underlying the robot’s model. The process is repeated thousands of times with slight tweaks to the surrounding environment and task, so that in time the robot can take a command and complete the action in similar scenarios on its own. This type of training is highly time-consuming and labor-intensive, but it is the most effective way to generate physical training data.
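The teleoperation loop described above can be sketched in a few lines: log (observation, operator action) pairs at every step, then repeat across many slightly varied episodes. All names here (the fake sensor and operator stand-ins especially) are illustrative assumptions, not any vendor’s actual API.

```python
# Minimal sketch of the teleoperation data-collection loop.
# Function and field names are hypothetical, for illustration only.
import random

def record_teleop_episode(num_steps, drive_robot, read_sensors):
    """Log (observation, operator_action) pairs while a human drives the robot."""
    episode = []
    for _ in range(num_steps):
        obs = read_sensors()       # camera frames, joint angles, force readings...
        action = drive_robot(obs)  # what the human operator commanded
        episode.append((obs, action))
    return episode

def collect_dataset(num_episodes, num_steps, drive_robot, read_sensors):
    """Repeat the episode many times; each run sees slightly different sensor readings."""
    return [record_teleop_episode(num_steps, drive_robot, read_sensors)
            for _ in range(num_episodes)]

# Stand-ins so the sketch runs: a noisy sensor and a scripted "operator".
def fake_sensors():
    return {"joint_angle": random.uniform(-1, 1)}

def fake_operator(obs):
    # The operator nudges the joint back toward zero.
    return {"joint_torque": -0.5 * obs["joint_angle"]}

dataset = collect_dataset(num_episodes=3, num_steps=10,
                          drive_robot=fake_operator, read_sensors=fake_sensors)
print(len(dataset), len(dataset[0]))  # 3 episodes, 10 (obs, action) pairs each
```

In a real pipeline, the collected pairs would then feed a behavior-cloning objective; the sketch only shows the data-logging side, which is where the labor cost discussed above comes from.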
Simulation: This option offers near-infinite possibilities, since simulated setups can be endlessly varied. Humanoid robots have a high number of degrees of freedom (DoF), so in simulation they can be much more “free-spirited” in how they complete tasks in unpredictable environments. The trade-off is that simulation can be far more computationally intensive.
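One common way to exploit that variety is domain randomization: every simulated episode draws different physical parameters so the policy never trains on the same world twice. The parameters and the toy “success” score below are my own assumptions for illustration, not a real physics engine.

```python
# Toy sketch of domain randomization in simulation-based training.
# Parameter names and the success score are illustrative assumptions.
import random

def sample_sim_params():
    """Randomize the simulated world so no two episodes are identical."""
    return {
        "friction":   random.uniform(0.4, 1.2),
        "payload_kg": random.uniform(0.0, 2.0),
        "latency_ms": random.uniform(0, 30),
    }

def run_episode(params):
    """Stand-in for a physics rollout; harder conditions lower the toy score."""
    difficulty = params["payload_kg"] / 2.0 + params["latency_ms"] / 60.0
    return max(0.0, 1.0 - difficulty)

def train_in_sim(num_episodes):
    scores = [run_episode(sample_sim_params()) for _ in range(num_episodes)]
    return sum(scores) / len(scores)

avg = train_in_sim(1000)
print(f"average toy success over randomized episodes: {avg:.2f}")
```

The computational cost mentioned above comes from replacing `run_episode` with a full physics rollout of a 50-plus-actuator humanoid, thousands or millions of times.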
Real-World Videos: Videos of humans or robots performing tasks and actions are used as training data, primarily for robots operating on assembly lines. The available material online is vast, but because the data is purely visual, it does not solve the problem of accurately translating the observed movements into physical data.
In Reality
What most embodied AI companies are doing now is combining the three approaches above into a hybrid training regime.
Companies like Nvidia already offer synthetic data training for robots: they create simulations, generate physical data, apply data augmentation, and store the results for companies to train their robots on.
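The augmentation step in such a pipeline is easy to picture: take one recorded trajectory and multiply it into many variants with sensor noise and small spatial offsets. This is a generic sketch under my own assumptions, not Nvidia’s actual tooling (Isaac Sim and Omniverse Replicator expose their own APIs).

```python
# Sketch of trajectory augmentation for synthetic training data.
# Noise levels and the (x, y) format are illustrative assumptions.
import random

def augment_trajectory(trajectory, num_variants, noise_std=0.01, max_shift=0.05):
    """Generate noisy, shifted copies of a list of (x, y) end-effector positions."""
    variants = []
    for _ in range(num_variants):
        # One rigid shift per variant, plus per-point Gaussian sensor noise.
        dx = random.uniform(-max_shift, max_shift)
        dy = random.uniform(-max_shift, max_shift)
        variant = [(x + dx + random.gauss(0, noise_std),
                    y + dy + random.gauss(0, noise_std))
                   for x, y in trajectory]
        variants.append(variant)
    return variants

original = [(0.0, 0.0), (0.1, 0.05), (0.2, 0.15)]  # one recorded reach motion
augmented = augment_trajectory(original, num_variants=50)
print(f"1 trajectory -> {len(augmented)} training variants")
```

This is how one expensive teleoperated demonstration can be stretched into many training examples, which is exactly the economics the hybrid approach is chasing.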
But even as we keep returning to the point that humanoid robots are not just hype, it may still take 5-10 years, or even more, for any model to reach the average household or find a consumer use case at scale.
And even tech optimists hold reservations on this front. Here is an excerpt I gathered at the Beyond Expo in Macau last week, specifically on robotics, from Joe Tsai, Chairman of Alibaba.
Tsai, the keynote speaker for the closing ceremony, emphasized that when artificial intelligence is infused into robots, existing robots become smarter and even capable of thinking: they can move in ways driven by reasoning, rather than merely executing pre-written programs.
This technology is not limited to factory automation but has a much broader range of applications. For example, robots can sweep the floor, make coffee, and play musical instruments, all of which are truly amazing.
On the other hand, when it comes to utility, he argued that most intelligent robots in the world do not need to look like humans. I agree, even though UBtech maintains that humanoid robots will integrate best into a physical world designed for us.
He even joked that he might feel scared if his robot cleaner looked like a human and would prefer it to look like a vacuum cleaner, but be able to move intelligently around the room.
Currently, most use cases for intelligent robots do not call for a human form. With humanoid robots, the real consideration is scenarios where people specifically want something that looks and feels human, and there are not many such scenarios. In those cases, he said, he would probably prefer to interact with a real person rather than a robot.
Although artificial intelligence can now understand knowledge and reason, there is still a need for more progress in spatial intelligence, which is a challenge that the robotics industry needs to overcome in order to move forward.
Some Comments on Talent and Globalization
A recent conversation with a Silicon Valley-based investor highlighted an interesting trend quietly unfolding. There are currently numerous restrictions on capital injection from the U.S. into the Chinese AI market; that much we know. HOWEVER, a handful of U.S. AI companies are operating in stealth mode in China.
I was initially perplexed, as the AI market is quite saturated with domestic players that are often more cost-effective for users in China. Furthermore, considering the sensitivities, I didn't see an apparent business reason for U.S. AI companies to do so. But the investor pointed out a reason I overlooked, which is the talent pool.
For a U.S. company, setting up operations in China means easy access to the local talent pool without paying crazy Silicon Valley compensation (supposedly USD 1.3 million+ total compensation for a junior AI developer, wowza) and MUCH more for senior talent. Setting up shop in China gives these companies access to some of the top researchers, developers, and engineers in the world right now, without disrupting the lifestyles of talents who likely don't want to move to the U.S.
We can predict that this trend may be exacerbated by the new Trump push to rescind student visas for Chinese students in sensitive sectors or those deemed to have political ties, as more are forced, or choose, to move their studies and research back to the Mainland, or more likely, I think, to Hong Kong and Singapore, where STEM institutions rank highly globally.
In reverse, Chinese AI and tech companies continue to seek opportunities to “go global,” with the U.S. market being the ultimate goal for many. And although SHEIN’s IPO plans shifted from New York to London and now to Hong Kong, there is still hope for many to access global investors.
For now, stay tuned. Expect a deep dive and a list of companies that are finding real-life use cases for robots in healthcare, agritech, and manufacturing next Tuesday at ~ 7:30 AM EST from our special contributor, .
Related analyses:
Rise of China's Robotics Industry: from Manufacturing Arms to Embodied AI
Physical AI’s ChatGPT “moment”: A Closer Look at Embodied AI Robots
Robot Girlfriends, Robot Firefighters, Robot Dogs, Robot Maids and Beyond
Introducing Unitree, China’s leading AI-embodied Robotics Company