The Physical AI Revolution Part I: The Long Road to Humanoid Robotics
Five Transformative Phases from Early Dreams to the Dawn of the Robotics Industry
Introduction: Standing at the Threshold
In December 2023, amidst the bustling halls of NeurIPS, I found myself surrounded by hundreds of roboticists who felt like outsiders in the generative AI revolution. The irony of that moment is striking now, as we witness an unprecedented surge of excitement around humanoids and physical AI. But less than a year ago, robotics was largely considered a relic of the autonomous vehicle era, disconnected from the AI boom that ChatGPT had ignited.
The prevailing wisdom was clear: the world wasn't ready for robots. Tesla's and Figure's humanoid projects remained mysterious, and the collapse of the autonomous vehicle industry had left a bitter taste in investors' mouths. The venture capital world had turned away from robotics in favor of narratives that looked more lucrative and promised faster returns. While language models were capturing headlines and capital, robotics was quietly retreating into the shadows.
To many, robotics represented an impossibly difficult challenge: there were mountains of hardware and software problems to overcome, distribution posed a formidable barrier, and even successful applications remained confined to specially designed factory floors. No one in history had successfully launched a robot smart and robust enough to do one human unit of work. The robotics industry didn't really exist, and it has yet to be born.
Yet, in a twist that few could have predicted, we now stand at the threshold of something far more significant than just another technological revolution. For the first time in human history, we're approaching the creation of humanoid robots capable of serving as drop-in replacements for human labor – a development that promises to fundamentally alter the fabric of civilization itself. This isn't merely about technological advancement; it's about dismantling the age-old constraints of human physical labor that have shaped every aspect of our society.
This robotics boom will be unlike anything we've seen before. Within a year or two, we will see actual robots in the market and witness the birth of the first true robotics industry. The shockwave that the age of physical AI sends through the world will present a once-in-a-lifetime opportunity for a few and an existential threat to others. This seismic shift will not be inclusive, nor will it promise equal opportunities for all. The question is more fundamental than whether you'll thrive in this emerging market; it's whether you'll survive the great reset and be one of the few who see the new world at the end of this long tunnel.
Contents
This article is the first in a series of five dedicated to covering various aspects of the upcoming humanoid/robotics industry. Throughout this piece, we'll take a deep dive into why and how robotics is making its way to the forefront of history. We'll journey through five distinct phases that have shaped the field: from the early dreams of the 1970s, through the cycles of excitement and disappointment with Boston Dynamics and Google, to Tesla's ambitious reboot in 2021, the transformative impact of generative AI, and finally, the acceleration we're witnessing in 2024. By understanding this evolution, we'll see why this moment is fundamentally different from all previous attempts at building human-like machines.
In subsequent articles, we'll explore the present state of physical AI, analyze potential industry evolution scenarios, examine the profound implications for human civilization, and investigate the connection between physical AI and artificial general intelligence. But first, let's understand how we arrived at this pivotal moment in history.
1. The Long Dream of Humanoid Robots (Pre-2013)
Humanity’s fascination with creating machines that mirror our own form is hardly new, but it wasn't until the 1970s that this dream began its transformation from fantasy to scientific pursuit. Among the early pioneers, Japan emerged as the undisputed leader, with its corporate giants viewing humanoid robotics not just as a technical challenge, but as a strategic imperative for the nation's future.
Video: Honda’s P2 robot demonstration
Honda made the most serious and sustained commitment to this vision. Its humanoid research began in 1986, produced the P-series prototypes in the 1990s, and culminated in ASIMO (Advanced Step in Innovative Mobility) in 2000. Standing 130cm tall in its later versions, ASIMO was the most advanced humanoid of its time, designed to navigate human environments, carry objects, and climb stairs. Its development marked one of the first serious attempts at creating bipedal machines that could reliably walk on two legs and interact with humans.
However, the technical challenges proved far more formidable than anticipated. Despite continuous improvements over nearly two decades, ASIMO and its contemporaries struggled with fundamental issues. The robots' movements remained rigid and unnatural, lacking the fluid adaptability of human motion. While ASIMO could walk up prepared stairs in controlled demonstrations, it struggled with unpredictable real-world scenarios that humans navigate effortlessly.
Video: ASIMO fails to climb the stairs
The limitations weren't just mechanical. The control systems of the time, based primarily on classical robotics approaches, couldn't deliver the kind of adaptive intelligence needed for truly autonomous operation. Each new capability required extensive programming and testing, and the robots remained far from the dream of general-purpose humanoid assistants that could seamlessly integrate into human environments.
These persistent challenges led Honda to cease ASIMO's development in 2018, finally retiring the project in 2022. This decision marked the end of an era, demonstrating that building truly capable humanoid robots would require breakthrough advances in both hardware and software. After Honda's efforts gradually lost momentum, the field of humanoid robotics saw little significant progress until 2013, when Google made the first serious attempt to tackle this challenge through its acquisition of Boston Dynamics.
2. The Dark Age: Google, Boston Dynamics, and SoftBank (2013-2021)
2.1. Google's Brief Robotics Dream
Fast-forward to 2013, when Boston Dynamics made a breakthrough in dynamic mobility that caught the tech world's attention. The company's robots, particularly Atlas, demonstrated unprecedented abilities in balance and movement. Google, under Andy Rubin's guidance, saw the potential and acquired Boston Dynamics, igniting a wave of excitement throughout the engineering community. For a moment, it seemed the age of robots was finally within reach.
Video: Boston Dynamics Atlas in 2013
However, this optimism proved short-lived. Google's engineers discovered that achieving human-like robotics required solving multiple complex problems simultaneously. While autonomous navigation was itself a formidable challenge, humanoid robots needed to master something far more difficult: coordinating intelligent actions while maintaining dynamic balance and manipulating objects. The fundamental complexity of combining mobility with manipulation proved to be a far greater challenge than most had anticipated.
Following Rubin's departure in 2014, Google made a strategic compromise. Rather than pursuing the more complex challenge of humanoid robotics, they chose to focus on the more tractable problem of autonomous vehicles, where the challenge was primarily navigation without the added complexity of dynamic manipulation. This decision in 2015 sparked the autonomous vehicle boom, as other tech giants followed Google's lead. By 2017, Google had sold Boston Dynamics to SoftBank, completely divesting its robotics IP – a clear signal that they no longer believed in the near-term viability of humanoid technology.
2.2. SoftBank's Determined Yet Unsuccessful Pursuit
SoftBank had its own ambitious vision for humanoid robotics. In 2012, they acquired French company Aldebaran Robotics, which led to the creation of Pepper in 2014. While 27,000 units shipped, Pepper was designed primarily as an emotional support robot, lacking the physical capabilities needed for practical applications.
Video: SoftBank’s Pepper (2014)
Undeterred, SoftBank doubled down on humanoids. Their 2017 acquisition of Boston Dynamics was accompanied by the purchase of Schaft, a Japanese bipedal robotics company. Yet, despite controlling some of the most advanced technology of the time, SoftBank found themselves facing the same fundamental challenges that frustrated Google. In 2021, they abandoned their humanoid projects in favor of industrial automation solutions, selling 80% of Boston Dynamics to Hyundai.



These high-profile setbacks revealed fundamental challenges that couldn't be solved by deep pockets alone. The technological barriers were immense: bipedal robots faced far more complex mobility challenges than their wheeled counterparts, and achieving human-level dexterity required advances in both hardware and artificial intelligence that simply didn't exist. With easier alternatives like autonomous vehicles and SaaS platforms drawing investment away, there wasn't enough conviction to pursue the multiple technological breakthroughs needed to bring humanoids to market. The field remained stagnant until Tesla's announcement of the Optimus project began to transform the entire narrative around robotics.
3. The Great Reboot: Tesla Optimus and the Humanoid Renaissance (2021)
3.1. A New Wave Begins
In August 2021, Elon Musk made an announcement that would fundamentally shift the narrative around humanoid robotics: Tesla would build Optimus, a general-purpose humanoid robot. Unlike previous corporate forays into robotics, Tesla's approach wasn't about research or demonstration – it was about building a commercial product that could transform manufacturing. Musk's vision was clear: these robots would eventually perform dangerous, repetitive, and boring tasks that humans shouldn't have to do, starting with Tesla's own factories.
Video: Elon Musk announces Tesla Optimus (2021)
The momentum was immediate and widespread. Agility Robotics, a spinoff from Oregon State University, secured a $150 million Series B extension in April 2022 – a dramatic jump from their previous rounds of $800,000 seed and $8 million Series A. Notably, the Series A had been backed by Andy Rubin's Playground Global, suggesting the former Google robotics chief hadn't lost faith in humanoids despite the Boston Dynamics experience. Figure AI, founded in 2022 by serial entrepreneur Brett Adcock, raised $70 million in Series A funding, positioning itself as a serious contender in the humanoid space with a focus on manufacturing applications.
Chinese companies moved even more aggressively. Xiaomi established a dedicated robotics division and unveiled its humanoid CyberOne in August 2022. AgiBot, founded in February 2023, raised four consecutive Series A rounds, approaching a $1 billion valuation by early 2024. Unitree pivoted from quadruped robots to humanoids, securing a $139 million investment led by tech giant Meituan. Another notable entrant was Engine AI, founded by a co-founder of XPENG Robotics, which raised 100 million yuan in angel funding alone.
Video: Xiaomi's CyberOne unveiling (2022)
3.2. Why This Time Was Different
This surge in investment and activity wasn't just another cycle of robotics hype. Two critical factors set this wave of robotics development apart from all previous attempts.
3.2.1. From Hobbyists to Entrepreneurs
The robotics community underwent a profound cultural shift. For decades, the field had been dominated by hobbyists who approached robotics as a personal technical challenge rather than a business opportunity. These enthusiasts often worked in isolation, drifting from one technical problem to another without serious consideration for commercialization. The best engineering talent rarely worked in robotics, and even successful projects remained confined to research labs or personal workshops.
The new wave brought entrepreneurs who understood the importance of building viable products and strong brands. They attracted world-class AI and hardware talent, focusing not just on capabilities but on aesthetics, use cases, and distribution channels. The contrast was striking: while previous humanoid projects often looked like engineering experiments, Tesla's and Figure's robots were designed with the sleekness and appeal of futuristic technology products. More importantly, their public relations weren't just about raising capital: they were strategic messages to establishments worldwide that the age of robots was approaching, and that the social order which had sustained their power for generations was about to be transformed.
Video: Figure AI's first public demo of their humanoid robot (2024)
This leadership shift proved crucial. When entrepreneurs like Elon Musk entered the field, they brought a sense of urgency and purpose that had been missing. Capital moves quickly when desperate players see a clear path forward, and when this desperation synchronizes across multiple sectors, it creates unstoppable momentum. The robotics community, once dominated by isolated technical pursuits, was now attracting the world's leading AI researchers and hardware engineers, all focused on the systematic breakthrough needed to bring humanoid robots to market.
3.2.2. Geopolitical Alignment for Automation
Geopolitical factors had aligned to create urgent demand for automation. China was facing the consequences of its one-child policy, with a looming demographic cliff threatening its manufacturing dominance. Despite government efforts to boost birth rates, projections showed a severe labor shortage beginning around 2050. The Chinese government, recognizing that automation was the only path to maintaining industrial capability, had invested heavily in AI and robotics.
China's technocrats understood that the only way to prevent their supply chains from developing dependencies on other countries was through automated manufacturing. They had continued investing in robotics even while US tech companies turned away from the field. Tesla's bet on humanoids likely reinforced their conviction, as evidenced by the rapid rise of Chinese humanoid startups thereafter. This was particularly significant given China's demonstrated ability to rapidly scale promising technologies.

Meanwhile, the retreat of globalization during the Trump administration (2017-2021) had pushed major powers to secure their manufacturing base. As China built closer ties with Russia and focused on supply chain independence, both the US and China saw automated factories as crucial to their industrial future. The world was moving toward a multipolar system where coalitions of regional powers would jointly control global resources. In this new reality, maintaining independent manufacturing capabilities became a strategic imperative, creating powerful incentives for both nations to pursue robotics aggressively.
3.3. The Path Ahead
As momentum built around humanoid robotics, a crucial question remained: was the technology finally ready?
Despite this unprecedented momentum and alignment of interests, significant technological barriers remained. Creating robots with human-like capabilities required solving complex challenges in both hardware and software. Actuators needed to be more sophisticated, control systems more advanced, and artificial intelligence more capable of handling unstructured environments. Many observers viewed robotics companies as "deep tech" ventures, analogous to quantum computing and nuclear fusion – fascinating but decades away from practical application.
When Tesla announced Optimus, even optimistic projections put functional humanoids arriving no earlier than 2030. But no one could have predicted how dramatically the timeline would accelerate when ChatGPT burst onto the scene, pulling the future of robotics forward by half a decade.
4. The ChatGPT Effect (2022-2023)
When ChatGPT was released in late 2022, it created an unprecedented "oh-shit" moment that sent shockwaves across every sector of society. People had never seen an AI product so general and interactive, analogous to the Jarvis AI from Iron Man and exactly what Alexa and Siri should have been. The impact was immediate: companies rushed to build around large language models, NVIDIA's stock soared as compute demand exploded, and artificial general intelligence (AGI) suddenly seemed within reach.
Yet amid this frenzy, robotics appeared to be an outlier. To many observers, the field seemed like a relic of the bygone autonomous vehicle era, with outdated technology and skills that weren't transferable to what mainstream AI companies required. Even OpenAI, which would become the face of the AI revolution, had disbanded its robotics team, best known for the Dactyl robotic hand project, in 2021 after struggling with dexterity issues. The message seemed clear: robotics was yesterday's moonshot.
Video: OpenAI's Dactyl project (2018)
Beneath the surface, however, a different story was unfolding. The generative AI boom was quietly revolutionizing the fundamental technologies needed for advanced robotics. The shift from small, specialized models to large foundation models marked a crucial paradigm shift. Instead of arduously tuning parameters on limited unimodal models, researchers could now pretrain massive models and fine-tune them for specific tasks, opening the door to deep reasoning capabilities that robots would need for real-world interactions.

The technical breakthroughs were numerous and significant. Efficient implementations of transformer components, like FlashAttention (first developed by Tri Dao and collaborators at Stanford), made large models more practical. Architectural enhancements and model optimization techniques enabled smaller, more efficient models without sacrificing capability. Perhaps most importantly, advances in image and video generation demonstrated AI's growing ability to understand and interact with the physical world.

These developments were precisely what robotics needed. Traditional approaches had struggled because they treated physical intelligence as a collection of separate problems: vision, planning, manipulation, and so on. The new wave of AI showed how a single foundation model could develop deep understanding across multiple modalities. While the tech world focused on language models generating poetry and code, the same technological breakthroughs were laying the groundwork for robots that could understand and interact with the physical world in fundamentally new ways.
Moreover, the engineering practices developed during the generative AI boom proved invaluable for robotics. Teams learned to handle the complexity of large model training, develop robust evaluation methods, and create efficient deployment pipelines. These skills and tools would later become crucial for developing the sophisticated AI systems needed for advanced robotics.
The smartest roboticists, though a minority in their field, recognized the revolutionary potential of these AI breakthroughs. While many of their peers dismissed language models as irrelevant to robotics, these visionaries were quietly finding ways to adapt the new technologies for physical AI. They understood what others missed: that the challenge of creating machines with human-like dexterity and adaptability was becoming more tractable with each advance in AI.

By the end of 2023, it was becoming clear that the generative AI revolution hadn't left robotics behind – it had instead accelerated its development in ways few had anticipated. The same technologies that allowed ChatGPT to understand and generate human language were being adapted to help robots understand and interact with the physical world. This convergence set the stage for a dramatic acceleration in 2024, as the worlds of AI and robotics began to merge in unprecedented ways.
5. The Election Year and the Great Narrative Shift (2024)
By the end of 2023, a new reality was setting in: language models were running out of high-quality training data. The internet was becoming saturated with AI-generated content, which only added noise to the training pool. With billions invested in artificial intelligence and major tech companies heavily dependent on the AI narrative, finding successor technologies wasn't just an opportunity but an imperative for market stability.
The US government took decisive action to keep the market buoyant through the election year. The Treasury Department accelerated its issuance of short-term debt, offering yields above 5% for instruments that, unlike long-term bonds, didn't require complex risk management. Simultaneously, they pressured the Bank of Japan to maintain near-zero interest rates despite inflation. This stark interest rate differential created a powerful carry trade as investors borrowed cheaply in yen to buy dollar-denominated assets, effectively providing a government-backed funding mechanism for continued AI speculation.

Three adjacent narratives emerged to carry the torch forward, each building upon the foundation of large language models while expanding into territories that promised even greater impact.
5.1. Multimodality and the Rise of AI Reasoners
The first narrative centered on expanding AI's capabilities beyond text. OpenAI led the charge by integrating DALL-E into ChatGPT, demonstrating how models could seamlessly handle both text and images. The open-source community followed aggressively: Meta's Llama matched GPT-4's performance on most benchmarks with openly released weights, and Alibaba's Qwen demonstrated superior performance in Asian languages while handling multiple input types with remarkable efficiency.
OpenAI brilliantly framed this as the rise of "AI Reasoners": systems that could process multiple types of data to develop a deeper understanding of the world. By handling diverse inputs simultaneously, these models demonstrated increasingly sophisticated reasoning capabilities, from understanding spatial relationships to following complex visual instructions. The full productization of Google's Gemini and xAI's Grok further validated this direction, showing how multimodal processing could lead to the more comprehensive AI understanding that would be crucial for robotics applications.

5.2. Video Generation and the Shadow of World Models
The second narrative exploded in February when OpenAI announced Sora. While the field had seen progress in diffusion models, NeRF, and Gaussian splatting, Sora showed an unprecedented ability to maintain consistent objects, characters, and physics across extended video sequences. Beyond simply generating content, it pointed to AI's growing ability to understand and predict the dynamics of the physical world. Google quickly responded with Veo, later upgrading to Veo 2 with interactive editing capabilities.
Video: OpenAI Sora announcement
Perhaps the icing on the cake from a PR standpoint, Stanford professor and former Google VP Fei-Fei Li launched World Labs, leveraging her work in computer vision to develop foundation world models for spatial computing. Such world models promised to revolutionize robotics development by providing sophisticated virtual environments where AI systems could learn complex physical interactions through millions of trials before deployment in the real world. This convergence of video generation and simulation technology marked a crucial step toward robust physical AI.
5.3. Physical AI and Robotics
The third narrative brought robotics back to center stage. NVIDIA's GTC conference in March saw the announcement of Project GR00T, which aimed to create a comprehensive software stack for humanoid robots, including pre-trained models for basic mobility and manipulation. This was followed by an ambitious move from Kyle Vogt: fresh from scaling autonomous vehicle technology at Cruise, he launched a new venture called The Bot Company focused on bringing robotics into homes, starting with kitchen automation.
Video: NVIDIA's Project GR00T announcement
Tesla and Figure became increasingly vocal about their progress, moving beyond controlled demonstrations to showcase their robots performing actual manufacturing tasks like material handling and assembly. OpenAI made a strategic return to robotics through its partnership with Figure, later doubling down by reviving its robotics division under leadership recruited from Meta Reality Labs. Meanwhile, NVIDIA's announcement of Jetson Thor, featuring dedicated transformer engines and advanced AI accelerators, promised the computing infrastructure needed to run sophisticated robot control systems in real time.
Video: Figure AI’s announcement of partnership with BMW
These three narratives worked in powerful synergy. Advances in multimodality improved robots' ability to understand their environment, while progress in video generation enhanced simulation capabilities that, once mature, could be used for training. The convergence suggested that physical AI wasn't just another moonshot; it was the logical next step in artificial intelligence's evolution.
By year's end, these technological threads had woven together to form the foundation of a new industry poised to transform manufacturing and human labor in the next 3-5 years. The financial engineering that sustained market confidence had provided crucial time for technology to advance, and now physical AI was ready to deliver on its long-standing promises. The robotics revolution had moved beyond speculation to become an imminent reality.
6. PREVIEW: Physical AI and The Dawn of the Robotics Industry
The developments of 2024 have set the stage for something unprecedented: the birth of the first true robotics industry in human history. In the next article of this series, we'll explore how ongoing technological and geopolitical changes are converging to create the conditions necessary for this fundamental transformation, as well as take a deep dive into the players who will become the initial citizens of the new industry.
6.1. Gearing Up for Transformation
While the robotics industry doesn't exist yet, the foundations are rapidly taking shape. Two fundamental factors signal readiness for the age of physical AI and robots.
6.1.1. Technological Readiness
Technological barriers are beginning to fall. AI models are evolving into large foundation models capable of sophisticated reasoning, while hardware shows promising signs of standardization with advances in sensors and actuators. Changes in engineering practices, from the growth of optimization techniques to the development of powerful simulators, are lowering the barriers to building the machines we've long dreamed of – multipurpose, collaborative, and intelligent robots capable of performing one human unit of work.
6.1.2. Geopolitical Alignment
Geopolitical factors have aligned to create urgent demand for automation. The retreat of globalism and return to multipolarity is pushing major powers to secure their manufacturing capabilities. In this new world order, where coalitions of regional powers jointly control global resources, the ability to maintain independent manufacturing has become a strategic imperative. Intelligent humanoid robots offer a path to redesigning global supply chains, driving both the US and China to double down on robotics development.
6.2. Profiles of Players
The early years of the robotics industry will be extremely turbulent. As the industry takes its initial shape, many players will either adapt or be swept away by the coming storm. With so much uncertainty about how fast the technology will evolve, who will own it, and how geopolitical situations will align, the only thing we can say with confidence is that it will be a very long tunnel.
The landscape extends far beyond startups building humanoid robots. Established manufacturers like ABB, KUKA, and FANUC, their channel partners, and end customers like factories will all be affected. Different companies are pursuing various strategies – some, like Tesla, are vertically integrated, while others specialize in specific components or capabilities. New players will emerge to claim their niches in the ecosystem, while many existing ones will need to fundamentally reimagine their roles.
How this industry evolves will depend heavily on the sequence of events and interactions between key players. The next article will take a deep dive into the profiles of these players and analyze potential paths forward as physical AI moves from promise to reality.
This transformation won't be gradual or inclusive. When robots with human-level capabilities begin automating manufacturing, the changes will be fundamental and swift. We stand at the dawn of an industry that will reshape not just manufacturing, but the very fabric of human civilization.
Conclusion: Beyond the Tipping Point
Throughout this article, we've traced the remarkable evolution of humanoid robotics from early research projects to the threshold of a true industry. The journey hasn't been linear: it ran from Honda's pioneering yet limited ASIMO, through Google's and SoftBank's ambitious but ultimately unsuccessful ventures, to the rapid growth that has followed Tesla's transformative announcement in 2021. While previous attempts struggled with fundamental technological barriers, today's convergence of AI capabilities, hardware advances, and geopolitical imperatives has created unprecedented momentum toward robots that can serve as drop-in replacements for humans.
What makes this moment different isn't just technological readiness. The field has undergone a profound cultural transformation, attracting world-class talent and serious entrepreneurs focused on building real products and businesses. Meanwhile, geopolitical shifts have created urgent demand for automation, particularly in manufacturing. When ChatGPT demonstrated the power of foundation models in 2022, it accelerated these trends, pulling the timeline for practical humanoid robots forward by years.
The narrative shifts of 2024 – from multimodal AI to video generation to physical AI – weren't isolated developments. They were threads weaving together into something more profound: the foundation for an industry that will fundamentally transform human civilization. As we'll explore in the next article, the convergence of technological capability and geopolitical necessity is finally giving birth to the long-awaited robotics industry. The implications of this transformation will reach far beyond manufacturing, reshaping the very fabric of human society.
We stand at a pivotal moment in history. The age of automation that generations have dreamed about is rapidly becoming reality. For the first time, we have the technology, the talent, and the geopolitical imperative to build robots that can truly emancipate humanity from physical labor. This transformation will trigger a fundamental reset of civilization itself. The automation of labor through intelligent robots won't just change how we work – it will reshape the very foundations of human society in ways more dramatic and consequential than anyone could have imagined. The future we've dreamed of is almost here, and it's arriving faster than we're prepared for.