The Rapid Evolution Of Robotics

A few days ago, I attended one of the first non-academic robotics conferences, Actuate, in San Francisco. One of the most fascinating insights was the early evidence suggesting that robots trained using a variety of data (cross-embodiment training) can outperform specialized robots at their own tasks. For example, a robot trained on diverse tasks—such as walking, assembling, and manufacturing—could perform better at picking objects than a robot trained exclusively for that task.

Robots in our life (image generated by DALL-E)

Here are my key observations from the conference:

1. Physical Intelligence, a Silicon Valley startup, raised $70M in a seed round to build a general robotics model that different kinds of robots can use to do various tasks.

2. This idea of a general model for robotics is called a Robotics Foundation Model (RFM). The concept is similar to Large Language Models (LLMs), which are the basis for Generative Artificial Intelligence (GenAI).

3. There is debate in the industry about which approach is better: building vertically integrated, specialized robots or building an RFM that can work for any task.

4. The biggest problem in building robots today is the lack of data. It is relatively easy to train LLMs on the entire Internet, but no such corpus exists for robots. It took Waymo, the Google-owned self-driving startup, about 20 years to get enough data for its self-driving cars to be safe.

5. Vision Language Models (VLMs), which combine vision and language models, are driving progress in robotics. A VLM trained for robot control is called a Vision-Language-Action (VLA) model. One popular example is the RT-2 model. VLAs let us talk to robots in English and have the command broken into small steps. For example, you can just say "clean dishes" and the VLA will break it into "grab the dishwasher handle, open the door, take the clean dishes out," and so on.

6. One of the main challenges in robotics is that you can get to 70-80% accuracy pretty fast, but getting to 99.999% accuracy can take years. There is not enough data on the corner cases (the remaining 20-30%), so you have to work on generating that data. This happens when you are using supervised learning.

7. A lot of founders in the robotics field are former employees of self-driving car companies. 

8. $1T is spent every year on the upkeep of physical infrastructure. Robots can help.

9. Autonomous fine-tuning of RFMs is a topic of great interest to the industry.

10. ROS 2 (Robot Operating System 2) is the most popular open-source robotics software framework.

11. The following minimum infrastructure is needed to build robots:

i) Performance and accuracy of robot components
ii) Teleoperations and data annotations 
iii) Validation and analytics 
iv) Observability and monitoring 
v) MLOps
vi) ELT
vii) Business analytics

12. It is hard to iterate in robotics because the hardware makes huge advances every 18 months.

13. Robots have a lot of sensors, and calibrating these sensors so that the whole system (the robot) is accurate is time-consuming. Hence, robot self-calibration, or auto-calibration, is a hot topic.

14. Calibration requires an understanding of the following four items:

i) Product requirements 
ii) Operating environment 
iii) Engineering controls 
iv) Process legibility 

15. Robots have multiple systems/computers that need to work in sync. When the clocks on these computers drift at different rates, things can get out of sync and make the robot ineffective. Furthermore, when robots have to work with other robots and they are not all on the same clock, it can cause accidents. Logging timing information consistently can help catch the problem.

16. Skydio, a drone startup, has solved the night vision problem in drones with a dual architecture for day and night vision.

17. Robotics research mainly happens in simulation, which can cover the majority of the common use cases. However, simulation does not reflect real-world physics, and it does not cover the corner cases (the long tail) that are necessary for productizing the research. Hence, real-world testing is necessary for commercializing robotics. Ideally, there should be a data pipeline from real-world testing into the simulation.

18. Saronic is building autonomous boats for the US Navy. To make the autonomous boats work, they have to harmonize four systems:

i) Perception 
ii) Control 
iii) Locomotion 
iv) Networking 

19. The progress that Wayve, an autonomous driving startup, has shown in a short period of time using unsupervised learning is remarkable. Their approach is quite different from that of Waymo, which collected and labeled data for years for supervised learning. Applying LLMs to autonomous driving (LLM4Drive) is now a thing.

20. Embodied Intelligence might be necessary for machines to be generally intelligent. 

21. Amazon is the biggest user of robots today with over 750,000 robots deployed in warehouses and fulfillment centers. 
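
The clock-drift point in observation 15 can be made concrete with a small sketch. It assumes each onboard computer logs pairs of (reference time, local time) in seconds, then fits a line to the offset to estimate how far and how fast each clock is drifting. The names `DriftEstimate`, `estimate_drift`, and `out_of_sync`, the log layout, and the tolerance are all hypothetical and chosen for illustration; this is not how any particular robot stack does it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DriftEstimate:
    offset_s: float   # local clock minus reference clock at the latest sample
    drift_ppm: float  # how fast that offset grows, in parts per million


def estimate_drift(samples: List[Tuple[float, float]]) -> DriftEstimate:
    """Fit a line to (reference_time_s, local_time_s) log entries.

    The slope of offset versus reference time is the drift rate.
    """
    if len(samples) < 2:
        raise ValueError("need at least two timing samples")
    refs = [ref for ref, _ in samples]
    offsets = [local - ref for ref, local in samples]
    n = len(samples)
    mean_ref = sum(refs) / n
    mean_off = sum(offsets) / n
    var_ref = sum((r - mean_ref) ** 2 for r in refs)
    if var_ref == 0:
        raise ValueError("samples must span more than one reference time")
    # Ordinary least-squares slope of offset against reference time.
    slope = sum(
        (r - mean_ref) * (o - mean_off) for r, o in zip(refs, offsets)
    ) / var_ref
    return DriftEstimate(offset_s=offsets[-1], drift_ppm=slope * 1e6)


def out_of_sync(
    logs: Dict[str, List[Tuple[float, float]]], tolerance_s: float
) -> List[str]:
    """Return the names of computers whose latest offset exceeds the tolerance."""
    return sorted(
        name
        for name, samples in logs.items()
        if abs(estimate_drift(samples).offset_s) > tolerance_s
    )


if __name__ == "__main__":
    # Made-up logs: the camera clock runs slightly fast, the motor clock does not.
    times = (0.0, 100.0, 200.0, 300.0)
    logs = {
        "camera": [(t, t * (1 + 50e-6) + 0.001) for t in times],
        "motor": [(t, t + 0.0002) for t in times],
    }
    for name, samples in logs.items():
        print(name, estimate_drift(samples))
    print("out of sync:", out_of_sync(logs, tolerance_s=0.005))
```

The same check could run across a fleet by treating one robot's clock, or an external time source, as the reference.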


Robotics is a rapidly evolving field, offering diverse approaches to overcoming complex technical challenges. As someone who has built hardware, software, AI, and networking products, it's exciting to witness these four disciplines converge into a single, integrated solution. This convergence is not just exciting—it represents the future of innovation, where the interplay between these technologies will shape the next generation of intelligent machines and can transform entire industries. 
