How video helps build robot brains for physical AI

At a glance:

  • Anaxi Labs crowdsources human-scale videos to train robot brains for physical AI tasks.
  • The company focuses on egocentric videos from industrial and household scenarios to teach robots context-dependent actions.
  • Physical AI requires detailed annotations and failure-recovery cases beyond internet data sources.

The Rise of Physical AI

Robots are poised to become the next trillion-dollar tech opportunity, driven by advancements in artificial intelligence. This has sparked a competitive race among robotics companies to develop industrial and humanoid robots capable of assisting or replacing humans in various tasks. However, a critical challenge lies in equipping these robots with the ability to visually navigate and understand their physical environments.

Large language models (LLMs) benefit from vast internet data and mature infrastructure such as chips. Physical AI, which involves training robots to interact with the real world, faces a distinct hurdle: it has no pre-existing data infrastructure. Unlike LLM training data, robot training data cannot be sourced from the internet alone, so specialized datasets that capture real-world scenarios and interactions have to be created.

Anaxi Labs' Approach: Crowdsourced Video Data

Kate Shen, co-founder of Anaxi Labs, is pioneering a method to address the data scarcity in physical AI. Her startup, which originated at Carnegie Mellon University, is building a data pipeline by crowdsourcing and supplying videos of people performing tasks. These videos are then shared with robotics manufacturers to help train their robots. Shen’s approach emphasizes the importance of human-scale video data, arguing it more accurately reflects how robots should perform tasks in real-world conditions.

The company’s strategy involves two main data pipelines. The first targets industry-dense regions, with sites such as construction sites, logistics hubs, and factory floors, where diverse scenarios occur naturally. The second uses a community model that lets individuals worldwide upload videos for training purposes. Anaxi Labs plans to launch a data collection and annotation app this summer to support this process, aiming to scale the availability of high-quality training data.

Beyond YouTube: Why Egocentric Videos Matter

While some robotics companies rely on YouTube videos or simulations for training, Shen points out the limitations of these approaches. The volume of data required for physical AI training far exceeds what is available on the internet, and training also requires repeated physical interactions for each scenario, something YouTube cannot provide. Moreover, simulations often lack the unpredictability and complexity of real-world environments.

Shen notes a shift in the industry toward egocentric video data, which captures tasks from a human perspective. This approach provides a clearer roadmap for physical AI by showing robots how tasks are performed in context. By focusing on videos where the camera mimics human vision, such as seeing two hands sorting packages and scanning barcodes, Anaxi Labs ensures that robots learn nuanced, context-dependent actions that are critical for real-world deployment.

What Data Is Being Collected?

Anaxi Labs collects videos that precisely match the tasks clients want their robots to perform. These are egocentric views that capture actions such as sorting packages and scanning barcodes. The company covers approximately 20 general steps commonly seen in industrial settings, such as assembly, packaging, and quality control. It is also expanding into household scenarios, including kitchen cleaning and bedroom organization, to broaden the applicability of its training data.

Annotation is crucial for enabling robots to understand the videos. Initially, annotations included segmentation, captioning, and contact points. However, to help robots grasp the "how" and "why" behind actions, the company now employs a "chain of thought" format. For example, when a robot sees a slipper, the annotation might explain: "Identify the slipper, grip harder to secure it." This detailed reasoning helps robots learn not just the steps but also the underlying logic for handling unexpected situations.
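To make the annotation layers described above more concrete, here is a minimal sketch of how one annotated clip might be represented. It is not Anaxi Labs' actual format, which the article does not specify: the names `ClipAnnotation`, `ContactPoint`, and `validate`, along with every field, are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContactPoint:
    """Hypothetical record of where a hand touches an object."""
    frame: int   # frame index within the clip
    x: float     # normalized image coordinates, assumed to lie in [0, 1]
    y: float
    hand: str    # "left" or "right"


@dataclass
class ClipAnnotation:
    """Hypothetical container for the annotation layers named in the article."""
    clip_id: str
    caption: str
    # Temporal segmentation as (start_frame, end_frame, label) triples.
    segments: list[tuple[int, int, str]]
    contact_points: list[ContactPoint] = field(default_factory=list)
    # "Chain of thought" steps: the how and why behind the action.
    reasoning: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of human-readable problems; empty means no issues found."""
        problems = []
        for start, end, label in self.segments:
            if start > end:
                problems.append(f"segment '{label}' ends before it starts")
        for cp in self.contact_points:
            if not (0.0 <= cp.x <= 1.0 and 0.0 <= cp.y <= 1.0):
                problems.append(f"contact point at frame {cp.frame} is out of bounds")
        if not self.reasoning:
            problems.append("missing chain-of-thought reasoning")
        return problems


example = ClipAnnotation(
    clip_id="clip-0001",
    caption="Person picks up a slipper",
    segments=[(0, 30, "reach"), (31, 60, "grasp")],
    contact_points=[ContactPoint(frame=35, x=0.42, y=0.61, hand="right")],
    reasoning=["Identify the slipper", "Grip harder to secure it"],
)

# Serialize to JSON so the record could be stored alongside the video file.
serialized = json.dumps(asdict(example), indent=2)
```

Keeping the reasoning steps as an ordered list next to the segmentation and contact points is one way to let a training pipeline pair each action with its stated rationale; a real schema would likely carry more detail, such as per-frame masks.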

Safety and Job Impact: The Broader Implications

Physical AI introduces unique challenges compared to digital AI, particularly regarding safety. Unlike early LLMs that could rely on internet data, physical AI must account for failure and recovery cases from the outset. Companies are now building these scenarios into their models, ensuring robots can respond appropriately when things go wrong, such as dropping an object or encountering an obstacle.

On the job market, Shen sees mostly upside at this stage. Many small robotics companies are thriving by addressing labor shortages in industries like manufacturing. Factories struggling to hire workers for dangerous or repetitive tasks are increasingly turning to robots. This trend not only alleviates labor shortages but also creates new opportunities in robotics development and maintenance.

FAQ

What makes Anaxi Labs' approach different from using YouTube videos for robot training?
Anaxi Labs focuses on egocentric videos captured from a human perspective, which provide more accurate context for tasks than YouTube data. The company also collects specialized industrial and household scenarios, along with detailed annotations such as "chain of thought" reasoning, which help robots understand not just actions but the logic behind them.

What types of videos and annotations does Anaxi Labs collect?
The company collects videos matching specific robot tasks, such as package sorting with barcode scanning in industrial settings, covering about 20 general steps. Annotations now include segmentation, captioning, contact points, and "chain of thought" explanations like "Identify the slipper, grip harder to secure it" to teach robots how to handle scenarios. The company is also expanding into household tasks like kitchen cleaning.

How does physical AI impact job markets differently than digital AI?
According to Shen, the impact is mostly positive at this stage: physical AI is addressing labor shortages in manufacturing and in dangerous or repetitive tasks. Small robotics companies are thriving by serving factories that struggle to hire workers, which also creates new opportunities in robotics development and maintenance. Physical AI must also account for failure and recovery cases from the outset, so robots can respond appropriately when things go wrong.
