HumanX

Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos

HKUST, Shanghai AI Lab

Abstract

Enabling humanoid robots to perform agile and adaptive interactive tasks has long been a core challenge in robotics. Current approaches are bottlenecked either by the scarcity of realistic interaction data or by the need for meticulous, task-specific reward engineering, both of which limit scalability. To narrow this gap, we present HumanX, a full-stack framework that compiles human videos into generalizable, real-world interaction skills for humanoids without task-specific rewards. HumanX integrates two co-designed components: XGen, a data generation pipeline that synthesizes diverse, physically plausible robot interaction data from video while supporting scalable data augmentation; and XMimic, a unified imitation learning framework that learns generalizable interaction skills from this data. Evaluated across five distinct domains (basketball, football, badminton, cargo pickup, and reactive fighting), HumanX acquires 10 different skills and transfers them zero-shot to a physical Unitree G1 humanoid. The learned capabilities include complex maneuvers, such as a pump-fake turnaround fadeaway jump shot executed without any external perception, as well as interactive tasks, such as sustained human-robot passing sequences over 10 consecutive cycles, learned from a single video demonstration. Our experiments show that HumanX achieves an over 8× higher generalization success rate than prior methods, demonstrating a scalable, task-agnostic pathway for learning versatile, real-world robot interaction skills.
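Read as a pipeline, the abstract describes a two-stage video-to-policy flow: XGen turns one human video into many augmented interaction clips, and XMimic distills those clips into a single deployable policy. The sketch below is a minimal, hypothetical rendering of that flow only; every class, method, and file name in it (generate, train, fadeaway_jumpshot.mp4) is an illustrative assumption, not the framework's actual API.

```python
# Hypothetical sketch of the HumanX video-to-policy flow described above.
# All names and signatures are illustrative assumptions, not a released API.
import random
from dataclasses import dataclass

@dataclass
class InteractionClip:
    """One physically plausible robot interaction trajectory derived from video."""
    states: list   # per-timestep robot/object states
    actions: list  # per-timestep reference actions

class XGen:
    """Data generation: human video -> diverse robot interaction data."""

    def generate(self, video_path: str, num_augmentations: int) -> list[InteractionClip]:
        base = self._retarget(video_path)
        # Scalable augmentation: perturb one base clip into many variants.
        return [self._augment(base) for _ in range(num_augmentations)]

    def _retarget(self, video_path: str) -> InteractionClip:
        # Placeholder: the real pipeline lifts human motion from video and
        # retargets it onto the humanoid under physics constraints.
        return InteractionClip(states=[0.0], actions=[0.0])

    def _augment(self, clip: InteractionClip) -> InteractionClip:
        jitter = random.gauss(0.0, 0.05)
        return InteractionClip(
            states=[s + jitter for s in clip.states],
            actions=list(clip.actions),
        )

class XMimic:
    """Unified imitation learning: interaction data -> one deployable policy."""

    def train(self, clips: list[InteractionClip]):
        # Placeholder: a single task-agnostic imitation objective stands in
        # for per-task reward engineering.
        return lambda state: 0.0  # stand-in for the learned skill policy

# One video demonstration in; a zero-shot-deployable skill policy out.
clips = XGen().generate("fadeaway_jumpshot.mp4", num_augmentations=1000)
policy = XMimic().train(clips)
```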

Autonomous Interaction Skills (with MoCap Sensing)

Autonomous Interaction Skills (without External Perception)