PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction

1Peking University, 2IDEA, 3Tsinghua University, 4Carnegie Mellon University
Corresponding authors

PhysHOI enables a physically simulated humanoid to reproduce Human-Object Interaction (HOI) skills from demonstrations, without task-specific rewards. The whole-body humanoid follows the SMPL-X kinematic tree and has a total of 51x3 DoF actuators.


Humans interact with objects all the time. Enabling a humanoid to learn human-object interaction (HOI) is a key step for future smart animation and intelligent robotics systems. However, recent progress in physics-based HOI requires carefully designed task-specific rewards, making the system unscalable and labor-intensive. This work focuses on dynamic HOI imitation: teaching a humanoid dynamic interaction skills by imitating kinematic HOI demonstrations. This is challenging because of the complexity of the interaction between body parts and objects and the lack of dynamic HOI data. To handle these issues, we present PhysHOI, the first physics-based whole-body HOI imitation approach without task-specific reward designs. In addition to the kinematic HOI representations of humans and objects, we introduce the contact graph to explicitly model the contact relations between body parts and objects. We also design a contact graph reward, which proves critical for precise HOI imitation. With these key designs, PhysHOI can imitate diverse HOI tasks simply yet effectively without prior knowledge. To make up for the lack of dynamic HOI scenarios in this area, we introduce the BallPlay dataset, which contains eight whole-body basketball skills. We validate PhysHOI on diverse HOI tasks, including whole-body grasping and basketball skills.



Contact Graph

Kinematic imitation rewards can easily fall into local optima. To compensate for the lack of dynamic information in HOI data, we introduce the contact graph (CG). Given an HOI frame, the CG nodes consist of all the body parts and objects. Each edge is a binary label that denotes whether the two nodes it connects are in contact. To simplify calculations, we can aggregate multiple body parts into one node, forming an aggregate CG.

Contact Graph.
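The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the node names, the grouping of body parts, and the function name `aggregate_contact_graph` are all hypothetical, chosen only to show how fine-grained contacts collapse into binary edges between aggregate nodes.

```python
from itertools import combinations


def aggregate_contact_graph(contacts, groups):
    """Build an aggregate contact graph (illustrative sketch only).

    contacts: iterable of (node_a, node_b) pairs of fine-grained nodes
              that are in contact in the current HOI frame.
    groups:   dict mapping each fine-grained node to its aggregate node
              (the grouping used here is an assumption).
    Returns a dict mapping frozenset({a, b}) to a binary label for every
    pair of aggregate nodes: 1 if in contact, 0 otherwise.
    """
    nodes = sorted(set(groups.values()))
    # Start with every edge labelled "no contact".
    edges = {frozenset(pair): 0 for pair in combinations(nodes, 2)}
    for a, b in contacts:
        ga, gb = groups[a], groups[b]
        if ga != gb:  # contacts inside one aggregate node carry no edge
            edges[frozenset((ga, gb))] = 1
    return edges


# Hypothetical grouping: both hands form one node, the rest of the body another.
groups = {
    "left_hand": "hands",
    "right_hand": "hands",
    "torso": "body",
    "left_foot": "body",
    "ball": "ball",
}
cg = aggregate_contact_graph([("right_hand", "ball")], groups)
```

With this grouping, a single hand touching the ball sets the hands-ball edge to 1 while the body-ball and body-hands edges stay 0, so only three edges need to be compared per frame.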

Overview of PhysHOI

An overview of our method is as follows. Given reference HOI data, we feed the policy the current simulated HOI state and the reference HOI state. The policy outputs an action, and the physics simulator yields the next simulated HOI state. We multiply the kinematic rewards by the CG reward and optimize the policy to maximize the expected return. By repeating this process until convergence, our humanoid can reproduce the HOI skills in the reference data.

Overview of PhysHOI.
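The multiplicative combination of rewards can be sketched as follows. This is a sketch under stated assumptions, not the authors' code: the exponential form of the CG reward, the `weight` value, and the function names are hypothetical. The only detail taken from the text is that the kinematic rewards are multiplied by the CG reward.

```python
import math


def cg_reward(sim_edges, ref_edges, weight=5.0):
    """Contact graph reward (assumed form): 1.0 when the simulated edge
    labels match the reference labels, decaying with each mismatch."""
    error = sum(abs(s - r) for s, r in zip(sim_edges, ref_edges))
    return math.exp(-weight * error)


def total_reward(kinematic_rewards, r_cg):
    """Multiply all kinematic reward terms by the CG reward."""
    return math.prod(kinematic_rewards) * r_cg


# Hypothetical kinematic reward terms in [0, 1] for one frame.
kinematic = [0.9, 0.8]
matched = total_reward(kinematic, cg_reward([1, 0], [1, 0]))
mismatched = total_reward(kinematic, cg_reward([0, 0], [1, 0]))
```

Because the terms are multiplied, a frame whose contacts disagree with the reference receives a low total reward even when its kinematic terms are high, which is one way to read the claim that the CG reward helps the policy escape kinematic local optima.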

The BallPlay Dataset

To make up for the lack of dynamic HOI scenarios in this area, we introduce the BallPlay dataset, which contains diverse whole-body basketball skills. Instead of using MoCap devices, which are costly and hard to scale up, we apply a monocular annotation solution to estimate high-quality human SMPL-X parameters and object translations from RGB videos.

The BallPlay dataset.

Ablation on CGR

An ablation study on the contact graph reward (CGR) clearly shows that the CGR plays a critical role in avoiding kinematic local optima. Without the CGR, the humanoid may fail to control the ball, or may incorrectly use its body to control the ball.

Robust to Data Errors

Although there may be inaccuracies in the reference HOI data, our method is robust to data errors and can correct these errors through physical simulation and reinforcement learning.

Robust to Varying Ball Sizes

Despite being trained with a fixed ball radius (12 cm), our method is robust to different ball sizes at inference time.

Failure Cases

However, when the HOI data contains serious errors, our method may also fail. In this layup data, the ball is floating in the air and the palm is facing down instead of toward the ball, which prevents the training from converging.


  author    = {Wang, Yinhuai and Lin, Jing and Zeng, Ailing and Luo, Zhengyi and Zhang, Jian and Zhang, Lei},
  title     = {PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction},
  journal   = {arXiv preprint arXiv:2312.04393},
  year      = {2023},