DexMV: Imitation Learning for Dexterous Manipulation from Human Videos

Yuzhe Qin, Yueh-Hua Wu, Shaowei Liu, Hanwen Jiang, Ruihan Yang, Yang Fu, Xiaolong Wang


"While in computer vision we have made significant progress on understanding hand-object interactions, it is still very challenging for robots to perform complex dexterous manipulation. In this paper, we propose a new platform and pipeline, DexMV (Dexterous Manipulation from Videos), for imitation learning to bridge the gap between computer vision and robot learning. We design a platform with: (i) a simulation system for complex dexterous manipulation tasks with a multi-finger robot hand and (ii) a computer vision system to record large-scale demonstrations of a human hand conducting the same tasks. In the DexMV pipeline, we couple 3D hand and object pose estimation on the videos with a hand motion retargeting algorithm to extract the hand-object state trajectories. We compare multiple imitation learning and reinforcement learning (RL) algorithms on the manipulation tasks in simulation. We show that the demonstrations can indeed improve robot learning by a large margin and solve complex tasks that RL alone cannot solve."
