Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications

Lingzhi Zhang, Shenghao Zhou, Simon Stent, Jianbo Shi


Egocentric videos offer fine-grained information for high-fidelity modeling of human behavior. Hands and the objects they interact with are one crucial aspect of understanding the viewer's behavior and intentions. We provide a labeled dataset of 11,235 egocentric images with per-pixel segmentation labels of hands and interacting objects across diverse daily activities. Our dataset is the first to label the detailed contact boundaries between interacting hands and objects. We introduce a context-aware compositional data augmentation technique to adapt to out-of-distribution YouTube egocentric videos. We show that our robust hand-object segmentation model and dataset can serve as a foundational tool to boost or enable several downstream vision applications, such as hand state classification, video activity recognition, 3D mesh reconstruction of hand-object interaction, and seeing through the hand with video inpainting in egocentric videos. All of our data and code will be released to the public.
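The abstract names the context-aware compositional augmentation but does not describe it. As a rough illustration of the general idea only, here is a minimal sketch of plain mask-based copy-paste compositing in Python with NumPy: labeled hand and object pixels from one image are pasted onto another image and its label map. This is not the authors' method. The context-aware choice of what to paste and where is omitted, and the label encoding (0 = background, nonzero ids = hand or object classes) is a hypothetical assumption, not the dataset's documented format.

```python
import numpy as np


def composite(fg_img, fg_label, bg_img, bg_label):
    """Paste labeled foreground pixels from one image onto another.

    Generic copy-paste compositing, NOT the paper's context-aware method.
    fg_img, bg_img: HxWx3 uint8 arrays of the same size.
    fg_label, bg_label: HxW integer label maps. Hypothetical encoding:
    0 = background, nonzero = hand or interacting-object class id.
    """
    if fg_img.shape != bg_img.shape or fg_label.shape != bg_label.shape:
        raise ValueError("inputs must share the same spatial size")
    mask = fg_label != 0              # pixels labeled as hand or object
    out_img = bg_img.copy()
    out_img[mask] = fg_img[mask]      # copy the foreground RGB values
    out_label = bg_label.copy()
    out_label[mask] = fg_label[mask]  # pasted labels occlude the background's labels
    return out_img, out_label


# Tiny usage example with 2x2 images.
fg = np.full((2, 2, 3), 255, dtype=np.uint8)
fg_lab = np.array([[1, 0], [0, 3]])   # hypothetical ids: 1 = hand, 3 = object
bg = np.zeros((2, 2, 3), dtype=np.uint8)
bg_lab = np.zeros((2, 2), dtype=int)
img, lab = composite(fg, fg_lab, bg, bg_lab)
print(lab.tolist())  # [[1, 0], [0, 3]]
```

Updating the image and the label map with the same boolean mask keeps the pixels and labels of the augmented sample aligned. A context-aware variant would additionally decide which foregrounds and backgrounds are plausible together, which this sketch does not attempt.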
