ECVA | European Computer Vision Association

LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities

Baoxiong Jia, Yixin Chen, Siyuan Huang, Yixin Zhu, Song-Chun Zhu

Abstract

The ability to understand and interpret human actions is a long-standing challenge and a critical indicator of perception in artificial intelligence. However, a few imperative components of daily human activities are largely missing from prior literature, including goal-directed actions, concurrent multi-tasking, and collaboration among multiple agents. We introduce the LEMMA dataset to provide a single home for addressing these missing dimensions with carefully designed settings, wherein the numbers of tasks and agents vary to highlight different learning objectives. We densely annotate the atomic actions with human-object interactions to provide ground truth for the compositionality, scheduling, and assignment of daily activities. We further devise challenging compositional action recognition and action/task anticipation benchmarks with baseline models to measure the capability for compositional action understanding and temporal reasoning. We hope this effort inspires the vision community to look into goal-directed human activities and to further study task scheduling and assignment in real-world scenarios.
