ECVA | European Computer Vision Association

Simultaneous Detection and Tracking with Motion Modelling for Multiple Object Tracking

ShiJie Sun, Naveed Akhtar, XiangYu Song, HuanSheng Song, Ajmal Mian , Mubarak Shah ;

Abstract

Deep learning based Multiple Object Tracking (MOT) currently relies on off-the-shelf detectors for tracking-by-detection. This results in deep models that are detector biased and evaluations that are detector influenced. To resolve this issue, we introduce Deep Motion Modeling Network (DMM-Net) that can estimate multiple objects' motion parameters to perform joint detection and association in an end-to-end manner. DMM-Net models object features over multiple frames and simultaneously infers object classes, visibility and their motion parameters. These outputs are readily used to update the tracklets for efficient MOT. DMM-Net achieves PR-MOTA score of 12.80 @ 120+ fps for the popular UA-DETRAC challenge - which is better performance and orders of magnitude faster. We also contribute a synthetic large-scale public dataset Omni-MOT for vehicle tracking that provides precise ground-truth annotations to eliminate the detector influence in MOT evaluation. This 14M+ frames dataset is extendable with our public script. We demonstrate the suitability of Omni-MOT for deep learning with DMM-Net, and also make the source code of our network public."

Related Material

[pdf]