A Generalized & Robust Framework for Timestamp Supervision in Temporal Action Segmentation
"In temporal action segmentation, Timestamp supervision requires only a handful of labeled frames per video sequence. For unlabelled frames, Timestamp works rely on assigning hard labels and performance rapidly collapses under subtle violations of the annotation assumptions. We propose a novel Expectation-Maximization (EM) based approach which leverages label uncertainty of unlabelled frames and is robust enough to accommodate possible annotation errors. With accurate Timestamp annotations, our proposed method produces state-of-the-art results and even exceeds the fully-supervised setup in several metrics and datasets. When applied to timestamp annotations with missed action segments, we show that our method remains stable in terms of performance. To further test the robustness of our formulation, we introduce a new challenging annotation setup of SkipTag supervision. SkipTag is a relaxation on timestamps to allow for annotations of any fixed number of random frames in a video, making it more flexible than Timestamp supervision while remaining competitive."