Hierarchical Feature Embedding for Visual Tracking
"Features extracted by existing tracking methods may contain instance- and category-level information. However, it usually occurs that either instance- or category-level information uncontrollably dominates the feature embeddings depending on the training data distribution, since the two types of information are not explicitly modeled. A more favorable way is to produce features that emphasize both types of information in visual tracking. To achieve this, we propose a hierarchical feature embedding model which separately learns the instance and category information, and progressively embeds them. We develop the instance-aware and category-aware modules that collaborate from different semantic levels to produce discriminative and robust feature embeddings. The instance-aware module concentrates on the instance level in which the inter-video contrastive learning mechanism is adopted to facilitate inter-instance separability and intra-instance compactness. However, it is challenging to force the intra-instance compactness by using instance-level information alone because of the prevailing appearance changes of the instance in visual tracking. To tackle this problem, the category-aware module is employed to summarize high-level category information which remains robust despite instance-level appearance changes. As such, intra-instance compactness can be effectively improved by jointly leveraging the instance- and category-aware modules. Experimental results on various tracking benchmarks demonstrate that the proposed method performs favorably against the state-of-the-arts."