Large-Scale Few-Shot Learning via Multi-Modal Knowledge Discovery
Large-scale few-shot learning aims at identifying hundreds of novel object categories where each category has only a few samples. It is a challenging problem since (1) the identifying process is susceptible to over-fitting with limited samples of an object, and (2) the sample imbalance between a base (known knowledge) category and a novel category is easy to bias the recognition results. To solve these problems, we propose a method based on multi-modal knowledge discovery. First, we use the visual knowledge to help the feature extractors focus on different visual parts. Second, we design a classifier to learn the distribution over all categories. In the second stage, we develop three schemes to minimize the prediction error and balance the training procedure: (1) Hard labels are used to provide precise supervision. (2) Semantic textual knowledge is utilized as weak supervision to find the potential relations between the novel and the base categories. (3) An imbalance control is presented from the data distribution to alleviate the recognition bias towards the base categories. We apply our method on three benchmark datasets, and it achieves state-of-the-art performances in all the experiments."