When Active Learning Meets Implicit Semantic Data Augmentation

Zhuangzhuang Chen, Jin Zhang, Pan Wang, Jie Chen, Jianqiang Li


Active learning (AL) is a label-efficient technique for training deep models when only a limited labeled set is available and manual annotation is expensive. Implicit semantic data augmentation (ISDA) effectively extends a limited set of labeled samples and increases its diversity without introducing noticeable extra computational cost. The scarcity of labeled instances and the high cost of annotating unlabeled samples motivate us to consider combining AL and ISDA. A natural direction is a pipelined integration: an AL acquisition function selects unlabeled samples for labeling, and ISDA then generates virtual samples by translating the selected samples along semantic transformation directions. However, this pipelined combination does not guarantee the diversity of the virtual samples. This paper proposes diversity-aware semantic transformation active learning (DAST-AL), a framework that looks ahead to the effect of ISDA when selecting unlabeled samples. Specifically, DAST-AL uses expected partial model change maximization (EPMCM) to account for each candidate sample's potential contribution to the diversity of the labeled set, by leveraging the semantic transformations of ISDA during selection. DAST-AL can then confidently and efficiently augment the labeled set by implicitly generating more diverse samples. Empirical results on both image classification and semantic segmentation tasks show that DAST-AL can slightly outperform state-of-the-art AL approaches. Under the same conditions, the proposed method takes less than 3 minutes for the first cycle of active labeling, whereas the existing agreement discrepancy selection method takes more than 40 minutes.
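The abstract does not state the ISDA objective that DAST-AL builds on. As a minimal sketch, assuming the surrogate loss from the original ISDA paper (not DAST-AL's own code), "implicitly" generating virtual samples amounts to raising each non-target logit by a quadratic term, $z_j + \frac{\lambda}{2}(w_j - w_y)^\top \Sigma_y (w_j - w_y)$, where $\Sigma_y$ is the feature covariance of class $y$ and $\lambda$ controls augmentation strength. The function names, the NumPy setting, and the assumption that per-class covariances are supplied precomputed (ISDA estimates them online) are all illustrative.

```python
import numpy as np

def isda_augmented_logits(features, labels, weights, bias, class_cov, lam):
    """Hypothetical helper: logits of the ISDA upper-bound surrogate.

    features: (N, D) deep features; labels: (N,) int class ids
    weights: (C, D), bias: (C,) final linear layer
    class_cov: (C, D, D) assumed per-class feature covariances
    lam: augmentation strength (lambda)
    """
    logits = features @ weights.T + bias                          # (N, C)
    # diff[n, j] = w_j - w_{y_n}; zero for the target class j = y_n
    diff = weights[None, :, :] - weights[labels][:, None, :]      # (N, C, D)
    sigma = class_cov[labels]                                     # (N, D, D)
    # quadratic form (w_j - w_y)^T Sigma_y (w_j - w_y) for every n, j
    quad = np.einsum("ncd,nde,nce->nc", diff, sigma, diff)        # (N, C)
    return logits + 0.5 * lam * quad

def isda_loss(features, labels, weights, bias, class_cov, lam):
    """Mean cross-entropy on the augmented logits (numerically stable)."""
    z = isda_augmented_logits(features, labels, weights, bias, class_cov, lam)
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With `lam=0` this reduces to ordinary cross-entropy; for `lam > 0` and positive semi-definite covariances, the non-target logits can only increase, so the loss penalizes the model as if it had also seen semantically shifted copies of each feature, without materializing them.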
