Batch-mode active learning iteratively selects a batch of unlabeled samples for labeling in order to maximize model performance while reducing total runtime. To select the most informative and diverse batch, existing methods usually compute the correlation between samples within a batch, which leads to combinatorial optimization problems that are inefficient to solve, complex, and typically restricted to linear models to obtain approximate solutions. In this paper, we propose NimbleLearn, a scalable deep imitation batch-mode active learning approach that addresses these drawbacks. For each batch, NimbleLearn uses a deep policy network to sequentially predict an “ideal sample”: the sample that maximizes model performance when combined with the labeled samples and the samples already selected for the current batch. Unlike existing batch-mode active learning methods, which select a batch directly from the unlabeled samples, NimbleLearn reduces the dimension of the policy network output to the number of features (assuming the number of unlabeled samples is much greater than the number of features). In addition, NimbleLearn is a general framework that can be applied to both linear and nonlinear models. Experiments on four public datasets show that NimbleLearn achieves performance similar to or better than existing state-of-the-art algorithms, while reducing the number of labeled samples and the runtime by over 50%.
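To make the selection mechanism concrete, the sketch below illustrates the core idea under stated assumptions: a policy network whose output dimension equals the number of features predicts an “ideal sample” in feature space, and the nearest still-unselected unlabeled sample is added to the batch. The state representation, network architecture, and nearest-neighbor matching step are illustrative assumptions, not the paper’s exact implementation.

```python
import torch
import torch.nn as nn

class IdealSamplePolicy(nn.Module):
    """Policy network whose output dimension equals the number of features,
    so it predicts a point in feature space (an "ideal sample") rather than
    scoring every unlabeled sample. Hidden size and depth are assumptions."""

    def __init__(self, state_dim: int, n_features: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_features),  # output lives in feature space
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def select_batch(policy, state, unlabeled_x, batch_size):
    """Sequentially build one batch: predict an ideal sample, then take the
    nearest still-unselected unlabeled sample to it (an assumed matching rule)."""
    selected = []
    available = torch.ones(unlabeled_x.shape[0], dtype=torch.bool)
    for _ in range(batch_size):
        ideal = policy(state)                               # shape: (n_features,)
        dists = torch.cdist(ideal.unsqueeze(0), unlabeled_x).squeeze(0)
        dists[~available] = float("inf")                    # skip already-selected samples
        idx = int(torch.argmin(dists))
        selected.append(idx)
        available[idx] = False
        # In the full method, the state would be updated with the newly
        # selected sample before predicting the next ideal sample.
    return selected
```

Keeping the policy output at the number of features means the network size does not grow with the unlabeled pool, which is what makes the approach attractive when the pool is much larger than the feature dimension.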
NimbleLearn: A scalable and fast batch-mode active learning approach
2021