Natural Language Understanding (NLU) systems such as chatbots and virtual assistants have seen a significant rise in popularity in recent years, thanks to the availability of large volumes of user data. However, the user data typically collected for training such models may suffer from sampling biases arising from a variety of factors. In this paper, we study the impact of bias in the training data for the intent classification task, a core component of NLU systems. We experiment with three kinds of data bias settings: (i) random down-sampling, (ii) class-dependent bias injection, and (iii) class-independent bias injection. For each setting, we report the loss in model performance and survey mitigation strategies from two families of methods: (i) semi-supervised learning (SSL) and (ii) synthetic data generation. Overall, we find that while both methods perform well with random down-sampling, synthetic data generation outperforms SSL when only biased training data is available.
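To make the three bias settings concrete, the sketch below shows one plausible way to inject each of them into a labeled intent dataset. The function names, keep rates, and the length-based rule for the class-independent case are illustrative assumptions for this sketch, not the paper's actual experimental protocol.

```python
import random

def random_downsample(data, keep_frac, seed=0):
    """Setting (i): drop a uniform fraction of examples at random."""
    rng = random.Random(seed)
    return [ex for ex in data if rng.random() < keep_frac]

def class_dependent_bias(data, class_keep_fracs, default_frac=1.0, seed=0):
    """Setting (ii): down-sample each intent class at its own rate,
    skewing the label distribution."""
    rng = random.Random(seed)
    return [
        (text, label) for text, label in data
        if rng.random() < class_keep_fracs.get(label, default_frac)
    ]

def class_independent_bias(data, keep_prob_fn, seed=0):
    """Setting (iii): down-sample based on a property of the utterance
    itself, independent of its intent label."""
    rng = random.Random(seed)
    return [
        (text, label) for text, label in data
        if rng.random() < keep_prob_fn(text)
    ]

# Hypothetical usage: bias the sample toward short utterances,
# regardless of intent class.
data = [
    ("book a flight", "BookFlight"),
    ("play some upbeat jazz music please", "PlayMusic"),
]
biased = class_independent_bias(
    data,
    keep_prob_fn=lambda text: 0.9 if len(text.split()) <= 4 else 0.3,
)
```

Under this framing, setting (i) preserves the data distribution in expectation, while settings (ii) and (iii) systematically distort it, which is what makes them harder to mitigate.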