Imbalanced data distribution is a common practical challenge when building machine learning (ML) models in industry, where data usually exhibits a long-tail distribution. For instance, in virtual AI assistants such as Google Assistant, Amazon Alexa, and Apple Siri, utterances such as "play music" or "set timer" receive an order of magnitude more traffic than those for other skills. This can easily cause trained models