Unsupervised testing of NLU models with multiple views

By Radhika Arava, Matthew Trager, Boya Yu, Mohammad AbdelHady
2021
In the Natural Language Understanding (NLU) systems of voice assistants, new domains are added on a regular basis. This poses the practical problem of evaluating the performance of NLU models on domains for which no manually annotated data is available. In this paper, we present an unsupervised testing method, Cross-View Testing (CVT), for ranking multiple intent classification models using only unlabeled test data. The approach relies on a set of labeling functions that automatically annotate test data in the target domain; these include intent classification models trained on other domains as well as heuristic rules. Specifically, we combine the annotations of multiple models with different output spaces by training a combiner model on synthetic data. In our experiments, the combiner outperforms the target models by large margins, and its predictions can be used as a proxy for ground truth in unsupervised model evaluation.
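
To make the pipeline concrete, the following is a minimal sketch of the CVT idea as described in the abstract: labeling functions (stand-ins for intent models trained on other domains and for heuristic rules) vote on unlabeled target-domain utterances, a combiner is trained on synthetic (votes, label) pairs, and the combiner's predictions serve as proxy ground truth for ranking candidate models. Every name here, the synthetic-data recipe, and the logistic-regression combiner are illustrative assumptions, not the paper's actual components.

# Hypothetical sketch of Cross-View Testing (CVT); the paper's real
# labeling functions, synthetic data, and combiner architecture may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

TARGET_INTENTS = ["PlayMusic", "SetTimer", "GetWeather"]  # assumed target domain

# Labeling functions: stand-ins for out-of-domain intent models and
# heuristic rules. Each maps an utterance to a target intent or None.
def lf_music_model(utt):
    return "PlayMusic" if "play" in utt or "song" in utt else None

def lf_timer_rule(utt):
    return "SetTimer" if "timer" in utt or "minutes" in utt else None

def lf_weather_model(utt):
    return "GetWeather" if "weather" in utt or "rain" in utt else None

LABELING_FUNCTIONS = [lf_music_model, lf_timer_rule, lf_weather_model]

def lf_votes(utt):
    """One-hot encode every labeling function's vote over the target intents."""
    feats = []
    for lf in LABELING_FUNCTIONS:
        vote = lf(utt)
        feats.extend(1.0 if vote == intent else 0.0 for intent in TARGET_INTENTS)
    return feats

def synthesize(n=2000, lf_accuracy=0.8):
    """Simulate (votes, true intent) pairs: each labeling function agrees with
    the sampled true intent with probability lf_accuracy, otherwise votes
    uniformly at random. A stand-in for the paper's synthetic training data."""
    X, y = [], []
    for _ in range(n):
        true = int(rng.integers(len(TARGET_INTENTS)))
        feats = []
        for _ in LABELING_FUNCTIONS:
            vote = true if rng.random() < lf_accuracy else int(rng.integers(len(TARGET_INTENTS)))
            feats.extend(1.0 if j == vote else 0.0 for j in range(len(TARGET_INTENTS)))
        X.append(feats)
        y.append(true)
    return np.array(X), np.array(y)

# Train the combiner on synthetic data, then use its predictions on the
# unlabeled target-domain test set as proxy ground truth.
combiner = LogisticRegression(max_iter=1000).fit(*synthesize())

unlabeled_test = ["play a song by queen",
                  "set a timer for ten minutes",
                  "will it rain in seattle tomorrow"]
proxy = combiner.predict([lf_votes(u) for u in unlabeled_test])
for utt, idx in zip(unlabeled_test, proxy):
    print(f"{utt!r} -> {TARGET_INTENTS[idx]}")

# Rank candidate target-domain models by agreement with the proxy labels.
def agreement(predict_fn, utterances, proxy_labels):
    preds = [predict_fn(u) for u in utterances]
    return float(np.mean([p == TARGET_INTENTS[i] for p, i in zip(preds, proxy_labels)]))

dummy_target_model = lambda u: "SetTimer"  # hypothetical candidate model
print("agreement:", agreement(dummy_target_model, unlabeled_test, proxy))

In this sketch, agreement with the proxy labels is the ranking criterion: a candidate model that matches the combiner's predictions more often is ranked higher, which is meaningful to the extent that the combiner's predictions approximate the true labels.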