RoBERTaIQ: An efficient framework for automatic interaction quality estimation of dialogue systems

2021
Automatically evaluating the response quality of large-scale dialogue systems is a challenging task in dialogue research. Existing automated turn-level approaches train supervised models on Interaction Quality (IQ) labels or annotations provided by experts, which is costly and time-sensitive. Moreover, the small quantity of annotated data limits the trained model’s ability to generalize to long-tail and out-of-domain cases. In this paper, we propose a learning framework that improves the model’s generalizability by leveraging various unsupervised data sources available in large-scale conversational AI systems. We rely mainly on three techniques to improve the performance of dialogue evaluation models. First, we extend the RoBERTa model to encode multi-turn dialogues, capturing the temporal differences between turns. Second, we add two pretraining processes on top of the enhanced multi-turn RoBERTa to take advantage of the large quantity of existing historical dialogue data through self-supervised training. Third, we fine-tune on IQ labels in a multi-task learning setup, leveraging domain-specific information from other tasks. We show that these techniques significantly reduce annotated data requirements. We achieve the same F1 score on the IQ prediction task as our baseline with only 5% of the IQ training data, and we beat the baseline by 5.4% absolute F1 when we use all of the training data.
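The abstract does not say how the multi-turn encoding represents temporal differences between turns. As a minimal sketch of one plausible approach, the example below flattens a dialogue into a single token sequence and tags each token with the age of its turn (0 for the most recent). The function name `encode_dialogue`, the `max_turns` parameter, and the whitespace tokenization are all assumptions made for illustration, not details from the paper; only the `<s>` and `</s>` special tokens come from RoBERTa itself.

```python
from typing import List, Tuple

CLS, SEP = "<s>", "</s>"  # RoBERTa's start and separator tokens


def encode_dialogue(turns: List[str], max_turns: int = 3) -> Tuple[List[str], List[int]]:
    """Flatten the last `max_turns` turns into one sequence (hypothetical helper).

    Returns the tokens and a parallel list of turn ids, where 0 marks the
    most recent turn and larger ids mark older turns.
    """
    if not turns:
        raise ValueError("a dialogue needs at least one turn")
    recent = turns[-max_turns:]
    # The start token is assigned to the oldest turn that is kept.
    tokens, turn_ids = [CLS], [len(recent) - 1]
    for i, turn in enumerate(recent):
        age = len(recent) - 1 - i
        # Assumption: whitespace split stands in for a real BPE tokenizer.
        words = turn.split() + [SEP]
        tokens.extend(words)
        turn_ids.extend([age] * len(words))
    return tokens, turn_ids


tokens, turn_ids = encode_dialogue(["play jazz", "playing rock", "no stop"])
print(list(zip(tokens, turn_ids)))
```

In a model along these lines, the turn ids would typically index a learned embedding table whose vectors are added to the token embeddings, much as position embeddings are, so the encoder can tell the current turn from its history.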