In December, a team of Alexa AI researchers won the best paper award at the NeurIPS 2021 workshop on Efficient Natural Language and Speech Processing (ENLSP).
Di Jin, an applied scientist, won the award along with Shuyang Gao, applied scientist; Seokhwan Kim, principal applied scientist; Yang Liu, principal applied scientist; and Dilek Hakkani-Tür, senior principal scientist, for their paper, “Towards Zero- and Few-shot Knowledge-seeking Turn Detection in Task-oriented Dialogue Systems.”
Currently, the authors note, task-oriented dialogue systems frequently “rely on pre-defined APIs to complete tasks and filter out any other requests beyond the APIs as out-of-domain cases.” The paper focuses on how to process out-of-domain customer requests more efficiently “by incorporating external domain knowledge from the web or any other sources.”
The issue, Jin explained, results primarily from the gap between training data and actual user requests.
“It's very hard to guarantee that all user queries or user input text are in the exact same distribution as the training data,” he said. “Our APIs are based on common user queries, so we needed to enhance the model to detect the out-of-domain data and route those user queries elsewhere to be addressed.”
The authors designed a model that identifies and routes out-of-domain requests more efficiently, naming it REDE for its combination of adaptive REpresentation learning and DEnsity estimation.
“The most typical way to handle this kind of issue is to train a binary classifier, e.g., a large-scale pre-trained language model like BERT,” Kim explained. “To achieve this, we need positive and negative instances, and we can build a machine learning model to decide whether a given input can be addressed by the API or requires external knowledge. But because open-domain conversational AI systems allow customers to ask anything, it is difficult to collect a sufficient number of out-of-domain instances to train a classifier.”
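For illustration, here is a minimal sketch of the conventional baseline Kim describes: fine-tuning a pretrained language model such as BERT as a binary classifier over in-domain versus knowledge-seeking turns. The checkpoint name, label convention, and example query are assumptions made for this sketch, not details from the paper, and the classification head would still need fine-tuning on labeled positive and negative instances before use.

```python
# Illustrative baseline (not the authors' code): a BERT binary classifier
# that labels a user turn as handled-by-API (0) or knowledge-seeking (1).
# The checkpoint and label convention are assumptions for this sketch.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
# In practice the classification head must first be fine-tuned on labeled
# in-domain (positive) and out-of-domain (negative) utterances, exactly
# the data Kim notes is hard to collect at sufficient scale.

inputs = tokenizer("Does the hotel pool open before 7 a.m.?", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
is_knowledge_seeking = logits.argmax(dim=-1).item() == 1
```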
“What we proposed is not to train the classifier based on some training dataset, but instead to adapt the existing representation,” Jin said. “We transform that representation so that the new representation has enough distinctive power between the two classes, the seen — the instances the current API can field — and the potentially unseen, or out-of-domain ones.”
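Jin's description maps onto a simple pipeline: encode each utterance, transform the embedding, fit a density model to the seen (in-domain) class, and flag low-density inputs as out-of-domain. The sketch below illustrates the density-estimation half with a Gaussian fit and Mahalanobis-distance scoring; the `embed` function, the regularization constant, and the threshold are assumptions, and the paper's learned representation transformation is not reproduced here.

```python
# Minimal sketch of density-estimation-based out-of-domain detection in the
# spirit of REDE; not the authors' implementation. Assumes `embed` maps a
# batch of utterances to fixed-size vectors from a pretrained sentence
# encoder, applied after the adapted representation step described above.
import numpy as np

def fit_gaussian(seen_embeddings: np.ndarray):
    """Fit a Gaussian density to embeddings of seen (in-domain) turns."""
    mean = seen_embeddings.mean(axis=0)
    cov = np.cov(seen_embeddings, rowvar=False)
    # A small ridge keeps the covariance invertible with few-shot data.
    precision = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
    return mean, precision

def ood_score(x: np.ndarray, mean: np.ndarray, precision: np.ndarray) -> float:
    """Mahalanobis distance to the seen class: higher means lower density,
    i.e. a more likely out-of-domain, knowledge-seeking turn."""
    diff = x - mean
    return float(diff @ precision @ diff)

# Usage (threshold tuned on held-out data):
#   mean, precision = fit_gaussian(embed(seen_training_utterances))
#   if ood_score(embed(user_query), mean, precision) > THRESHOLD:
#       route_to_external_knowledge(user_query)
```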
The paper reports that REDE outperformed binary classifiers on both standard and out-of-domain requests, in both low-resource and high-resource settings. In zero-shot and few-shot scenarios, REDE's performance margin over traditional binary classifiers like BERT was even larger.
“The key takeaway is that this kind of simple transformation of the representation works very well and efficiently,” Kim said. “That will help us to develop even more robust conversational models with much smaller datasets and smaller models.”