Edina is a conversational agent whose responses draw on data collected from Amazon Mechanical Turk (AMT) through a new technique we call self-dialogues: conversations in which a single AMT Worker plays both participants in a dialogue. Such dialogues are surprisingly natural, efficient to collect, and reflective of relevant or trending topics. They provide training data for a generative neural network as well as a basis for soft rules used by a matching score component. We present a methodology for combining rule-based, retrieval, and generative methods to make effective use of this data. Our hybrid data-driven methodology thus addresses both the coverage limitations of a strictly rule-based approach and the lack of guarantees of a strictly machine-learning approach.
Authors: Ben Krause, Marco Damonte*, Mihai Dobre*, Daniel Duma*, Federico Fancellu*†, Emmanuel Kahembwe*, Jianpeng Cheng, Joachim Fainberg*, Bonnie Webber‡
* equal contribution; † team leader; ‡ faculty advisor