As Alexa-enabled devices continue to expand into new countries, we propose an approach for quickly bootstrapping machine-learning models in new languages, with the aim of bringing Alexa to new customers around the world more efficiently. We describe our approach in a paper we’re presenting next week at the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT).
Building a natural-language-understanding (NLU) model from scratch requires gathering and annotating huge sets of training data. That is a significant time burden for both annotators and scientists, and the procedure doesn’t scale to new languages. An obvious solution is to leverage the large data sets that have already been used to train NLU models in other languages. In this work, we use machine translation (MT) to translate existing data sources into a target language and then use the translated data to bootstrap an NLU system.
A common way to begin training an NLU model in a new language is to use a formal grammar, a set of syntactic and semantic rules that, combined with a lexicon of words tagged with semantic information, can generate an arbitrary number of syntactically and semantically valid sentences. Although less time-consuming than annotating huge data sets, this does require language specialists to build grammars that offer good coverage for the target application.
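To make the idea of grammar-generated data concrete, here is a minimal sketch in Python. It is not the grammar formalism used in the paper: the template syntax, the intent and slot names (`PlayMusic`, `ArtistName`, and so on), and the `generate` function are all assumptions made up for illustration. The sketch expands every template with every matching lexicon entry and keeps the semantic annotation alongside each generated sentence.

```python
import itertools
import re

# Hypothetical toy lexicon: slot type -> words tagged with that type.
LEXICON = {
    "ArtistName": ["Queen", "Adele"],
    "City": ["Seattle", "Boston"],
}

# Hypothetical toy grammar: intent -> sentence templates with {Slot} markers.
TEMPLATES = {
    "PlayMusic": ["play music by {ArtistName}", "play {ArtistName}"],
    "GetWeather": ["what is the weather in {City}"],
}

SLOT = re.compile(r"\{(\w+)\}")


def generate(templates, lexicon):
    """Expand every template with every combination of lexicon entries."""
    examples = []
    for intent, patterns in templates.items():
        for pattern in patterns:
            slots = SLOT.findall(pattern)
            # Cartesian product over the lexicon entries of each slot.
            for values in itertools.product(*(lexicon[s] for s in slots)):
                filling = dict(zip(slots, values))
                examples.append(
                    {
                        "intent": intent,
                        "text": pattern.format(**filling),
                        "slots": filling,
                    }
                )
    return examples


examples = generate(TEMPLATES, LEXICON)
```

Even this toy version shows the trade-off described above: the output is annotated for free, but its coverage is limited to whatever templates and lexicon entries a specialist has written.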
Once this first system reaches a certain performance threshold, it can be shared with beta users. Beta users’ queries will, of course, better represent those of real users than artificially generated data will. All existing data sources are then used to train the system until it reaches a new, higher performance threshold, at which point it is made generally available to customers. Once customers begin using the system, their interactions with it generate even more training data.
However, it can take a significant amount of time and annotation effort to get enough real training data to achieve the type of feature coverage that Alexa customers in new languages will expect.
Machine translation could be a useful tool for quickly extending NLU systems to new languages and providing coverage of all Alexa features available in already supported languages. In this paper, we use a large data set of English utterances to bootstrap a German NLU system.
In addition, we explore ways to automatically identify “good” translations, i.e., the ones that improve NLU performance. First, we investigate filtering based on MT quality, rating translations according to the probability scores generated by the MT model. Next, we investigate filtering based on semantic accuracy. To measure this, we take the machine-translated text, automatically translate it back into the original language, and then rerun the NLU system on the result. The translation is scored according to how well the new semantic tags line up with those of the original.
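The two filters can be sketched as follows. This is an illustration under stated assumptions, not the scoring used in the paper: the `Candidate` fields, the thresholds, the exact-match slot agreement measure, and the `toy_back_translate` and `toy_nlu` stand-ins are all hypothetical. In a real setup, the stand-ins would be replaced by calls to an MT system and to the source-language NLU model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Candidate:
    source_intent: str             # intent annotated on the original utterance
    source_slots: Dict[str, str]   # slot name -> value on the original utterance
    translation: str               # machine-translated utterance
    mt_score: float                # assumed: a log-probability from the MT model


def slot_agreement(original: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Fraction of original slots recovered with the same value (assumed metric)."""
    if not original:
        return 1.0 if not predicted else 0.0
    matches = sum(1 for name, value in original.items() if predicted.get(name) == value)
    return matches / len(original)


def keep(
    candidate: Candidate,
    back_translate: Callable[[str], str],
    nlu: Callable[[str], Tuple[str, Dict[str, str]]],
    min_mt_score: float,
    min_agreement: float,
) -> bool:
    # Filter 1: MT quality, based on the MT model's own score.
    if candidate.mt_score < min_mt_score:
        return False
    # Filter 2: semantic accuracy, via back-translation and re-running NLU.
    intent, slots = nlu(back_translate(candidate.translation))
    if intent != candidate.source_intent:
        return False
    return slot_agreement(candidate.source_slots, slots) >= min_agreement


# Hypothetical stand-ins for an MT system and a source-language NLU model.
def toy_back_translate(text: str) -> str:
    table = {
        "spiele musik von queen": "play music by queen",
        "spiele musik bei königin": "play music at queen",
    }
    return table.get(text, "")


def toy_nlu(text: str) -> Tuple[str, Dict[str, str]]:
    prefix = "play music by "
    if text.startswith(prefix):
        return "PlayMusic", {"ArtistName": text[len(prefix):]}
    return "Unknown", {}


good = Candidate("PlayMusic", {"ArtistName": "queen"}, "spiele musik von queen", -0.2)
garbled = Candidate("PlayMusic", {"ArtistName": "queen"}, "spiele musik bei königin", -0.3)
unlikely = Candidate("PlayMusic", {"ArtistName": "queen"}, "spiele musik von queen", -3.0)

results = [
    keep(c, toy_back_translate, toy_nlu, min_mt_score=-1.0, min_agreement=1.0)
    for c in (good, garbled, unlikely)
]
```

In this sketch the two filters are chained, so a translation must pass both. They could equally be applied separately or combined into a single ranking score.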
Lastly, we apply some language-specific post-processing to the translation output. Specifically, we use target catalogues to resample the translated data. For instance, we automatically substitute the names of German cities for those of American cities mentioned in the original utterances, to better simulate data from German users. In addition, we choose to leave certain types of words, such as song and artist names, untranslated. For example, if the original utterance was “Play music by Queen,” the system would not translate the artist name “Queen” to the German word “Königin”.
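Both post-processing steps can be illustrated with a short sketch. The function names, the `__SLOT0__` placeholder format, and the assumption that the MT system passes placeholders through unchanged are ours, chosen for illustration. The paper's actual mechanism may differ.

```python
import random
from typing import Dict, List, Set, Tuple


def mask_slots(
    text: str, slots: Dict[str, str], keep_untranslated: Set[str]
) -> Tuple[str, Dict[str, str]]:
    """Replace values of selected slot types with placeholders before MT."""
    masked = text
    mapping = {}
    for i, (name, value) in enumerate(slots.items()):
        if name in keep_untranslated and value in masked:
            token = f"__SLOT{i}__"
            masked = masked.replace(value, token, 1)
            mapping[token] = value
    return masked, mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Restore the original slot values after MT (assumes tokens survive MT)."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


def resample_slot(
    text: str, slot_value: str, catalog: List[str], rng: random.Random
) -> Tuple[str, str]:
    """Swap a slot value for an entry drawn from a target-language catalogue."""
    new_value = rng.choice(catalog)
    return text.replace(slot_value, new_value, 1), new_value


# Keep the artist name out of the translation step.
masked, mapping = mask_slots(
    "Play music by Queen", {"ArtistName": "Queen"}, {"ArtistName"}
)
# Pretend an MT system returned this for the masked input.
restored = unmask("Spiele Musik von __SLOT0__", mapping)

# Replace an American city with one from a (hypothetical) German catalogue.
rng = random.Random(0)
localized, city = resample_slot(
    "Wie ist das Wetter in Seattle", "Seattle", ["Berlin"], rng
)
```

With a larger catalogue, `resample_slot` would draw a different city on each call, which is what lets the translated data approximate the variety of entities German users are likely to mention.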
In experiments we report in the paper, systems trained on MT data performed much better than those trained on grammar-generated data, and they even outperformed a system trained on 10,000 hand-annotated German utterances. The applied filtering and post-processing techniques improved results still further.
Overall, the work shows that MT can shorten the long initial phase of grammar generation and in-house data gathering for a new language. In addition, MT makes it possible to offer customers more features more rapidly, since data for existing features in all supported languages can be translated immediately for new languages.