- AAAI 2020: We propose TANDA, an effective technique for fine-tuning pre-trained Transformer models for natural language tasks. Specifically, we first transfer a pre-trained model into a model for a general task by fine-tuning it on a large, high-quality dataset. We then perform a second fine-tuning step to adapt the transferred model to the target domain. We demonstrate the benefits of our approach for answer …
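As a rough illustration of the two-step recipe, here is a minimal sketch with a toy classifier standing in for the pre-trained Transformer; the datasets, model shape, and hyperparameters are placeholders, not the paper’s setup:

```python
# Illustrative two-step fine-tuning ("transfer", then "adapt").
# All data and model details below are assumptions for the sketch.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def fine_tune(model, loader, epochs=1, lr=2e-5):
    """One fine-tuning pass; called once per stage."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(model(x), y).backward()
            opt.step()

# Stage 1 data: large, high-quality general-task corpus (toy tensors here).
general = DataLoader(TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))), batch_size=8)
# Stage 2 data: smaller target-domain set.
target = DataLoader(TensorDataset(torch.randn(32, 16), torch.randint(0, 2, (32,))), batch_size=8)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
fine_tune(model, general)  # step 1: transfer to the general task
fine_tune(model, target)   # step 2: adapt to the target domain
```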
- IEEE Signal Processing Letters, 2020: We present an approach to whisper synthesis that applies a handcrafted signal-processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech into whispered speech. We investigate using Gaussian Mixture Models (GMMs) and Deep Neural Networks (DNNs) to model the mapping between the acoustic features of normal speech and those of whispered speech. We evaluate naturalness and speaker similarity …
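A minimal sketch of the DNN variant of this mapping, assuming 13-dimensional acoustic features (e.g. MFCCs) and time-aligned parallel frames; the architecture, feature dimension, and data are illustrative only:

```python
# Toy frame-wise regressor from normal-speech features to whisper features.
import torch
from torch import nn

mapper = nn.Sequential(nn.Linear(13, 64), nn.Tanh(), nn.Linear(64, 13))
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)

# Stand-in parallel data; in practice these would be aligned feature
# frames extracted from paired normal/whispered recordings.
normal_frames = torch.randn(256, 13)
whisper_frames = torch.randn(256, 13)

for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(mapper(normal_frames), whisper_frames)
    loss.backward()
    opt.step()
```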
- AAAI 2020: A considerable part of the success of voice-controlled virtual assistants (VVAs) is due to the emotional and personalized experience they deliver, with humor being a key component in providing an engaging interaction. In this paper we describe methods used to improve the joke skill of a VVA through personalization. The first method, based on traditional NLP techniques, is robust and scalable.
- AAAI 2020: Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the teacher’s knowledge by using its output probabilities as soft labels to optimize the student. However, when the teacher is considerably large, there is no guarantee that its internal knowledge will be transferred into …
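The soft-label objective the abstract describes is the standard distillation loss: a KL term between temperature-softened teacher and student distributions, blended with the usual hard-label cross-entropy. A sketch, with typical (not the paper’s) values for the temperature T and mixing weight alpha:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL between softened distributions; kl_div expects log-probs as input
    # and probs as target.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft-loss gradients stay comparable to the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```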
- AAAI 2020: Task-oriented dialog agents provide a natural language interface for users to complete their goals. Dialog State Tracking (DST), often a core component of these systems, tracks the system’s understanding of the user’s goal throughout the conversation. To enable accurate multi-domain DST, the model needs to encode dependencies between past utterances and slot semantics and to understand the dialog context …
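The state being tracked is essentially a map from slots to values, updated every turn. A toy, rule-based sketch for a single restaurant domain (the slot names and keyword matching are purely illustrative; real multi-domain trackers learn these updates from the dialog context):

```python
from typing import Dict

def update_state(state: Dict[str, str], utterance: str) -> Dict[str, str]:
    """Toy keyword-based slot update; stands in for a learned tracker."""
    text = utterance.lower()
    for cuisine in ("indian", "italian", "thai"):
        if cuisine in text:
            state["restaurant-food"] = cuisine
    for area in ("centre", "north", "south"):
        if area in text:
            state["restaurant-area"] = area
    return state

state: Dict[str, str] = {}
for turn in ["I want Indian food", "somewhere in the centre please"]:
    state = update_state(state, turn)
print(state)  # {'restaurant-food': 'indian', 'restaurant-area': 'centre'}
```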
Related content
- August 27, 2018: To handle more-natural spoken interactions, Alexa must track references through several rounds of conversation. If, for instance, a customer says, “How far is it to Redmond?” and after the answer follows up by saying, “Find good Indian restaurants there”, Alexa should be able to infer that “there” refers to Redmond.
- August 24, 2018: This year’s Interspeech, the largest conference in speech technology, will take place in Hyderabad, India, the first week of September. More than 40 Amazon researchers will be attending, including Björn Hoffmeister, the senior manager for machine learning in the Alexa Automatic Speech Recognition group. He took a few minutes to answer three questions about this year’s conference.
- August 23, 2018: Here’s a fairly common interaction with Alexa: “Alexa, set volume to five”; “Alexa, play music”. Even though the queries come in quick succession, the customer needs to repeat the wake word “Alexa”. To allow for more natural interactions, the device could immediately re-enter its listening state after the first query, without wake-word repetition; but that would require it to detect whether a follow-up speech input is indeed a query intended for the device (“device-directed”) or just background speech (“non-device-directed”).
- August 19, 2018: At the annual meeting of the North American Chapter of the Association for Computational Linguistics in June, researchers at Amazon and the University of Sheffield released a new dataset that can be used to train machine learning systems to determine the veracity of factual assertions online. The dataset is called FEVER, for Fact Extraction and VERification.
- August 18, 2018: “Perfect hashing” is among the techniques that reduce the memory footprints of machine learning models by 94%.
- August 08, 2018: New machine-learned multilingual named-entity transliteration system.