-
Interspeech 20172017A challenge for speech recognition for voice-controlled household devices, like the Amazon Echo or Google Home, is robustness against interfering background speech. Formulated as a far-field speech recognition problem, another person or media device in proximity can produce background speech that can interfere with the device-directed speech. We expand on our previous work on device-directed speech detection
-
NeurIPS 20172017Conversational agents are exploding in popularity. However, much work remains in the area of non goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a 2.5-million dollar university competition where sixteen selected university teams built conversational agents to deliver the
-
NeurIPS 20172017Modern virtual personal assistants provide a convenient interface for completing daily tasks via voice commands. An important consideration for these assistants is the ability to recover from automatic speech recognition (ASR) and natural language understanding (NLU) errors. In this paper, we focus on learning robust dialog policies to recover from these errors. To this end, we develop a user simulator
-
NeurIPS 20172017This paper presents the design of the machine learning architecture that underlies the Alexa Skills Kit (ASK) a large scale Spoken Language Understanding (SLU) Software Development Kit (SDK) that enables developers to extend the capabilities of Amazon’s virtual assistant, Alexa. At Amazon, the infrastructure powers over 25,000 skills deployed through the ASK, as well as AWS’s Amazon Lex SLU Service. The
-
Interspeech 20172017In this paper we investigate a time delay neural network (TDNN) for a keyword spotting task that requires low CPU, memory and latency. The TDNN is trained with transfer learning and multi-task learning. Temporal subsampling enabled by the time delay architecture reduces computational complexity. We propose to apply singular value decomposition (SVD) to further reduce TDNN complexity. This allows us to first
Related content
-
January 30, 2020Improvements come from new transfer learning method, new publicly released data set.
-
January 23, 2020New "Mad Libs" technique for replacing words in individual sentences is grounded in metric differential privacy.
-
January 21, 2020Self-learning system uses customers’ rephrased requests as implicit error signals.
-
January 16, 2020According to listener tests, whispers produced by a new machine learning model sound as natural as vocoded human whispers.
-
December 11, 2019Related data selection techniques yield benefits for both speech recognition and natural-language understanding.
-
November 06, 2019Today is the fifth anniversary of the launch of the Amazon Echo, so in a talk I gave yesterday at the Web Summit in Lisbon, I looked at how far Alexa has come and where we’re heading next.