- AMTA 2020: We present Sockeye 2, a modernized and streamlined version of the Sockeye neural machine translation (NMT) toolkit. New features include a simplified code base through the use of MXNet’s Gluon API, a focus on state-of-the-art model architectures, distributed mixed-precision training, and efficient CPU decoding with 8-bit quantization. These improvements result in faster training and inference, higher automatic… (a toy sketch of int8 quantization appears after this list).
- Interspeech 2020: Recent advances in Text-to-Speech (TTS) have improved quality and naturalness to near-human levels. What is still lacking for human-like communication, however, is the dynamic variation and adaptability of human speech in more complex scenarios. This work attempts to achieve more dynamic and natural intonation in TTS systems, particularly for stylistic…
- Interspeech 2020: A modern Spoken Language Understanding (SLU) system usually contains two sub-systems, Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU), where ASR transcribes the voice signal to text and NLU performs intent classification and slot filling on that text. In practice, such a decoupled ASR/NLU design facilitates fast model iteration for both components. However, this makes downstream NLU… (the pipeline sketch after this list illustrates the decoupled interface).
- Interspeech 2020: Prosody Transfer (PT) is a technique that aims to use the prosody from a source audio as a reference while synthesizing speech. Fine-grained PT aims at capturing prosodic aspects like rhythm, emphasis, melody, duration, and loudness from a source audio at a very granular level and transferring them when synthesizing speech in a different target speaker’s voice. Current approaches for fine-grained PT suffer…
- Interspeech 2020: The speech recognition training data corresponding to digital voice assistants is dominated by wake-words. Training end-to-end (E2E) speech recognition models without careful attention to such data results in sub-optimal performance, as models prioritize learning wake-words. To address this problem, we propose a novel discriminative initialization strategy by introducing a regularization term to penalize… (a generic form of such a penalty is sketched after this list).
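On the Sockeye 2 item above: 8-bit quantized decoding stores weights as int8 alongside a floating-point scale and reconstructs (or computes directly with) them at inference time. Below is a minimal NumPy sketch of symmetric per-tensor int8 quantization; it illustrates the general idea only and is not Sockeye’s actual implementation.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)  # toy weight matrix
q, s = quantize_int8(w)
print("max abs reconstruction error:", np.abs(w - dequantize(q, s)).max())
```

Storing `q` instead of `w` cuts weight memory by roughly 4x, which, together with int8 matrix multiplication, is what drives the CPU decoding speedups such systems report.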
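On the SLU item above: the decoupled ASR/NLU design can be pictured as two independently trained stages connected only by text. The sketch below uses hypothetical `asr()` and `nlu()` stubs (names and outputs invented here) to show the interface; note that NLU only ever sees ASR’s possibly erroneous transcript, which is the coupling problem the abstract alludes to.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class NLUResult:
    intent: str
    slots: List[Tuple[str, str]]  # (slot_name, value) pairs

def asr(audio: bytes) -> str:
    """Hypothetical ASR stage: raw audio in, text transcript out."""
    return "play jazz in the kitchen"  # stubbed transcript

def nlu(text: str) -> NLUResult:
    """Hypothetical NLU stage: text in, intent and slots out."""
    return NLUResult(intent="PlayMusic",
                     slots=[("genre", "jazz"), ("room", "kitchen")])

# Decoupled pipeline: each stage can be retrained independently,
# but NLU inherits any recognition errors ASR makes.
result = nlu(asr(b"...raw audio bytes..."))
print(result.intent, result.slots)
```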
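On the wake-word item above: one generic way to keep a model from over-allocating capacity to wake-words is to add a penalty term to the training loss. The snippet below shows that pattern with a hypothetical penalty on the probability mass assigned to wake-word tokens; the paper’s actual regularizer and discriminative initialization are not specified in the excerpt, so treat this purely as an illustration of the loss-plus-lambda-times-penalty structure.

```python
import numpy as np

def cross_entropy(probs: np.ndarray, target: int) -> float:
    """Negative log-likelihood of the target token."""
    return -float(np.log(probs[target] + 1e-12))

def regularized_loss(probs: np.ndarray, target: int,
                     wake_word_ids: list, lam: float = 0.1) -> float:
    """Cross-entropy plus a penalty on the probability mass given to
    wake-word tokens (hypothetical form, for illustration only)."""
    penalty = float(probs[wake_word_ids].sum())
    return cross_entropy(probs, target) + lam * penalty

probs = np.array([0.7, 0.2, 0.1])  # toy 3-token output distribution
print(regularized_loss(probs, target=1, wake_word_ids=[0]))
```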
Related content
- May 24, 2018: Amazon scientists are continuously expanding Alexa’s natural-language-understanding (NLU) capabilities to make Alexa smarter, more useful, and more engaging.
- May 11, 2018: Smart speakers, such as the Amazon Echo family of products, are growing in popularity among consumer and business audiences. In order to improve the automatic speech recognition (ASR) and full-duplex voice communication (FDVC) performance of these smart speakers, acoustic echo cancellation (AEC) and noise reduction systems are required. These systems reduce the noises and echoes that can impact operation, such as an Echo device accurately hearing the wake word “Alexa.” (A textbook echo-cancellation sketch appears after this list.)
- May 04, 2018: In recent years, the amount of textual information produced daily has increased exponentially. This information explosion has been accelerated by the ease with which data can be shared across the web. Most of this textual information is generated as free-form text, and only a small fraction is available in a structured format (Wikidata, Freebase, etc.) that can be processed and analyzed directly by machines.
- April 25, 2018: This morning, I am delivering a keynote talk at the World Wide Web Conference in Lyon, France, with the title “Conversational AI for Interacting with the Digital and Physical World.”
- April 12, 2018: The Amazon Echo is a hands-free smart home speaker you control with your voice. The first important step in enabling a delightful customer experience with an Echo or other Alexa-enabled device is wake word detection, so accurate detection of “Alexa” or substitute wake words is critical. It is challenging to build a wake word system with low error rates when computational resources on the device are limited and background noise such as speech or music is present.
- April 10, 2018: Just as Alexa can wake up without the need to press a button, she also automatically detects when a user finishes her query and expects a response. This task is often called “end-of-utterance detection,” “end-of-query detection,” “end-of-turn detection,” or simply “end-pointing.” (A simple energy-based end-pointing sketch appears after this list.)
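On the May 11, 2018 item above: acoustic echo cancellation is classically done with an adaptive filter that estimates the echo path from the loudspeaker (far-end) signal and subtracts the estimated echo from the microphone signal. Below is a minimal normalized-LMS (NLMS) sketch, a textbook baseline rather than the system the post describes; all parameter values are illustrative.

```python
import numpy as np

def nlms_aec(far: np.ndarray, mic: np.ndarray,
             taps: int = 128, mu: float = 0.5, eps: float = 1e-8):
    """Normalized LMS echo canceller.

    far: loudspeaker (far-end) signal.
    mic: microphone signal = near-end speech + echo of `far`.
    Returns the echo-reduced error signal."""
    w = np.zeros(taps)                   # adaptive FIR estimate of echo path
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = far[n - taps:n][::-1]        # most recent far-end samples
        e = mic[n] - w @ x               # mic minus estimated echo
        w += mu * e * x / (x @ x + eps)  # NLMS weight update
        out[n] = e
    return out
```

When only echo is present (no near-end speech), `out` converges toward zero; when the user talks over playback, their speech remains in `out` for the recognizer.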
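On the April 10, 2018 item above: a crude baseline for end-pointing is to declare end-of-utterance once a fixed number of consecutive low-energy frames has been observed. The sketch below illustrates only that idea; the thresholds are invented, and production end-pointers (including Alexa’s) are model-based rather than a simple energy rule.

```python
import numpy as np

def endpoint(frames: np.ndarray,
             energy_thresh: float = 1e-3,
             trailing_silence: int = 30):
    """Return the frame index at which end-of-utterance is declared,
    i.e. after `trailing_silence` consecutive low-energy frames,
    or None if no end-point is found.

    frames: (num_frames, frame_len) array of audio samples."""
    silent_run = 0
    for i, frame in enumerate(frames):
        if np.mean(frame ** 2) < energy_thresh:  # low-energy frame
            silent_run += 1
            if silent_run >= trailing_silence:
                return i
        else:                                    # speech resets the counter
            silent_run = 0
    return None
```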