- ICASSP 2022: We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion aware". We generate targets for sentiment classification using a text-to-sentiment model trained on publicly available data. Finally, we fine-tune the acoustic …
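To make the multi-task idea concrete, here is a minimal sketch of how a shared acoustic encoder could feed both an ASR (CTC) head and a sentiment head, with sentiment targets produced offline by a text-to-sentiment model. All names (`MultiTaskSER`, `joint_loss`), the architecture, and the loss weighting are hypothetical; the abstract does not specify them.

```python
# Hypothetical sketch of the multi-task objective: a shared acoustic encoder
# feeds an ASR (CTC) head and an utterance-level sentiment head. The sentiment
# targets are assumed to come from a text-to-sentiment model run offline.
import torch
import torch.nn as nn

class MultiTaskSER(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab=32, n_sentiments=3):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=2, batch_first=True)
        self.asr_head = nn.Linear(hidden, vocab)          # per-frame CTC logits
        self.sent_head = nn.Linear(hidden, n_sentiments)  # utterance sentiment

    def forward(self, feats):                  # feats: (B, T, n_mels)
        h, _ = self.encoder(feats)             # (B, T, hidden)
        asr_logits = self.asr_head(h)          # (B, T, vocab)
        sent_logits = self.sent_head(h.mean(dim=1))  # pool over time
        return asr_logits, sent_logits

def joint_loss(asr_logits, sent_logits, tokens, token_lens, feat_lens,
               sent_targets, alpha=0.3):
    # CTC expects (T, B, vocab) log-probabilities.
    log_probs = asr_logits.log_softmax(-1).transpose(0, 1)
    ctc = nn.functional.ctc_loss(log_probs, tokens, feat_lens, token_lens)
    ce = nn.functional.cross_entropy(sent_logits, sent_targets)
    return ctc + alpha * ce  # alpha (assumed) weights the auxiliary task
```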
- ICASSP 2022: Accurate and timely recognition of the trigger keyword is vital for a good customer experience on smart devices. In the traditional keyword spotting task, there is typically a trade-off between accuracy and latency, where higher accuracy can be achieved by waiting for more context. In this paper, we propose a deep learning model that separates the keyword spotting task into three phases in order …
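The accuracy/latency trade-off the abstract describes can be illustrated with a simple streaming detector. This is not the paper's three-phase model (which is truncated above); `streaming_kws` and its parameters are hypothetical. Smoothing per-frame keyword posteriors over a longer window raises confidence at the cost of waiting longer to trigger.

```python
# Illustrative only, not the paper's method: smooth per-frame keyword
# posteriors with a moving average and fire when the score crosses a
# threshold. A larger `window` means more context (higher accuracy) but
# a later trigger (higher latency).
from collections import deque

def streaming_kws(posteriors, window=30, threshold=0.8):
    """posteriors: iterable of per-frame keyword probabilities in [0, 1]."""
    buf = deque(maxlen=window)
    for t, p in enumerate(posteriors):
        buf.append(p)
        score = sum(buf) / len(buf)       # moving-average smoothing
        if len(buf) == window and score > threshold:
            return t                      # frame index at which we trigger
    return None                           # keyword not detected
```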
- ICASSP 2022: In this paper we integrate classic adaptive filtering algorithms with modern deep learning to propose a new approach called deep adaptive AEC. The main idea is to represent the linear adaptive algorithm as a differentiable layer within a deep neural network (DNN) framework. This enables gradients to flow through the adaptive layer during backpropagation, and the inner layers of the DNN are trained to …
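A minimal sketch of the central idea, assuming NLMS as the classic adaptive algorithm (the abstract does not name one): the filter update is written in autograd-friendly tensor ops, so gradients can propagate through the adaptive layer to upstream DNN layers. The function name, shapes, and hyperparameters are illustrative.

```python
# Sketch: unroll a linear adaptive filter (here NLMS, one assumed choice
# among classic algorithms) as differentiable tensor ops, so gradients
# flow through it during backpropagation.
import torch

def nlms_layer(x, d, taps=64, mu=0.5, eps=1e-6):
    """x: far-end reference signal (T,); d: microphone signal (T,).
    Returns the error signal e = d - y, built from autograd-friendly ops."""
    w = torch.zeros(taps, dtype=x.dtype)
    errors = []
    for n in range(taps, x.shape[0]):
        u = x[n - taps:n].flip(0)        # most recent `taps` reference samples
        y = torch.dot(w, u)              # linear echo estimate
        e = d[n] - y                     # residual after echo cancellation
        w = w + mu * e * u / (torch.dot(u, u) + eps)  # differentiable NLMS update
        errors.append(e)
    return torch.stack(errors)
```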
- ICASSP 2022: Non-parallel voice conversion (VC) is typically achieved using lossy representations of the source speech. However, ensuring that only speaker identity information is dropped while all other information from the source speech is retained is a major challenge. This is particularly difficult in the scenario where, at inference time, we have no knowledge of the text being read, i.e., text-free VC. To mitigate …
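For orientation, here is a generic text-free VC inference path, not the paper's mitigation (which is truncated above): a bottlenecked content encoder yields a lossy, ideally speaker-free representation, and a decoder re-synthesizes it conditioned on a target speaker embedding. All module names and dimensions are assumptions.

```python
# Generic text-free VC sketch (illustrative, not the paper's model): the
# bottleneck forces a lossy content representation; the decoder conditions
# on a target speaker embedding to re-synthesize mel features.
import torch
import torch.nn as nn

class TextFreeVC(nn.Module):
    def __init__(self, n_mels=80, content_dim=64, spk_dim=128):
        super().__init__()
        self.content_enc = nn.Sequential(nn.Linear(n_mels, 256), nn.ReLU(),
                                         nn.Linear(256, content_dim))
        self.decoder = nn.Sequential(nn.Linear(content_dim + spk_dim, 256),
                                     nn.ReLU(), nn.Linear(256, n_mels))

    def convert(self, src_mels, tgt_spk_emb):
        # src_mels: (T, n_mels); tgt_spk_emb: (spk_dim,)
        content = self.content_enc(src_mels)            # speaker info ideally dropped
        spk = tgt_spk_emb.expand(content.shape[0], -1)  # broadcast over frames
        return self.decoder(torch.cat([content, spk], -1))  # converted mels
```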
- ICASSP 2022: While end-to-end models have shown great success on the Automatic Speech Recognition task, performance degrades severely when target sentences are long-form. Previously proposed methods, (partial) overlapping inference, have been shown to be effective for long-form decoding. For both methods, word error rate (WER) decreases monotonically as the overlapping percentage decreases. Setting aside computational cost, …
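A sketch of overlapping inference in the generic sense (the paper's exact variant is not shown in the truncated abstract): decode fixed-length windows that overlap, then stitch hypotheses by assigning each word to the window whose interior covers its timestamp. `decode` is a hypothetical function returning (word, start_time) pairs.

```python
# Generic overlapping-inference sketch for long-form ASR: decode windows of
# `win` seconds overlapping by `overlap` seconds, then keep each word only
# from the window where it falls outside the half of the overlap covered by
# the neighboring window. `decode` is a hypothetical word/timestamp decoder.

def overlapping_inference(audio, decode, win=30.0, overlap=6.0):
    step = win - overlap
    hyps, t = [], 0.0
    total = len(audio) / 16000.0          # assumes 16 kHz samples
    while t < total:
        words = decode(audio, start=t, end=min(t + win, total))
        lo = t + overlap / 2 if t > 0 else 0.0
        hi = t + win - overlap / 2 if t + win < total else total
        hyps.extend(w for w, ts in words if lo <= ts < hi)
        t += step
    return " ".join(hyps)
```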
Related content
- August 15, 2022: Data augmentation and post-editing strategies lift Amazon's submission above competitors.
- August 02, 2022: With an encoder-decoder architecture, rather than a decoder-only one, the Alexa Teacher Model outperforms other large language models on few-shot tasks such as summarization and machine translation.
- August 01, 2022: McKeown awarded IEEE Innovation in Societal Infrastructure Award and named a member of the American Philosophical Society.
- July 28, 2022: Donato Crisostomi talks about how his mother helped spark a love of knowledge that led him to two science internships at Amazon.
- July 22, 2022: New EMNLP workshop will feature talks, papers, posters, and a competition built around the 50-plus-language, million-utterance MASSIVE dataset.
- July 15, 2022: New method optimizes the twin demands of retrieving relevant content and filtering out bad content.