- ICASSP 2020: Spoken Language Understanding (SLU) systems consist of several machine learning components operating together (e.g., intent classification, named entity recognition and resolution). Deep learning models have obtained state-of-the-art results on several of these tasks, largely attributed to their greater modeling capacity. However, an increase in modeling capacity comes with the added cost of higher latency and …
- ICASSP 2020: Spoken Language Understanding (SLU) systems typically consist of a set of machine learning models that operate in conjunction to produce an SLU hypothesis. The generated hypothesis is then sent to downstream components for further action. However, it is desirable to discard an incorrect hypothesis before sending it downstream. In this work, we present two designs for SLU hypothesis rejection modules: (i) …
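The abstract is truncated before the two designs are named, but one plausible way to frame hypothesis rejection is as a binary classifier over hypothesis-level confidence scores. Below is a minimal sketch under that assumption; the features (ASR confidence, intent score, slot-resolution score) and the training data are hypothetical, not the paper's actual design.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data. Each row describes one SLU hypothesis via
# hypothetical confidence features: [ASR confidence, intent score,
# slot-resolution score]. Label 1 means "reject this hypothesis".
X_train = np.array([
    [0.95, 0.90, 0.88],
    [0.40, 0.35, 0.20],
    [0.85, 0.92, 0.75],
    [0.30, 0.25, 0.10],
])
y_train = np.array([0, 1, 0, 1])

rejector = LogisticRegression().fit(X_train, y_train)

# Score a new hypothesis; discard it before it reaches downstream
# components if the rejection probability is high.
p_reject = rejector.predict_proba(np.array([[0.45, 0.50, 0.30]]))[0, 1]
if p_reject > 0.5:
    print("rejecting hypothesis (p = %.2f)" % p_reject)
```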
- ICASSP 2020: In this paper, we present an end-to-end deep convolutional neural network operating on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space. Previously reported deep-learning-based approaches work well for localizing a single source directly from multi-channel raw audio but are not easily extendable to multiple sources, due to the well-known permutation problem …
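A common way to sidestep per-source output ordering (and hence the permutation problem) is to predict a fixed grid of directions and mark each as active or inactive, turning localization into multi-label classification. The sketch below follows that general idea in PyTorch; the layer sizes and the 4-microphone, 36-direction setup are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultiSourceLocalizer(nn.Module):
    """Conv1d stack over raw multi-channel audio that predicts
    per-direction source activity, so outputs need no ordering."""
    def __init__(self, n_mics=4, n_directions=36):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mics, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool over time
        )
        self.head = nn.Linear(128, n_directions)

    def forward(self, x):                    # x: (batch, n_mics, samples)
        h = self.encoder(x).squeeze(-1)      # (batch, 128)
        return torch.sigmoid(self.head(h))   # independent per-direction probs

model = MultiSourceLocalizer()
probs = model(torch.randn(2, 4, 16000))   # two 1-second 4-mic clips at 16 kHz
active = probs > 0.5                      # directions with a predicted source
```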
- ICASSP 2020: We study few-shot acoustic event detection (AED) in this paper. Few-shot learning enables the detection of new event types from very limited labeled data. Compared to other research areas such as computer vision, few-shot learning for audio recognition has been understudied. We formulate the few-shot AED problem and explore different ways of applying traditional supervised methods to this setting, as well as a variety …
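One standard few-shot baseline that transfers naturally to AED is the prototypical network (Snell et al., 2017): embed the few labeled support clips per event class, average them into class prototypes, and label each query clip by its nearest prototype. A minimal sketch, assuming embeddings have already been produced by some audio encoder:

```python
import torch

def prototypical_classify(support, support_labels, query, n_classes):
    """Nearest-prototype classification over precomputed audio embeddings.

    support:        (n_support, dim) embeddings of labeled clips
    support_labels: (n_support,) int class ids in [0, n_classes)
    query:          (n_query, dim) embeddings to classify
    """
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                        # (n_classes, dim)
    dists = torch.cdist(query, prototypes)    # Euclidean distances
    return dists.argmin(dim=1)                # predicted class per query

# 5-way, 2-shot toy episode with random 64-d "embeddings"
support = torch.randn(10, 64)
labels = torch.arange(5).repeat_interleave(2)
preds = prototypical_classify(support, labels, torch.randn(3, 64), n_classes=5)
```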
- ICASSP 2020: We propose an approach for pre-training speech representations via a masked reconstruction loss. Our pre-trained encoder networks are bidirectional and can therefore be used directly in typical bidirectional speech recognition models. The pre-trained networks can then be fine-tuned on a smaller amount of supervised data for speech recognition. Experiments with this approach on the LibriSpeech and Wall Street Journal corpora …
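The core of the approach, as described, is to mask parts of the input, encode it with a bidirectional network, and train the network to reconstruct the masked content. Here is a minimal sketch with a BiLSTM encoder over log-mel-style frames; the masking rate, L1 loss, and encoder choice are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

class BiEncoder(nn.Module):
    """Bidirectional encoder; reusable as-is in a bidirectional ASR model."""
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.proj = nn.Linear(2 * hidden, feat_dim)

    def forward(self, x):           # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.proj(out)       # reconstruct input features

def masked_reconstruction_loss(x, encoder, mask_prob=0.15):
    """Zero out random frames and penalize reconstruction error only there."""
    mask = torch.rand(x.shape[:2]) < mask_prob   # (batch, time)
    x_in = x.clone()
    x_in[mask] = 0.0
    recon = encoder(x_in)
    return nn.functional.l1_loss(recon[mask], x[mask])

encoder = BiEncoder()
loss = masked_reconstruction_loss(torch.randn(8, 100, 80), encoder)
loss.backward()   # pre-training step; fine-tune on labeled speech afterwards
```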
Related content
- January 15, 2019: Neural networks have been responsible for most of the top-performing AI systems of the past decade, but they tend to be big, which means they tend to be slow. That's a problem for systems like Alexa, which depend on neural networks to process spoken requests in real time. [Projection image adapted from Michael Horvath under the CC BY-SA 4.0 license]
- December 21, 2018: In May 2018, Amazon launched Alexa's Remember This feature, which enables customers to store "memories" ("Alexa, remember that I took Ben's watch to the repair store") and recall them later by asking open-ended questions ("Alexa, where is Ben's watch?").
- December 18, 2018: At a recent press event on Alexa's latest features, Alexa's head scientist, Rohit Prasad, mentioned multistep requests in one shot, a capability that allows you to ask Alexa to do multiple things at once. For example, you might say, "Alexa, add bananas, peanut butter, and paper towels to my shopping list." Alexa should intelligently figure out that "peanut butter" and "paper towels" name two items, not four, and that bananas are a separate item (see the sketch below).
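One common way to implement this kind of segmentation is sequence labeling with BIO tags, where a tagger marks each token as beginning, inside, or outside an item span; grouping the spans is then trivial. A hypothetical illustration (the tags below are hand-written stand-ins for a tagger's output, not Alexa's actual method):

```python
# Hypothetical BIO labels for the shopping-list utterance
tokens = ["add", "bananas", "peanut", "butter", "and", "paper", "towels"]
tags   = ["O",   "B-Item",  "B-Item", "I-Item", "O",   "B-Item", "I-Item"]

def extract_items(tokens, tags):
    """Group B-/I- tagged tokens into item spans."""
    items, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new item starts here
            if current:
                items.append(" ".join(current))
            current = [tok]
        elif tag.startswith("I-") and current:
            current.append(tok)           # continue the current item
        else:                             # outside any item span
            if current:
                items.append(" ".join(current))
            current = []
    if current:
        items.append(" ".join(current))
    return items

print(extract_items(tokens, tags))  # ['bananas', 'peanut butter', 'paper towels']
```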
- December 17, 2018: In recent years, data representation has emerged as an important research topic within machine learning.
- December 13, 2018: Language models are a key component of automatic speech recognition systems, which convert speech into text. A language model captures the statistical likelihood of any particular string of words, so it can help decide between different interpretations of the same sequence of sounds.
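As a toy illustration of "statistical likelihood of a string of words": an add-one-smoothed bigram model can score competing transcriptions, and the recognizer prefers the higher-scoring one. The corpus and phrases here are invented for the example.

```python
import math
from collections import Counter

# Tiny toy corpus; a production language model is trained on far more text.
corpus = ("please play the soundtrack . please play the movie . "
          "please play the soundtrack .").split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)   # vocabulary size, for add-one smoothing

def log_prob(words):
    """Add-one-smoothed bigram log-probability of a word sequence."""
    return sum(
        math.log((bigrams[(w1, w2)] + 1) / (unigrams[w1] + V))
        for w1, w2 in zip(words, words[1:]))

# Two competing interpretations; the model prefers the more frequent phrase.
print(log_prob("play the soundtrack".split()))   # higher score
print(log_prob("play the movie".split()))        # lower score
```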
- December 11, 2018: Suppose that you say to Alexa, "Alexa, play Mary Poppins." Alexa must decide whether you mean the book, the video, or the soundtrack. How should she do it?