Conversational AI

Building software and systems that help people communicate with computers naturally, as if communicating with family and friends.

A Comparison of Pooling Methods on LSTM Models for Rare Acoustic Event Classification

Chieh-Chi Kao, Ming Sun, Weiran Wang, Chao Wang

ICASSP 2020

2020

Acoustic event classification (AEC) and acoustic event detection (AED) refer to the task of detecting whether specific target events occur in audio. As long short-term memory (LSTM) leads to state-of-the-art results in various speech-related tasks, it is employed as a popular solution for AEC as well. This paper focuses on investigating the dynamics of LSTM models on AEC tasks. It includes a detailed analysis

Conversational AI
Fast Intent Classification for Spoken-Language-Understanding Systems

Akshit Tyagi, Varun Sharma, Rahul Gupta, Lynn Samson, Nan Zhuang, Zihang Wang, William M. Campbell

ICASSP 2020

2020

Spoken Language Understanding (SLU) systems consist of several machine learning components operating together (e.g., intent classification, named entity recognition and resolution). Deep learning models have obtained state-of-the-art results on several of these tasks, largely attributed to their better modeling capacity. However, an increase in modeling capacity comes with added costs of higher latency and

Conversational AI
Design Considerations for Hypothesis Rejection Modules in Spoken-Language-Understanding Systems

Aman Alok, Rahul Gupta, Shankar Ananthakrishnan

ICASSP 2020

2020

Spoken Language Understanding (SLU) systems typically consist of a set of machine learning models that operate in conjunction to produce an SLU hypothesis. The generated hypothesis is then sent to downstream components for further action. However, it is desirable to discard an incorrect hypothesis before sending it downstream. In this work, we present two designs for SLU hypothesis rejection modules: (i

Conversational AI
Raw Waveform-Based End-to-End Deep Convolutional Network for Spatial Localization of Multiple Acoustic Sources

Harshavardhan Sundar, Weiran Wang, Ming Sun, Chao Wang

ICASSP 2020

2020

In this paper, we present an end-to-end deep convolutional neural network operating on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space. Previously reported deep-learning-based approaches work well in localizing a single source directly from multi-channel raw audio but are not easily extendable to localize multiple sources due to the well-known permutation

Related: Locating multiple sound sources from raw audio

Conversational AI
Few-Shot Acoustic Event Detection via Meta Learning

Bowen Shi, Ming Sun, Krishna C. Puvvada, Chieh-Chi Kao, Spyros Matsoukas, Chao Wang

ICASSP 2020

2020

We study few-shot acoustic event detection (AED) in this paper. Few-shot learning enables detection of new events with very limited labeled data. Compared to other research areas like computer vision, few-shot learning for audio recognition has been understudied. We formulate the few-shot AED problem and explore different ways of utilizing traditional supervised methods for this setting as well as a variety

Conversational AI

Alexa & Friends features Nikko Ström, Alexa AI vice president and distinguished scientist

Staff writer

May 12, 2021

Ström discusses his career journey in conversational AI, his published research, and where he sees the field of conversational AI headed next.

Conversational AI
3 questions with Philip Resnik: Analyzing social media to understand the risks of suicide

Staff writer

May 11, 2021

Resnik is a featured speaker at the first virtual Amazon Web Services Machine Learning Summit on June 2.

Machine learning
How Endel’s AI-powered Focus soundscapes earned the backing of neuroscience

Dan Cole

May 4, 2021

A new study has found that when compared to curated playlists and silence, personalized AI soundscapes generated by Alexa Fund company Endel are more effective in helping people focus.

Conversational AI
Improving the accuracy of privacy-preserving neural networks

Satyapriya Krishna

April 14, 2021

ADePT model transforms the texts used to train natural-language-understanding models while preserving semantic coherence.

Conversational AI
Alexa & Friends features Spyros Matsoukas, senior principal applied scientist, Alexa AI

Staff writer

April 9, 2021

Matsoukas discusses his focus on automatic speech recognition, natural-language understanding, and dialogue management, as well as how those research domains are making Alexa more intelligent and useful.

Conversational AI
Automatically generating text from structured data

Isabel Groves

April 7, 2021

Technique that lets devices convey information in natural language improves on state of the art.
