ICASSP 2019: Conventional models for emotion recognition from speech are trained in a supervised fashion using speech utterances with emotion labels. In this study, we hypothesize that the speech signal depends on multiple latent variables, including the emotional state, age, gender, and speech content. We propose an Adversarial Autoencoder (AAE) to perform variational inference over the latent variables and reconstruct...
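The abstract is truncated above, but the adversarial-autoencoder idea it names can be sketched. Below is a minimal PyTorch illustration of an AAE, where a discriminator pushes the encoded latents to match a Gaussian prior; the dimensions, the single prior, and the training loop are assumptions for illustration, not the paper's actual architecture.

```python
# Minimal adversarial autoencoder (AAE) sketch in PyTorch.
# Sizes and the Gaussian prior are illustrative assumptions.
import torch
import torch.nn as nn

FEAT_DIM, LATENT_DIM = 128, 16  # hypothetical feature/latent sizes

encoder = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, FEAT_DIM))
discriminator = nn.Sequential(nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(x):
    # 1) Reconstruction: encoder/decoder minimize the autoencoding error.
    z = encoder(x)
    recon_loss = nn.functional.mse_loss(decoder(z), x)

    # 2) Discriminator: tell prior samples (real) from encoded latents (fake).
    z_prior = torch.randn_like(z)
    d_loss = bce(discriminator(z_prior), torch.ones(len(x), 1)) + \
             bce(discriminator(z.detach()), torch.zeros(len(x), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 3) Adversarial step: the encoder tries to make its latents look like
    #    prior samples, matching the aggregate posterior to the prior --
    #    this is what lets the AAE perform variational inference.
    g_loss = bce(discriminator(encoder(x)), torch.ones(len(x), 1))
    opt_ae.zero_grad(); (recon_loss + g_loss).backward(); opt_ae.step()

train_step(torch.randn(32, FEAT_DIM))  # one step on a random batch
```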
Interspeech 2019: We present a neural text-to-speech system for fine-grained prosody transfer from one speaker to another. Conventional approaches to end-to-end prosody transfer typically use either a fixed-dimensional or a variable-length prosody embedding, via a secondary attention mechanism, to encode the reference signal. However, when trained on a single-speaker dataset, conventional prosody transfer systems are not robust enough...
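The fixed-dimensional prosody embedding mentioned above is typically produced by a reference encoder that summarizes a reference spectrogram into a single vector. A minimal sketch follows; the layer types and sizes are assumptions, not the system described in the paper.

```python
# Sketch of a fixed-dimensional prosody reference encoder: a conv stack over
# reference mel-spectrogram frames, with a GRU's final state as the embedding.
# Layer sizes are illustrative.
import torch
import torch.nn as nn

class ReferenceEncoder(nn.Module):
    def __init__(self, n_mels=80, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.gru = nn.GRU(256, embed_dim, batch_first=True)

    def forward(self, mels):             # mels: (batch, frames, n_mels)
        h = self.conv(mels.transpose(1, 2)).transpose(1, 2)
        _, state = self.gru(h)           # final hidden state summarizes prosody
        return state.squeeze(0)          # (batch, embed_dim)

# The embedding would then condition the TTS decoder on the reference prosody.
ref = ReferenceEncoder()
print(ref(torch.randn(2, 200, 80)).shape)  # torch.Size([2, 128])
```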
NAACL 2019: In this paper, we consider advancing web-scale knowledge extraction and alignment by integrating OpenIE extractions, in the form of (subject, predicate, object) triples, with knowledge bases (KBs). Traditional techniques from universal schema and from schema mapping fall at two extremes: either they perform instance-level inference, relying on embeddings for (subject, object) pairs, and thus cannot handle pairs absent...
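As background on the universal-schema extreme described above: it scores a triple by comparing a learned embedding of the (subject, object) pair against a learned embedding of the predicate, which is exactly why pairs unseen in training are a problem. A toy sketch, with random stand-ins for the learned embeddings:

```python
# Universal-schema-style scoring sketch: each (subject, object) pair and each
# predicate gets an embedding; compatibility is their dot product.
# Embeddings here are random stand-ins; in practice they are learned.
import numpy as np

rng = np.random.default_rng(0)
DIM = 50
pair_emb = {("Marvin Gaye", "What's Going On?"): rng.normal(size=DIM)}
rel_emb = {"performed": rng.normal(size=DIM), "wrote": rng.normal(size=DIM)}

def score(subject, obj, relation):
    # Higher score = the triple (subject, relation, object) is more plausible.
    # A pair absent from training has no entry here, so scoring raises a
    # KeyError -- the limitation the paper points out.
    return float(pair_emb[(subject, obj)] @ rel_emb[relation])

print(score("Marvin Gaye", "What's Going On?", "performed"))
```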
NAACL 2019: This work explores cross-lingual transfer learning (TL) for named entity recognition, focusing on bootstrapping Japanese from English. A deep neural network model is adopted, and the best combination of weights to transfer is extensively investigated. Moreover, a novel approach is presented that overcomes linguistic differences between this language pair by romanizing a portion of the Japanese input. Experiments...
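The weight-transfer idea can be illustrated with a small sketch: initialize selected layers of a Japanese NER model from an English-trained checkpoint. Which layers to copy is the question the paper investigates; the layer names and file path below are hypothetical.

```python
# Sketch of cross-lingual weight transfer: copy selected weights from an
# English-trained model into a Japanese one before fine-tuning.
import torch

def transfer_weights(en_state_dict, ja_model, layers_to_copy):
    ja_state = ja_model.state_dict()
    for name in layers_to_copy:
        # Copy only layers whose shapes line up across the two models.
        if ja_state[name].shape == en_state_dict[name].shape:
            ja_state[name] = en_state_dict[name].clone()
    ja_model.load_state_dict(ja_state)

# Hypothetical usage: transfer character-level layers, which romanizing the
# Japanese input makes shareable with the English model.
# transfer_weights(torch.load("en_ner.pt"), ja_model,
#                  ["char_embedding.weight", "char_lstm.weight_ih_l0"])
```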
Interspeech 2019: This paper explores the potential universality of neural vocoders. We train a WaveRNN-based vocoder on 74 speakers from 17 languages. This vocoder is shown to be capable of generating speech of consistently good quality (98% relative mean MUSHRA when compared to natural speech) regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain...
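For context on the "98% relative mean MUSHRA" figure: a relative MUSHRA score is conventionally the system's mean listener rating expressed as a percentage of the natural-speech reference's mean rating. A sketch of the arithmetic, with made-up listener scores:

```python
# Relative mean MUSHRA: vocoder mean rating as a percentage of the natural
# reference's mean rating. Listener scores below are fabricated stand-ins.
import numpy as np

natural = np.array([92, 95, 90, 97])   # hypothetical ratings, natural speech
vocoder = np.array([90, 93, 88, 95])   # hypothetical ratings, vocoded speech

relative_mushra = 100 * vocoder.mean() / natural.mean()
print(f"{relative_mushra:.0f}% relative mean MUSHRA")
```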
Related content
May 03, 2019: Using cosine similarity rather than the dot product to compare vectors helps prevent "catastrophic forgetting".
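The distinction is easy to see in code: the dot product scales with vector magnitude, while cosine similarity depends only on direction.

```python
# Cosine similarity vs. dot product: cosine normalizes away vector magnitude,
# so comparisons depend only on direction.
import numpy as np

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])          # same direction, twice the magnitude

print(a @ b)                    # 28.0 -- dot product grows with magnitude
print(cosine_similarity(a, b))  # 1.0  -- cosine sees identical directions
```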
May 02, 2019: Traditionally, Alexa has interpreted customer requests according to their intents and slots. If you say, “Alexa, play ‘What’s Going On?’ by Marvin Gaye,” the intent should be PlayMusic, and “What’s Going On?” and “Marvin Gaye” should fill the slots SongName and ArtistName.
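One plausible way to picture such an interpretation as a data structure (the format below is illustrative, not Alexa's actual internal representation):

```python
# Hypothetical intent/slot interpretation for the utterance quoted above.
interpretation = {
    "utterance": "Alexa, play 'What's Going On?' by Marvin Gaye",
    "intent": "PlayMusic",
    "slots": {
        "SongName": "What's Going On?",
        "ArtistName": "Marvin Gaye",
    },
}
```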
April 25, 2019: When a customer asks Alexa to play “Hey Jude” and Alexa responds, “Playing ‘Hey Jude’ by the Beatles,” that response is generated by a text-to-speech (TTS) system, which converts textual inputs into synthetic-speech outputs...
April 22, 2019: One of the ways that we’re always trying to improve Alexa’s performance is by teaching her to ignore speech that isn’t intended for her. At this year’s International Conference on Acoustics, Speech, and Signal Processing, my colleagues and I will present a new technique for doing this, which could complement the techniques that Alexa already uses.
April 18, 2019: Last year, Amazon announced the beta release of Alexa Guard, a new service that lets customers who are leaving the house instruct their Echo devices to listen for glass breaking or smoke and carbon monoxide alarms going off. At this year’s International Conference on Acoustics, Speech, and Signal Processing, our team is presenting several papers on sound detection. I wrote about one of them a few weeks ago: a new method for doing machine learning with unbalanced data sets.
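The teaser doesn't describe the method itself; as a generic illustration of the unbalanced-data problem it refers to (explicitly not the paper's technique), one standard remedy is inverse-frequency class weighting in the loss:

```python
# Generic illustration of one standard remedy for unbalanced data sets --
# inverse-frequency class weights in the loss -- not the method in the paper.
import torch
import torch.nn as nn

labels = torch.tensor([0] * 990 + [1] * 10)      # hypothetical 99:1 imbalance
counts = torch.bincount(labels).float()
weights = counts.sum() / (len(counts) * counts)  # rare class weighted ~99x more

loss_fn = nn.CrossEntropyLoss(weight=weights)    # penalizes rare-class errors more
```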
April 11, 2019: Multiband dynamics processing, which separately modifies volume in different frequency bands of an audio signal, is known to improve listeners’ audio experiences. But in the context of voice-controlled systems like the Amazon Echo family of products, it can also improve automatic speech recognition by making echo cancellation easier.
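A minimal sketch of the multiband idea: split the signal at a crossover frequency and apply a different gain to each band. Real multiband dynamics processors make the gains level-dependent (compression); the crossover frequency and gains below are illustrative.

```python
# Two-band volume adjustment: Butterworth low/high split, per-band gain.
import numpy as np
from scipy.signal import butter, sosfilt

def multiband_gain(audio, sr, crossover_hz=1000.0, low_gain=1.0, high_gain=0.5):
    sos_lo = butter(4, crossover_hz, btype="lowpass", fs=sr, output="sos")
    sos_hi = butter(4, crossover_hz, btype="highpass", fs=sr, output="sos")
    # Filter each band, scale its volume independently, and recombine.
    return low_gain * sosfilt(sos_lo, audio) + high_gain * sosfilt(sos_hi, audio)

sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 3000 * t)
processed = multiband_gain(audio, sr)  # high band attenuated by 6 dB
```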