Conversational AI

Building software and systems that help people communicate with computers naturally, as if communicating with family and friends.

Lightweight end-to-end text-to-speech synthesis for low resource on-device applications

Biel Tura Vecino, Adam Gabrys, Daniel Matwicki, Andrzej Pomirski, Tom Iddon, Marius Cotescu, Jaime Lorenzo Trueba

12th ISCA Speech Synthesis Workshop (SSW12)

2023

Recent works have shown that modelling raw waveform directly from text in an end-to-end (E2E) fashion produces more natural-sounding speech than traditional neural text-to-speech (TTS) systems based on a cascade or two-stage approach. However, current E2E state-of-the-art models are computationally complex and memory-consuming, making them unsuitable for real-time offline on-device applications in low-resource

Conversational AI
Controllable emphasis with zero data for text-to-speech

Arnaud Joly, Marco Nicolis, Ekaterina Petrova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova

12th ISCA Speech Synthesis Workshop (SSW12)

2023

We present a scalable method to produce high quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consists in increasing the predicted duration of the emphasised word. We show that this is significantly better than spectrogram modification techniques improving

Conversational AI
Text is all you need: Learning language representations for sequential recommendation

Jiahceng Li, Ming Wang, Jin Li, Jinmiao Fu, Xin Shen, Jingbo Shang, Julian McAuley

KDD 2023

2023

Sequential recommendation aims to model dynamic user behavior from historical interactions. Existing methods rely on either explicit item IDs or general textual features for sequence modeling to understand user preferences. While promising, these approaches still struggle to model cold-start items or transfer knowledge to new datasets. In this paper, we propose to model user preferences and item features

Conversational AI
PersonaPKT: Building personalized dialogue agents via parameter-efficient knowledge transfer

Xu Han, Bin Guo, Yoon Jung, Benjamin Yao, Yu Zhang, Derek Liu, Chenlei (Edward) Guo

ACL 2023 Workshop on SustaiNLP

2023

Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency. However, such descriptions may not always be available or may pose privacy concerns. To tackle this bottleneck, we introduce PersonaPKT, a lightweight transfer learning approach that can build persona-consistent dialogue models without explicit

Conversational AI
Reducing cohort bias in natural language understanding systems with targeted self-training scheme

Thu Le, Gabriela Cortes Hernandez, Bei Chen, Melanie Bradford

ACL 2023

2023

Bias in machine learning models can be an issue when the models are trained on particular types of data that do not generalize well, causing under performance in certain groups of users. In this work, we focus on reducing the bias related to new customers in a digital voice assistant system. It is observed that natural language understanding models often have lower performance when dealing with requests

Conversational AI

Amazon Unveils Novel Alexa Dialog Modeling for Natural, Cross-Skill Conversations

Alexa Science Team

June 05, 2019

Today, customer exchanges with Alexa are generally either one-shot requests, like “Alexa, what’s the weather?”, or interactions that require multiple requests to complete more complex tasks.

Conversational AI
Using adversarial training to recognize speakers’ emotions

Viktor Rozgic

May 21, 2019

A person’s tone of voice can tell you a lot about how they’re feeling. Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic.

Conversational AI
Should Alexa read “2/3” as “two-thirds” or “February Third”?: The science of text normalization

Ming Sun

May 16, 2019

Text normalization is an important process in conversational AI. If an Alexa customer says, “book me a table at 5:00 p.m.”, the automatic speech recognizer will transcribe the time as “five p m”. Before a skill can handle this request, “five p m” will need to be converted to “5:00PM”. Once Alexa has processed the request, it needs to synthesize the response — say, “Is 6:30 p.m. okay?” Here, 6:30PM will be converted to “six thirty p m” for the text-to-speech synthesizer. We call the process of converting “5:00PM” to “five p m” text normalization and its counterpart — converting “five p m” to “5:00PM” — inverse text normalization.

Conversational AI
Training a Machine Learning Model in English Improves Its Performance in Japanese

Judith Gaspers

May 13, 2019

Recently, we published a paper showing that training a neural network to do language processing in English, then retraining it in German, drastically reduces the amount of German-language training data required to achieve a given level of performance.

Conversational AI
How we add new skills to Alexa’s name-free skill selector

Young-Bum Kim

May 03, 2019

Using cosine similarity rather than dot product to compare vectors helps prevent "catastrophic forgetting".

Conversational AI
“Alexa, Turn Down the Lights and Play Music”: The Science of Handling Compound Requests

Rahul Goel

May 02, 2019

Traditionally, Alexa has interpreted customer requests according to their intents and slots. If you say, “Alexa, play ‘What’s Going On?’ by Marvin Gaye,” the intent should be PlayMusic, and “‘What’s Going On?’” and “Marvin Gaye” should fill the slots SongName and ArtistName.

Conversational AI

Conversational AI

Publications

Related content

Work with us