- ICASSP 2020: We present BOFFIN TTS (Bayesian Optimization For FIne-tuning Neural Text To Speech), a novel approach for few-shot speaker adaptation. Here, the task is to fine-tune a pre-trained TTS model to mimic a new speaker using a small corpus of target utterances. We demonstrate that there does not exist a one-size-fits-all adaptation strategy, with convincing synthesis requiring a corpus-specific configuration …
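The Bayesian-optimization idea behind this kind of corpus-specific tuning can be sketched in miniature. Below, a cheap analytic function stands in for the expensive "fine-tune and measure adaptation loss" step, and a small Gaussian-process surrogate with expected improvement picks the next hyperparameter to try. Everything here (the objective, kernel length scale, grid, iteration counts) is illustrative, not taken from the paper:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
Phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2))))  # standard normal CDF
phi = lambda z: np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)            # standard normal PDF

# Hypothetical stand-in for the expensive objective: adaptation loss of the
# fine-tuned model as a function of one scaled hyperparameter in [0, 1].
# In the real setting, each call would mean a full fine-tuning run.
def adaptation_loss(x):
    return (x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

def rbf(a, b, ls=0.1):
    # Squared-exponential kernel with unit signal variance.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

grid = np.linspace(0.0, 1.0, 201)        # candidate configurations
X = list(rng.uniform(0, 1, 3))           # a few random initial evaluations
y = [float(adaptation_loss(x)) for x in X]

for _ in range(10):
    Xa, ya = np.array(X), np.array(y)
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))   # jitter keeps K well conditioned
    Ks = rbf(Xa, grid)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ ya                       # GP posterior mean on the grid
    var = np.maximum(1.0 - np.einsum("ij,ik,kj->j", Ks, Kinv, Ks), 1e-12)
    sd = np.sqrt(var)
    imp = ya.min() - mu                         # improvement over the best loss so far
    z = imp / sd
    ei = imp * Phi(z) + sd * phi(z)             # expected improvement acquisition
    x_next = grid[int(np.argmax(ei))]           # evaluate where EI is largest
    X.append(float(x_next))
    y.append(float(adaptation_loss(x_next)))

best = min(y)  # best configuration's loss after 3 random + 10 BO evaluations
```

The point of the sketch is the loop structure: fit a surrogate to all evaluations so far, then spend the next expensive evaluation where the acquisition function balances low predicted loss against high uncertainty.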
- The Web Conference 2020: Community Question Answering (CQA) websites, such as Stack Exchange or Quora, allow users to freely ask questions and obtain answers from other users, i.e., the community. Personal assistants, such as Amazon Alexa or Google Home, can also exploit CQA data to answer a broader range of questions and increase customers’ engagement. However, voice-based interaction poses new challenges to the Question Answering …
- ICASSP 2020: Supervised deep learning has gained significant attention for speech enhancement recently. The state-of-the-art deep-learning methods perform the task by learning a ratio/binary mask that is applied to the mixture in the time-frequency domain to produce the clean speech. Despite the great performance in the single-channel setting, these frameworks lag in performance in the multichannel setting, as the majority …
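The ratio-mask idea this abstract describes can be illustrated on toy signals. Here an oracle ideal ratio mask, computed from the known clean and noise components purely for illustration (a learned system would predict it from the mixture alone), is applied per time-frequency bin of the mixture; all signals and frame settings are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy signals (illustrative only): a sinusoid stands in for clean speech,
# white noise for the interference.
n, frame = 4096, 256
t = np.arange(n)
clean = np.sin(2 * np.pi * 0.03 * t)
noise = 0.5 * rng.standard_normal(n)
mixture = clean + noise

# Non-overlapping rectangular frames keep this toy transform exactly invertible;
# a real system would use windowed, overlapping frames.
def stft(x):
    return np.fft.rfft(x.reshape(-1, frame), axis=1)

def istft(X):
    return np.fft.irfft(X, n=frame, axis=1).reshape(-1)

S, N, Y = stft(clean), stft(noise), stft(mixture)

# Ideal ratio mask: per time-frequency bin, the fraction of magnitude
# attributable to speech.
irm = np.abs(S) / (np.abs(S) + np.abs(N) + 1e-8)
enhanced = istft(irm * Y)

def snr_db(ref, est):
    return 10 * np.log10(np.sum(ref**2) / np.sum((ref - est) ** 2))

# The masked signal should sit closer to the clean reference than the raw mixture.
snr_before, snr_after = snr_db(clean, mixture), snr_db(clean, enhanced)
```

Bins dominated by noise get a mask value near 0 and are attenuated, while speech-dominated bins pass through nearly unchanged, which is why the masked output lands closer to the clean signal than the mixture does.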
- The Web Conference 2020: Virtual assistants such as Amazon Alexa, Apple Siri, and Google Assistant often rely on a semantic parsing component to understand which action(s) to execute for an utterance spoken by its users. Traditionally, rule-based or statistical slot-filling systems have been used to parse “simple” queries — that is, queries that contain a single action and can be decomposed into a set of non-overlapping entities …
- ICASSP 2020: In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system. For the student, both multi-channel feature extraction layers and the …
Related content
- January 21, 2020: Self-learning system uses customers’ rephrased requests as implicit error signals.
- January 16, 2020: According to listener tests, whispers produced by a new machine learning model sound as natural as vocoded human whispers.
- December 11, 2019: Related data selection techniques yield benefits for both speech recognition and natural-language understanding.
- November 6, 2019: Today is the fifth anniversary of the launch of the Amazon Echo, so in a talk I gave yesterday at the Web Summit in Lisbon, I looked at how far Alexa has come and where we’re heading next.
- October 28, 2019: In a paper we’re presenting at this year’s Conference on Empirical Methods in Natural Language Processing, we describe experiments with a new data selection technique.
- October 17, 2019: This year at EMNLP, we will cohost the Second Workshop on Fact Extraction and Verification — or FEVER — which will explore techniques for automatically assessing the veracity of factual assertions online.