- ICASSP 2022: We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion aware". We generate targets for sentiment classification using a text-to-sentiment model trained on publicly available data. Finally, we fine-tune the acoustic …
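To make the multi-task idea concrete, here is a minimal sketch of how a shared acoustic encoder could feed both an ASR (CTC) head and a sentiment head, with sentiment targets produced offline by a text-to-sentiment model. All names (`MultiTaskSER`, `joint_loss`), the architecture, and the loss weighting are hypothetical; the abstract does not specify them.

```python
# Hypothetical sketch of the multi-task objective: a shared acoustic encoder
# feeds an ASR (CTC) head and an utterance-level sentiment head. The sentiment
# targets are assumed to come from a text-to-sentiment model run offline.
import torch
import torch.nn as nn

class MultiTaskSER(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab=32, n_sentiments=3):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=2, batch_first=True)
        self.asr_head = nn.Linear(hidden, vocab)          # per-frame CTC logits
        self.sent_head = nn.Linear(hidden, n_sentiments)  # utterance sentiment

    def forward(self, feats):                  # feats: (B, T, n_mels)
        h, _ = self.encoder(feats)             # (B, T, hidden)
        asr_logits = self.asr_head(h)          # (B, T, vocab)
        sent_logits = self.sent_head(h.mean(dim=1))  # pool over time
        return asr_logits, sent_logits

def joint_loss(asr_logits, sent_logits, tokens, token_lens, feat_lens,
               sent_targets, alpha=0.3):
    # CTC expects (T, B, vocab) log-probabilities.
    log_probs = asr_logits.log_softmax(-1).transpose(0, 1)
    ctc = nn.functional.ctc_loss(log_probs, tokens, feat_lens, token_lens)
    ce = nn.functional.cross_entropy(sent_logits, sent_targets)
    return ctc + alpha * ce  # alpha (assumed) weights the auxiliary task
```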
- ICASSP 2022: Accurate and timely recognition of the trigger keyword is vital for a good customer experience on smart devices. In the traditional keyword spotting task, there is typically a trade-off between accuracy and latency, where higher accuracy can be achieved by waiting for more context. In this paper, we propose a deep learning model that separates the keyword spotting task into three phases in order …
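The accuracy/latency trade-off the abstract describes can be illustrated with a simple streaming detector. This is not the paper's three-phase model (which is truncated above); `streaming_kws` and its parameters are hypothetical. Smoothing per-frame keyword posteriors over a longer window raises confidence at the cost of waiting longer to trigger.

```python
# Illustrative only, not the paper's method: smooth per-frame keyword
# posteriors with a moving average and fire when the score crosses a
# threshold. A larger `window` means more context (higher accuracy) but
# a later trigger (higher latency).
from collections import deque

def streaming_kws(posteriors, window=30, threshold=0.8):
    """posteriors: iterable of per-frame keyword probabilities in [0, 1]."""
    buf = deque(maxlen=window)
    for t, p in enumerate(posteriors):
        buf.append(p)
        score = sum(buf) / len(buf)       # moving-average smoothing
        if len(buf) == window and score > threshold:
            return t                      # frame index at which we trigger
    return None                           # keyword not detected
```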
- ICASSP 2022: In this paper we integrate classic adaptive filtering algorithms with modern deep learning to propose a new approach called deep adaptive AEC. The main idea is to represent the linear adaptive algorithm as a differentiable layer within a deep neural network (DNN) framework. This enables gradients to flow through the adaptive layer during backpropagation, and the inner layers of the DNN are trained to …
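A minimal sketch of the central idea, assuming NLMS as the classic adaptive algorithm (the abstract does not name one): the filter update is written in autograd-friendly tensor ops, so gradients can propagate through the adaptive layer to upstream DNN layers. The function name, shapes, and hyperparameters are illustrative.

```python
# Sketch: unroll a linear adaptive filter (here NLMS, one assumed choice
# among classic algorithms) as differentiable tensor ops, so gradients
# flow through it during backpropagation.
import torch

def nlms_layer(x, d, taps=64, mu=0.5, eps=1e-6):
    """x: far-end reference signal (T,); d: microphone signal (T,).
    Returns the error signal e = d - y, built from autograd-friendly ops."""
    w = torch.zeros(taps, dtype=x.dtype)
    errors = []
    for n in range(taps, x.shape[0]):
        u = x[n - taps:n].flip(0)        # most recent `taps` reference samples
        y = torch.dot(w, u)              # linear echo estimate
        e = d[n] - y                     # residual after echo cancellation
        w = w + mu * e * u / (torch.dot(u, u) + eps)  # differentiable NLMS update
        errors.append(e)
    return torch.stack(errors)
```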
- ICASSP 2022: Non-parallel voice conversion (VC) is typically achieved using lossy representations of the source speech. However, ensuring that only speaker identity information is dropped while all other information from the source speech is retained is a major challenge. This is particularly difficult in the scenario where, at inference time, we have no knowledge of the text being read, i.e., text-free VC. To mitigate …
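For orientation, here is a generic text-free VC inference path, not the paper's mitigation (which is truncated above): a bottlenecked content encoder yields a lossy, ideally speaker-free representation, and a decoder re-synthesizes it conditioned on a target speaker embedding. All module names and dimensions are assumptions.

```python
# Generic text-free VC sketch (illustrative, not the paper's model): the
# bottleneck forces a lossy content representation; the decoder conditions
# on a target speaker embedding to re-synthesize mel features.
import torch
import torch.nn as nn

class TextFreeVC(nn.Module):
    def __init__(self, n_mels=80, content_dim=64, spk_dim=128):
        super().__init__()
        self.content_enc = nn.Sequential(nn.Linear(n_mels, 256), nn.ReLU(),
                                         nn.Linear(256, content_dim))
        self.decoder = nn.Sequential(nn.Linear(content_dim + spk_dim, 256),
                                     nn.ReLU(), nn.Linear(256, n_mels))

    def convert(self, src_mels, tgt_spk_emb):
        # src_mels: (T, n_mels); tgt_spk_emb: (spk_dim,)
        content = self.content_enc(src_mels)            # speaker info ideally dropped
        spk = tgt_spk_emb.expand(content.shape[0], -1)  # broadcast over frames
        return self.decoder(torch.cat([content, spk], -1))  # converted mels
```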
- ICASSP 2022: While end-to-end models have shown great success on the Automatic Speech Recognition task, performance degrades severely when target sentences are long-form. Previously proposed methods, (partial) overlapping inference, have been shown to be effective for long-form decoding. For both methods, word error rate (WER) decreases monotonically as the overlapping percentage decreases. Setting aside computational cost, …
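A sketch of overlapping inference in the generic sense (the paper's exact variant is not shown in the truncated abstract): decode fixed-length windows that overlap, then stitch hypotheses by assigning each word to the window whose interior covers its timestamp. `decode` is a hypothetical function returning (word, start_time) pairs.

```python
# Generic overlapping-inference sketch for long-form ASR: decode windows of
# `win` seconds overlapping by `overlap` seconds, then keep each word only
# from the window where it falls outside the half of the overlap covered by
# the neighboring window. `decode` is a hypothetical word/timestamp decoder.

def overlapping_inference(audio, decode, win=30.0, overlap=6.0):
    step = win - overlap
    hyps, t = [], 0.0
    total = len(audio) / 16000.0          # assumes 16 kHz samples
    while t < total:
        words = decode(audio, start=t, end=min(t + win, total))
        lo = t + overlap / 2 if t > 0 else 0.0
        hi = t + win - overlap / 2 if t + win < total else total
        hyps.extend(w for w, ts in words if lo <= ts < hi)
        t += step
    return " ".join(hyps)
```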
Related content
- August 15, 2022: Data augmentation and post-editing strategies lift Amazon's submission above competitors.
- August 02, 2022: With an encoder-decoder architecture, rather than a decoder-only one, the Alexa Teacher Model outperforms other large language models on few-shot tasks such as summarization and machine translation.
- August 01, 2022: McKeown awarded IEEE Innovation in Societal Infrastructure Award and named a member of the American Philosophical Society.
- July 28, 2022: Donato Crisostomi talks about how his mother helped spark a love of knowledge that led him to two science internships at Amazon.
- July 22, 2022: New EMNLP workshop will feature talks, papers, posters, and a competition built around the 50-plus-language, million-utterance MASSIVE dataset.
- July 15, 2022: New method optimizes the twin demands of retrieving relevant content and filtering out bad content.