- Interspeech 2020: Wake word (WW) spotting is challenging in far-field conditions due to the complexity and variability of acoustic conditions and environmental interference in signal transmission. A suite of carefully designed and optimized audio front-end (AFE) algorithms helps mitigate these challenges and provides better-quality audio signals to downstream modules such as the WW spotter. Since the WW model is trained with the…
- Interspeech 2020: Neural models have yielded state-of-the-art results on spoken language understanding (SLU) problems; however, these models require a significant amount of domain-specific labeled examples for training, which is prohibitively expensive. While pre-trained language models like BERT have been shown to capture a massive amount of knowledge by learning from unlabeled corpora and to solve SLU using fewer…
- Interspeech 2020: Automatic dubbing aims at replacing all speech contained in a video with speech in a different language, so that the result sounds and looks as natural as the original. Hence, in addition to conveying the same content as the original utterance (which is the typical objective of speech translation), dubbed speech should ideally also match its duration, the lip movements and gestures in the video, timbre,…
- Interspeech 2020: Compression and quantization are important to neural networks in general and Automatic Speech Recognition (ASR) systems in particular, especially when they operate in real time on resource-constrained devices. Using fewer bits for the model weights makes the model much smaller and significantly reduces inference time, at the cost of degraded performance. Such degradation can be…
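The bit-width trade-off described in that abstract can be sketched with generic post-training linear quantization; this is a minimal illustration of the idea (fewer bits per weight, smaller model, some numerical error), not the method from the paper:

```python
import numpy as np

# Toy float32 weight tensor standing in for a model layer.
rng = np.random.default_rng(0)
weights = rng.standard_normal(10_000).astype(np.float32)

# Symmetric 8-bit linear quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)

# Dequantize to measure the accuracy cost of the compression.
restored = quantized.astype(np.float32) * scale

print(f"size reduction: {weights.nbytes / quantized.nbytes:.0f}x")  # 32-bit -> 8-bit
print(f"max abs error:  {np.abs(weights - restored).max():.4f}")    # bounded by scale / 2
```

Going from 32-bit to 8-bit weights yields a 4x size reduction, with a per-weight rounding error bounded by half the quantization step.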
- Interspeech 2020: Domain-agnostic Automatic Speech Recognition (ASR) systems suffer from the issue of mistranscribing domain-specific words, which leads to failures in downstream tasks. In this paper, we present a post-editing ASR error correction method using the Transformer model for entity mention correction and retrieval. Specifically, we propose a novel augmented variant of the Transformer model that encodes both the…
Related content
- June 03, 2021: Rastrow discussed the continued challenges and expanded role of speech recognition, and some of the interesting research and themes that emerged from ICASSP 2021.
- June 03, 2021: Amazon Scholar Heng Ji says that deep learning could benefit from the addition of a little linguistic intuition.
- June 02, 2021: More-autonomous machine learning systems will make Alexa more self-aware, self-learning, and self-service.
- June 01, 2021: The event is over, but Amazon Science interviewed each of the six speakers within the Science of Machine Learning track. See what they had to say.
- May 26, 2021: Teams from three continents will compete to develop agents that assist customers in completing multi-step tasks.
- May 19, 2021: Calibrating noise addition to word density in the embedding space improves the utility of privacy-protected text.