- ICASSP 2019: This paper presents our work on training acoustic event detection (AED) models using unlabeled data. Recent acoustic event detectors are based on large-scale neural networks, which are typically trained with huge amounts of labeled data. Labels for acoustic events are expensive to obtain, and audio of relevant acoustic events can be limited, especially for rare events. In this paper we leverage an Internet-scale …
- ICASSP 2019: This is a report of our lessons learned building acoustic models from 1 million hours of unlabeled speech, while labeled speech is restricted to 7,000 hours. We employ student/teacher training on unlabeled data, which helps scale out target generation compared with confidence-model-based methods, which require a decoder and a confidence model. To optimize storage and to parallelize target generation, we …
- ICASSP 2019: Conventional far-field automatic speech recognition (ASR) systems typically employ microphone array techniques for speech enhancement in order to improve robustness against noise and reverberation. However, such speech enhancement techniques do not always improve ASR accuracy, because the optimization criterion for speech enhancement is not directly relevant to the ASR objective. In this work, we …
- ICASSP 2019: The use of spatial information from multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always improve ASR accuracy, due to the difference between …
- NAACL 2019 Workshop on NeuralGen: Semi-supervised learning is an efficient way to improve the performance of natural language processing systems. In this work, we propose Para-SSL, a scheme to generate candidate utterances using paraphrasing and methods from semi-supervised learning. In order to perform paraphrase generation in the context of a dialog system, we automatically extract paraphrase pairs to create a paraphrase corpus. Using this …
Related content
- October 28, 2020: Watch as four Amazon Alexa scientists talk about the current state, new developments, and recent announcements surrounding advancements in Alexa speech technologies.
- October 28, 2020: Knowledge distillation technique for shrinking neural networks yields relative performance increases of up to 122%.
- October 22, 2020: Director of speech recognition Shehzad Mevawalla highlights recent advances in on-device processing, speaker ID, and semi-supervised learning.
- October 21, 2020: Applications in product recommendation and natural-language processing demonstrate the approach's flexibility and ease of use.
- October 16, 2020: New system is the first to use an attention-based sequence-to-sequence model, dispensing with separate models for features such as vibrato and phoneme durations.
- October 15, 2020: Hear Breen discuss his work leading research teams in speech synthesis and text-to-speech technologies, the science behind Alexa's enhanced voice styles, and more.