- Interspeech 2021: The success of modern deep learning systems is built on two cornerstones: a massive amount of annotated training data and advanced computational infrastructure to support large-scale computation. In recent years, the model size of state-of-the-art deep learning systems has rapidly increased, sometimes reaching billions of parameters. Herein we take a close look into this phenomenon and present an empirical…
- Interspeech 2021: End-to-end automatic speech recognition systems map a sequence of acoustic features to text. In modern systems, text is encoded as grapheme subwords, which are generated by methods designed for text-processing tasks and therefore do not model or take advantage of the statistics of the acoustic features. Here, we present a novel method for generating grapheme subwords that are derived from phoneme sequences…
- Interspeech 2021: Fine-tuning transformer-based models has been shown to outperform other methods on many Natural Language Understanding (NLU) tasks. Recent studies on reducing the size of transformer models have achieved reductions of more than 80%, making on-device inference on powerful devices possible. However, other resource-constrained devices, like those enabling voice assistants (VAs), require much greater reductions. In this…
- Interspeech 2021: This paper proposes a general enhancement to the Normalizing Flows (NF) used in neural vocoding. As a case study, we improve expressive speech vocoding with a revamped Parallel WaveNet (PW). Specifically, we propose to extend the affine transformation of PW to a more expressive invertible non-affine function. The greater expressiveness of the improved PW leads to better perceived signal quality and naturalness… (see the sketch after this list)
- Interspeech 2021: Multi-channel inputs offer several advantages over single-channel inputs for improving the robustness of on-device speech recognition systems. Recent work on the multi-channel transformer has proposed a way to incorporate such inputs into end-to-end ASR for improved accuracy. However, this approach is characterized by high computational complexity, which prevents it from being deployed in on-device systems. In this…
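To make the affine-versus-non-affine distinction in the vocoding abstract above concrete, here is a minimal sketch, not taken from the paper, of the affine flow step commonly used in Parallel WaveNet-style vocoders; the symbols z_t, mu_t, sigma_t, and f are illustrative assumptions, not the authors' notation:

\[
  x_t = z_t \cdot \sigma_t(z_{<t}) + \mu_t(z_{<t}),
  \qquad
  \log\left|\frac{\partial x_t}{\partial z_t}\right| = \log \sigma_t(z_{<t}).
\]

A non-affine variant would replace this elementwise affine map with any strictly monotonic (hence invertible) function, e.g. \( x_t = f\bigl(z_t;\, \theta_t(z_{<t})\bigr) \), trading the closed-form inverse for greater expressiveness while keeping the log-determinant tractable as \( \log\bigl|\partial f / \partial z_t\bigr| \).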
Related content
- February 03, 2021: Neural text-to-speech enables new multilingual model to use the same voice for Spanish and English responses.
- January 26, 2021: Sneha Rajana is an applied scientist at Amazon today, but she didn't start out that way. Learn how she made the switch, and the advice she has for others considering a similar change.
- January 25, 2021: New approach to few-shot learning improves on state of the art by combining prototypical networks with data augmentation.
- January 21, 2021: Amazon principal applied scientist Yang Liu on the frontiers of speech and dialogue.
- January 13, 2021: In experiments, multilingual models outperform monolingual models.
- December 18, 2020: Researchers propose a method to automatically generate training data for Alexa by identifying cases in which customers rephrase unsuccessful requests.