- ICASSP 2019, EMNLP 2019: Typically, spoken language understanding (SLU) models are trained on annotated data, which is costly to gather. Aiming to reduce the data needed to bootstrap an SLU system for a new language, we present a simple but effective weight-transfer approach using data from another language. The approach is evaluated with our multi-task SLU framework developed for different languages. We evaluate our …
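A schematic of the cross-lingual weight-transfer idea described in this abstract (a minimal sketch; the parameter names and the `output.` prefix convention are illustrative assumptions, not the paper's actual architecture): initialize the target-language model from the source-language model's weights, while keeping language-specific layers untouched.

```python
def transfer_weights(source_params, target_params, skip_prefixes=("output.",)):
    # Copy each source parameter into the target model unless its name
    # starts with a language-specific prefix (e.g. the output layer).
    transferred = []
    for name, value in source_params.items():
        if name in target_params and not name.startswith(skip_prefixes):
            target_params[name] = value
            transferred.append(name)
    return transferred

# Toy "models" as name -> weight dictionaries.
src = {"encoder.w": [0.1, 0.2], "output.w": [0.9]}
tgt = {"encoder.w": [0.0, 0.0], "output.w": [0.0]}
print(transfer_weights(src, tgt))  # only the shared encoder weights move
```

The target's output layer is then trained from scratch on the (smaller) new-language data, while the transferred encoder provides the initialization.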
- ICASSP 2019: The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach to sequence transduction, either on its own or within various multitask and decoding frameworks. We propose SAN-CTC, a deep, fully self-attentional network for …
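As a toy illustration of what "alignment-free" means in CTC (this sketch shows only CTC's standard collapse rule, merge repeats then drop blanks; the function name and blank symbol are illustrative, not from the SAN-CTC paper):

```python
BLANK = "_"

def ctc_collapse(path):
    # CTC maps a frame-level alignment to a label sequence by
    # merging consecutive repeats, then removing blank symbols.
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

print(ctc_collapse("__cc_aa_t_"))  # -> "cat"
```

Many frame-level paths collapse to the same label sequence, which is why CTC training sums over all of them and needs no hand-aligned transcripts.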
- ICASSP 2019: Automatic speech recognition (ASR), audio quality, and loudness are key performance indicators (KPIs) for smart speakers. To improve all of these KPIs, audio dynamics processing is a crucial component of such systems. Unfortunately, single-band and existing multiband dynamics processing (MBDP) schemes fail to maximize bass and loudness, and can even produce unwanted peaks, distortions, and nonlinear echo, so that …
- ICASSP 2019: In this work we focus on confidence modeling for neural-network-based text classification and sequence-to-sequence models in the context of natural language understanding (NLU) tasks. For most applications, the confidence of a neural network model in its output is computed as a function of the posterior probability, determined via a softmax layer. In this work, we show that such scores can be poorly calibrated …
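A minimal sketch of the softmax-based confidence score this abstract refers to (the maximum posterior probability; function names are illustrative, and this is the baseline the paper critiques as poorly calibrated, not the paper's proposed method):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(logits):
    # Common confidence score: the highest class posterior.
    return max(softmax(logits))

# Peaked logits give high confidence; flat logits give low confidence.
print(confidence([4.0, 0.5, 0.1]))
print(confidence([1.0, 1.0, 1.0]))
```

The calibration problem is that this score can be high even when the model is wrong, so a high posterior need not correspond to a high empirical accuracy.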
- ICASSP 2019: We study media presence detection, that is, learning to recognize whether a sound segment (typically lasting a few seconds) of a long recorded stream contains media (TV) sound. This problem is difficult because non-media sound sources can be quite diverse (e.g., human voicing, non-vocal sounds, and non-human sounds), and the recorded sound can be a mixture of media and non-media sound. Different from speech …
Related content
- January 23, 2020: New "Mad Libs" technique for replacing words in individual sentences is grounded in metric differential privacy.
- January 21, 2020: Self-learning system uses customers’ rephrased requests as implicit error signals.
- January 16, 2020: According to listener tests, whispers produced by a new machine learning model sound as natural as vocoded human whispers.
- December 11, 2019: Related-data selection techniques yield benefits for both speech recognition and natural-language understanding.
- November 06, 2019: Today is the fifth anniversary of the launch of the Amazon Echo, so in a talk I gave yesterday at the Web Summit in Lisbon, I looked at how far Alexa has come and where we’re heading next.
- October 28, 2019: In a paper we’re presenting at this year’s Conference on Empirical Methods in Natural Language Processing, we describe experiments with a new data selection technique.