The Web Conference 2021 Workshop on Multilingual Search
Query language identification is an important part of a multilingual product search system. However, accurate language identification in product searches is difficult for multiple reasons, including the presence of noise in available datasets. In this work, we propose a learning framework that combines weak supervision with noisy-label pruning. We use convolutional neural network (CNN) based models to carry …
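To make the noisy-label-pruning idea concrete, here is a minimal sketch. The confidence-threshold criterion and the availability of a reference model's class probabilities are assumptions for illustration, not the paper's exact procedure:

```python
# Sketch of confidence-based noisy-label pruning (illustrative, not the
# paper's criterion): drop training examples whose weak language label
# disagrees with a reference model's confident prediction.
import numpy as np

def prune_noisy_labels(probs: np.ndarray, labels: np.ndarray,
                       threshold: float = 0.9) -> np.ndarray:
    """Return indices of examples to keep.

    probs:  (n_examples, n_languages) predicted probabilities from a
            model trained on the weakly supervised data.
    labels: (n_examples,) weak language labels.
    """
    predicted = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    # Discard an example only if the model confidently predicts a
    # different language than its weak label.
    noisy = confident & (predicted != labels)
    return np.flatnonzero(~noisy)

# Toy usage: three queries, two candidate languages.
probs = np.array([[0.95, 0.05], [0.30, 0.70], [0.99, 0.01]])
labels = np.array([0, 1, 1])              # the third weak label looks wrong
print(prune_noisy_labels(probs, labels))  # -> [0 1]
```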
NAACL 2021
Frame-based state representation is widely used in modern task-oriented dialog systems to model user intentions and slot values. However, a fixed domain-ontology design makes it difficult to extend to new services and APIs. Recent work proposed using natural-language descriptions, rather than tag names, to define each intent or slot, thus offering a dynamic set of schemas. In this …
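A hedged sketch of the description-driven schema idea follows. The bag-of-words encoder and the intent descriptions below are stand-ins for the sentence encoder (e.g., BERT) and the schema such systems actually use:

```python
# Description-driven intent matching: each intent is defined by a
# natural-language description, and the description closest to the
# utterance is selected. New intents can be added by writing a new
# description, without retraining on fixed tags.
from collections import Counter
import math

def encode(text: str) -> Counter:
    # Stand-in for a learned sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

intents = {
    "BookFlight": "reserve a seat on a flight to a destination",
    "CheckWeather": "get the weather forecast for a city",
}

utterance = "what is the weather forecast in Boston"
u = encode(utterance)
best = max(intents, key=lambda name: cosine(u, encode(intents[name])))
print(best)  # -> CheckWeather
```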
ICASSP 2021
Automatic speech recognition (ASR) based on recurrent neural network transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aimed at improving the robustness of RNN-T ASR to speech disfluencies, with a focus on partial words. For evaluation, we use clean data, data with disfluencies, and a separate dataset of speech affected by stuttering. We show …
NAACL 2021
Exploiting label hierarchies has become a promising approach to tackling the zero-shot multi-label text classification (ZS-MTC) problem. Conventional methods aim to learn a matching model between text and labels, using a graph encoder to incorporate label hierarchies and obtain effective label representations (Rios and Kavuluru, 2018). More recently, pretrained models like BERT (Devlin et al., 2018) have …
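As a rough illustration of how a graph encoder can inject a label hierarchy into label representations, here is a minimal sketch. The random embeddings, the one-hop parent-averaging scheme, and the fixed mixing weight are all assumptions; real graph encoders learn these transformations:

```python
# Propagating label embeddings along hierarchy edges: each label's
# embedding is mixed with its parent's, so unseen (zero-shot) labels
# inherit signal from seen ancestors.
import numpy as np

rng = np.random.default_rng(0)
labels = ["science", "physics", "chemistry", "quantum_physics"]
parent = {"physics": "science", "chemistry": "science",
          "quantum_physics": "physics"}

dim = 8
emb = {l: rng.normal(size=dim) for l in labels}

def hierarchy_encode(emb, parent, alpha=0.5, hops=2):
    """Repeatedly average each label with its parent (roots unchanged)."""
    for _ in range(hops):
        emb = {l: (1 - alpha) * v + alpha * emb.get(parent.get(l, l), v)
               for l, v in emb.items()}
    return emb

enc = hierarchy_encode(emb, parent)
# A text embedding (stand-in for a BERT encoding) can now be matched
# against enc[label] by dot product, including for zero-shot labels.
text_vec = rng.normal(size=dim)
scores = {l: float(text_vec @ v) for l, v in enc.items()}
print(max(scores, key=scores.get))
```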
ICASSP 2021
While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings of the target speaker reading in the desired speaking style. In this work, we present a novel three-step methodology for circumventing the costly operation of recording large amounts of target data, building expressive style voices with as little as 15 minutes of such recordings …
Related content
June 08, 2022
New method would enable BERT-based natural-language-processing models to handle longer text strings, run in resource-constrained settings, or sometimes both.
June 06, 2022
Combination of distillation and distillation-aware quantization compresses the BART model to 1/16th its size.
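The 16x figure is what ternary (roughly 2-bit) weight quantization yields over 32-bit floats. Below is a minimal sketch of the classic ternary-weight thresholding heuristic; it is an illustration only, not the article's distillation-aware procedure, which learns the quantization jointly with a full-precision teacher:

```python
# Ternary weight quantization: weights become {-alpha, 0, +alpha}.
# The 0.7 * mean(|w|) threshold is the ternary-weight-networks
# heuristic, used here purely for illustration.
import numpy as np

def ternarize(w: np.ndarray):
    delta = 0.7 * np.abs(w).mean()           # threshold heuristic
    mask = np.abs(w) > delta                 # which weights survive
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    q = np.sign(w) * mask                    # codes in {-1, 0, +1}
    return alpha * q, alpha

w = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)
w_q, alpha = ternarize(w)
print(np.unique(w_q))  # at most three values: -alpha, 0, +alpha
```

Storing the ternary codes in 2 bits plus one floating-point scale per tensor, instead of 32 bits per weight, is where a roughly 16x size reduction comes from.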
June 01, 2022
Knowledge distillation and discriminative training enable efficient use of a BERT-based model to rescore automatic-speech-recognition hypotheses.
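A minimal sketch of second-pass rescoring follows. The class names, scores, and interpolation weight are illustrative assumptions; the idea is simply that each first-pass hypothesis gets a combined score of its ASR score plus a weighted language-model score from, e.g., a BERT-based rescorer:

```python
# N-best rescoring: re-rank first-pass ASR hypotheses with a
# second-pass language-model score.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    asr_score: float   # first-pass log score
    lm_score: float    # second-pass LM log score (e.g., a BERT-based one)

def rescore(nbest, lm_weight=0.5):
    return max(nbest, key=lambda h: h.asr_score + lm_weight * h.lm_score)

nbest = [
    Hypothesis("i scream for ice cream", -12.0, -20.0),
    Hypothesis("ice cream for ice cream", -11.5, -35.0),
]
print(rescore(nbest).text)  # the LM term flips the first-pass ranking
```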
May 27, 2022
Amazon Scholar and Columbia professor Kathleen McKeown on model compression, data-distribution shifts, language revitalization, and more.
May 17, 2022
Papers focus on speech conversion and data augmentation, and sometimes both at once.
May 12, 2022
Multimodal training, signal-to-interpretation, and BERT rescoring are just a few of the topics covered by Amazon’s 21 speech-related papers.