-
The Web Conference 2021 Workshop on Multilingual Search, 2021. Query language identification is an important part of a multilingual product search system. However, accurate language identification in product searches is difficult due to multiple reasons, including presence of noise in available datasets. In this work, we propose a learning framework that combines weak supervision with noisy label pruning. We use Convolutional Neural Networks (CNN) based models to carry
-
NAACL 2021, 2021. Frame-based state representation is widely used in modern task-oriented dialog systems to model user intentions and slot values. However, a fixed design of domain ontology makes it difficult to extend to new services and APIs. Recent work proposed to use natural language descriptions to define the domain ontology instead of tag names for each intent or slot, thus offering a dynamic set of schema. In this
-
ICASSP 2021, 2021. Automatic Speech Recognition (ASR) based on Recurrent Neural Network Transducers (RNN-T) is gaining interest in the speech community. We investigate data selection and preparation choices aiming for improved robustness of RNN-T ASR to speech disfluencies with a focus on partial words. For evaluation we use clean data, data with disfluencies and a separate dataset with speech affected by stuttering. We show
-
NAACL 2021, 2021. Exploiting label hierarchies has become a promising approach to tackling the zero-shot multi-label text classification (ZS-MTC) problem. Conventional methods aim to learn a matching model between text and labels, using a graph encoder to incorporate label hierarchies to obtain effective label representations (Rios and Kavuluru, 2018). More recently, pretrained models like BERT (Devlin et al., 2018) have
-
ICASSP 2021, 2021. While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings from the target speaker reading in the desired speaking style. In this work, we present a novel 3-step methodology to circumvent the costly operation of recording large amounts of target data in order to build expressive style voices with as little as 15 minutes of such recordings
Related content
-
July 14, 2022: To become the interface for the Internet of things, conversational agents will need to learn on their own. Alexa has already started down that path.
-
July 13, 2022: Four MIT professors are the recipients of the inaugural call for research projects.
-
July 13, 2022: Allowing separate tasks to converge on their own schedules and using knowledge distillation to maintain performance improves accuracy.
-
July 11, 2022: The SCOT science team used lessons from the past (and improved existing tools) to contend with “a peak that lasted two years”.
-
July 08, 2022: Industry track chair and Amazon principal research scientist Rashmi Gangadharaiah on trends in industry papers and the challenges of building practical dialogue systems.
-
July 08, 2022: New model sets a new standard in accuracy while enabling 60-fold speedups.