-
ICASSP 2019 (2019). The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the difference between
-
DSAA 2019, IEEE DSAA 2019 (2019). Spoken language can include sensitive topics including profanity, insults, political and offensive speech. In order to engage in contextually appropriate conversations, it is essential for voice services such as Alexa, Google Assistant, Siri, etc. to detect sensitive topics in the conversations and react appropriately. A simple approach to detect sensitive topics is to use regular expression or keyword
-
Interspeech 2019 (2019). Named entity recognition (NER) is a vital task in spoken language understanding, which aims to identify mentions of named entities in text, e.g., from transcribed speech. Existing neural models for NER rely mostly on dedicated word-level representations, which suffer from two main shortcomings. First, the vocabulary size is large, yielding large memory requirements and training time. Second, these models
-
International Journal of Semantic Computing (2019). We demonstrate the potential for using aligned bilingual word embeddings in developing an unsupervised method to evaluate machine translations without the need for a parallel translation corpus or reference corpus. We explain different aspects of digital entertainment content subtitles. We share our experimental results for four language pairs (English to French, German, Portuguese, Spanish) and present
-
Interspeech 2019 (2019). In this paper, we extend our previous work on device-directed utterance detection, which aims to distinguish voice queries intended for a smart-home device from background speech. The task can be phrased as a binary utterance-level classification problem that we approach with a DNN-LSTM model using acoustic features and features from the automatic speech recognition (ASR) decoder as input. In this work
Related content
-
March 11, 2021: Watch a recording of the presentation and Q&A roundtable featuring Amazon scientists and scholars.
-
March 11, 2021: University teams will compete in building agents that can help customers complete complex tasks, like cooking and home improvement. Deadline for university team applications is April 16.
-
March 02, 2021: The newest chapter addresses a problem that often bedevils nonparametric machine learning models.
-
March 01, 2021: The Art Museum skill uses Alexa Conversations, an AI-driven dialogue management tool.
-
February 08, 2021: A technique that relies on inverse reinforcement learning, or learning by example, improves task completion rate by 14% to 17% in simulations.
-
February 08, 2021: Yanagisawa discusses the science behind Alexa's new bilingual Polyglot model, her career in speech research, and more.