- Interspeech 2019: Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to generate pronunciations for out-of-vocabulary words that do not exist in the pronunciation lexicons (mappings like "e c h o" → "E k oU"). Most G2P systems are monolingual and based on traditional joint-sequence-based n-gram models. As an alternative, we …
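The lexicon-lookup-with-G2P-fallback setup described above can be sketched as follows. This is a minimal illustration with a made-up toy phone set and per-letter rules; a real system would use a trained joint-sequence n-gram or neural G2P model, not hand-written rules.

```python
# Illustrative sketch: known words come from a pronunciation lexicon;
# out-of-vocabulary (OOV) words fall back to a G2P model.
# LEXICON and LETTER_RULES are toy assumptions, not a real phone inventory.

LEXICON = {"echo": ["E", "k", "oU"]}  # known word -> phone sequence

# Naive per-letter rules standing in for a trained G2P model.
LETTER_RULES = {"a": "@", "c": "k", "e": "E", "h": "", "o": "oU", "t": "t"}

def pronounce(word):
    word = word.lower()
    if word in LEXICON:                      # in-vocabulary: direct lookup
        return LEXICON[word]
    phones = [LETTER_RULES.get(ch, ch) for ch in word]  # OOV: G2P fallback
    return [p for p in phones if p]          # drop silent letters
```

For example, `pronounce("echo")` hits the lexicon, while an unseen spelling like `"eco"` is handled by the fallback rules.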
- SIGDIAL 2019: In a spoken-dialogue system, dialogue state tracker (DST) components track the state of the conversation by updating a distribution of values associated with each of the slots being tracked for the current user turn, using the interactions until then. Much of the previous work has relied on modeling the natural order of the conversation, using distance-based offsets as an approximation of time. In this …
- ICASSP 2019: The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvements due to the difference between …
- IEEE DSAA 2019: Spoken language can include sensitive topics such as profanity, insults, and political and offensive speech. In order to engage in contextually appropriate conversations, it is essential for voice services such as Alexa, Google Assistant, and Siri to detect sensitive topics in conversations and react appropriately. A simple approach to detecting sensitive topics is to use regular expressions or keyword …
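The simple keyword/regex baseline the abstract refers to can be sketched as below. The pattern lists are invented placeholders for illustration; a production system would use curated term lists and, as the paper motivates, more robust models than pattern matching alone.

```python
import re

# Hypothetical keyword/regex baseline for flagging sensitive topics.
# The terms below are stand-ins, not a real sensitive-topic lexicon.
SENSITIVE_PATTERNS = [
    re.compile(r"\b(?:insult_term|profane_term)\b", re.IGNORECASE),
    re.compile(r"\b(?:election|senator)\b", re.IGNORECASE),  # political
]

def is_sensitive(utterance: str) -> bool:
    """Return True if any sensitive-topic pattern matches the utterance."""
    return any(p.search(utterance) for p in SENSITIVE_PATTERNS)
```

This baseline is cheap but brittle: it misses paraphrases and misspellings and fires on benign mentions, which is why context-aware detection is needed.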
- Interspeech 2019: Named entity recognition (NER) is a vital task in spoken language understanding, which aims to identify mentions of named entities in text, e.g., from transcribed speech. Existing neural models for NER rely mostly on dedicated word-level representations, which suffer from two main shortcomings. First, the vocabulary size is large, yielding large memory requirements and long training times. Second, these models …
Related content
- November 19, 2020: AI models exceed human performance on public data sets; modified training and testing could help ensure that they aren't exploiting shortcuts.
- November 16, 2020: Amazon Scholar Julia Hirschberg on why speech understanding and natural-language understanding are intertwined.
- November 11, 2020: With a new machine learning system, Alexa can infer that an initial question implies a subsequent request.
- November 10, 2020: Alexa senior applied scientist provides career advice to graduate students considering a research role in industry.
- November 09, 2020: Watch a recording of the EMNLP 2020 session featuring a discussion with Amazon Scholars and academics on the state of conversational AI.
- November 06, 2020: Work aims to improve the accuracy of models both on- and off-device.