Research Area

Conversational AI

Building software and systems that help people communicate with computers naturally, as if communicating with family and friends.

Publications

  • Interspeech 2019
    2019
    Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to generate pronunciations for out-of-vocabulary words that do not exist in the pronunciation lexicons (mappings like "e c h o" → "E k oU"). Most G2P systems are monolingual and based on traditional joint-sequence-based n-gram models. As an alternative, we
  • Rylan Conway, Lambert Mathias
    SIGDIAL 2019
    2019
    In a spoken-dialogue system, dialogue state tracker (DST) components track the state of the conversation by updating a distribution of values associated with each of the slots being tracked for the current user turn, using the interactions until then. Much of the previous work has relied on modeling the natural order of the conversation, using distance-based offsets as an approximation of time. In this
  • Kenichi Kumatani, Wu Minhua, Shiva Sundaram, Nikko Ström, Björn Hoffmeister
    ICASSP 2019
    2019
    The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the difference between
  • Spoken language can include sensitive topics such as profanity, insults, and political and offensive speech. To engage in contextually appropriate conversations, voice services such as Alexa, Google Assistant, and Siri must detect sensitive topics in conversations and react appropriately. A simple approach to detect sensitive topics is to use regular expression or keyword
  • Abdalghani Abujabal, Judith Gaspers
    Interspeech 2019
    2019
    Named entity recognition (NER) is a vital task in spoken language understanding, which aims to identify mentions of named entities in text, e.g., from transcribed speech. Existing neural models for NER rely mostly on dedicated word-level representations, which suffer from two main shortcomings. First, the vocabulary size is large, yielding large memory requirements and training time. Second, these models

Related content

GB, MLN, Edinburgh
We’re looking for a Machine Learning Scientist in the Personalization team for our Edinburgh office experienced in generative AI and large models. You will be responsible for developing and disseminating customer-facing personalized recommendation models. This is a hands-on role with global impact working with a team of world-class engineers and scientists across the Edinburgh offices and wider organization. You will lead the design of machine learning models that scale to very large quantities of data, and serve high-scale low-latency recommendations to all customers worldwide. You will embody scientific rigor, designing and executing experiments to demonstrate the technical efficacy and business value of your methods. You will work alongside a