Research Area

Conversational AI

Building software and systems that help people communicate with computers naturally, as if communicating with family and friends.

Publications

  • ICASSP 2020
    2020
    We present BOFFIN TTS (Bayesian Optimization For FIne-tuning Neural Text To Speech), a novel approach for few-shot speaker adaptation. Here, the task is to fine-tune a pre-trained TTS model to mimic a new speaker using a small corpus of target utterances. We demonstrate that there does not exist a one-size-fits-all adaptation strategy, with convincing synthesis requiring a corpus-specific configuration
  • Nachshon Cohen, Simone Filice, David Carmel
    The Web Conference 2020
    2020
    Community Question Answering (CQA) websites, such as Stack Exchange or Quora, allow users to freely ask questions and obtain answers from other users, i.e., the community. Personal assistants, such as Amazon Alexa or Google Home, can also exploit CQA data to answer a broader range of questions and increase customers’ engagement. However, the voice-based interaction poses new challenges to the Question Answering
  • Bahareh Tolooshams, Ritwik Giri, Andrew H. Song, Umut Isik, Arvindh Krishnaswamy
    ICASSP 2020
    2020
    Supervised deep learning has gained significant attention for speech enhancement recently. The state-of-the-art deep-learning methods perform the task by learning a ratio/binary mask that is applied to the mixture in the time-frequency domain to produce the clean speech. Despite the great performance in the single-channel setting, these frameworks lag in performance in the multichannel setting, as the majority
  • Subendhu Rongali, Luca Soldaini, Emilio Monti, Wael Hamza
    The Web Conference 2020
    2020
    Virtual assistants such as Amazon Alexa, Apple Siri, and Google Assistant often rely on a semantic parsing component to understand which action(s) to execute for an utterance spoken by its users. Traditionally, rule-based or statistical slot-filling systems have been used to parse “simple” queries — that is, queries that contain a single action and can be decomposed into a set of non-overlapping entities
  • Sanna Wagner, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram
    ICASSP 2020
    2020
    In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system. For the student, both multi-channel feature extraction layers and the

Related content

GB, MLN, Edinburgh
We’re looking for a Machine Learning Scientist, experienced in generative AI and large models, to join the Personalization team in our Edinburgh office. You will be responsible for developing and disseminating customer-facing personalized recommendation models. This is a hands-on role with global impact, working with a team of world-class engineers and scientists across the Edinburgh offices and the wider organization. You will lead the design of machine learning models that scale to very large quantities of data and serve high-scale, low-latency recommendations to all customers worldwide. You will embody scientific rigor, designing and executing experiments to demonstrate the technical efficacy and business value of your methods. You will work alongside a