- 2023 ISCA SPSC Symposium: Federated Learning (FL) offers a privacy-preserving approach to model training, allowing edge devices to learn collaboratively without sharing data. Edge devices like Alexa and Siri are prospective sources of unlabeled audio data that can be tapped to learn robust audio representations. In this work, we bring Self-supervised Learning (SSL) and FL together to learn representations for Automatic Speech Recognition…
- Voice conversion for Lombard speaking style with implicit and explicit acoustic feature conditioning (Interspeech 2023 Workshop on Machine Learning Challenges for Hearing Aids): Text-to-Speech (TTS) systems in Lombard speaking style can improve the overall intelligibility of speech, useful for hearing loss and noisy conditions. However, training those models requires a large amount of data, and the Lombard effect is challenging to record due to speaker and noise variability and tiring recording conditions. Voice conversion (VC) has been shown to be a useful augmentation technique…
- CIKM 2023: In e-commerce sites, customer questions on the product detail page express the customers’ information needs about the product. The answers to these questions often provide the necessary information. In this work, we present and address the novel task of generating product insights from community questions and answers (Q&A). These insights can be presented to customers to assist them in their shopping journey…
- SIGIR 2023: Conversation disentanglement aims to identify and group utterances from a conversation into separate threads. Existing methods in the literature primarily focus on disentangling multi-party conversations involving three or more speakers, which enables their models to explicitly or implicitly incorporate speaker-related feature signals while disentangling. Most existing models require a large amount of human…
- ACL 2023: Methods to generate text from structured data have advanced significantly in recent years, primarily due to fine-tuning of pre-trained language models on large datasets. However, such models can fail to produce output faithful to the input data, particularly on out-of-domain data. Sufficient annotated data is often not available for specific domains, leading us to seek an unsupervised approach to improve…
Related content
- May 12, 2020: Users find speech with transferred expression 9% more natural than standard synthesized speech.
- May 07, 2020: Watch the recording of Manoj Sindhwani's live interview with Alexa evangelist Jeff Blankenburg.
- May 06, 2020: Leveraging semantic content improves the performance of an acoustic-only model for detecting device-directed speech.
- May 04, 2020: Alexa scientist Ariya Rastrow on the blurring boundaries between acoustic processing and language understanding.
- April 30, 2020: Letting a machine learning system label its own examples improves performance.
- April 27, 2020: An end-to-end deep-learning-based solution circumvents the “permutation problem”.