- ACL 2023 Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA). Aspect-based Sentiment Analysis (ABSA) is a fine-grained sentiment analysis task involving four elements of user-generated text: aspect term, aspect category, opinion term, and sentiment polarity. Most computational approaches focus on a subset of the ABSA sub-tasks, such as tuple (aspect term, sentiment polarity) or triplet (aspect term, opinion term, sentiment polarity) extraction, using either pipeline … (a sketch of the tuple/triplet structure follows this list)
- TSD 2023. Modern Automatic Speech Recognition (ASR) technology is typically fine-tuned for a target domain or application to obtain the best recognition results. This requires training and maintaining a dedicated ASR model for each domain, which increases the overall cost. Moreover, a fine-tuned model may not be the optimal way of sharing knowledge across domains. To address this, we propose a novel unified …
- Interspeech 2023. Conformer-based end-to-end models have become ubiquitous and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques such as dual-mode and dynamic chunk training have helped unify streaming and non-streaming systems. However, a performance gap remains between streaming with full past context and streaming with limited past context. To address this issue, we propose the integration … (a chunked attention-mask sketch follows this list)
- Interspeech 2023. We present eCat, a novel end-to-end multi-speaker model capable of (a) generating long-context speech with expressive and contextually appropriate prosody, and (b) performing fine-grained prosody transfer between any pair of seen speakers. eCat is trained using a two-stage approach. In Stage I, the model learns speaker-independent word-level prosody representations in an end-to-end fashion from … (a word-level pooling sketch follows this list)
- Interspeech 2023. An End-to-End Speech Translation (E2E-ST) model takes input audio in one language and directly produces output text in another language. The model must learn both speech-to-text modality conversion and translation, which demands a large architecture to learn this joint task effectively. Yet, to the best of our knowledge, we are the first to optimize compression of E2E-ST models. In this … (a low-rank compression sketch follows this list)
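To make the four-element ABSA structure in the WASSA paper's preview concrete, here is a minimal sketch (the review sentence, category labels, and class names are invented for illustration, not taken from the paper) showing how the tuple and triplet sub-tasks are projections of the full four-element annotation:

```python
from dataclasses import dataclass

@dataclass
class ABSAQuad:
    aspect_term: str
    aspect_category: str
    opinion_term: str
    sentiment: str  # "positive" | "negative" | "neutral"

review = "The battery life is great, but the screen is dim."

# Hypothetical gold annotations a model would be trained to extract:
quads = [
    ABSAQuad("battery life", "battery#quality", "great", "positive"),
    ABSAQuad("screen", "display#quality", "dim", "negative"),
]

# Sub-task views derived from the full quadruple:
pairs = [(q.aspect_term, q.sentiment) for q in quads]                     # tuple extraction
triplets = [(q.aspect_term, q.opinion_term, q.sentiment) for q in quads]  # triplet extraction
print(pairs)
print(triplets)
```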
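Dynamic chunk training, mentioned in the streaming-ASR preview, is commonly implemented by varying a chunked self-attention mask during training. The sketch below (function name and API are my own, not the paper's) builds such a mask: `left_chunks=-1` mimics full past context, while a small value mimics limited-context streaming:

```python
import numpy as np

def chunk_attention_mask(num_frames: int, chunk_size: int,
                         left_chunks: int = -1) -> np.ndarray:
    """True where query frame i may attend to key frame j."""
    chunk_of = np.arange(num_frames) // chunk_size
    q, k = np.meshgrid(chunk_of, chunk_of, indexing="ij")
    mask = k <= q                      # never attend to future chunks
    if left_chunks >= 0:
        mask &= k >= q - left_chunks   # cap the visible past
    return mask

# Full past vs. only one past chunk, for 8 frames in chunks of 2:
print(chunk_attention_mask(8, 2, left_chunks=-1).astype(int))
print(chunk_attention_mask(8, 2, left_chunks=1).astype(int))
```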
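The eCat preview cuts off before the architecture details. Purely as a generic illustration of word-level prosody representations (not eCat's actual method), one common building block is pooling frame-level acoustic features over word boundaries:

```python
import numpy as np

def word_level_prosody(frame_feats: np.ndarray,
                       word_boundaries: list[tuple[int, int]]) -> np.ndarray:
    """Mean-pool frame-level features of shape (T, D) into one vector per word."""
    return np.stack([frame_feats[s:e].mean(axis=0) for s, e in word_boundaries])

# Toy example: 10 frames of 4-dim features, two words over frames [0, 6) and [6, 10):
feats = np.random.randn(10, 4)
print(word_level_prosody(feats, [(0, 6), (6, 10)]).shape)  # (2, 4)
```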
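The E2E-ST preview likewise truncates before the compression method is described. As one standard compression technique for large encoder-decoder models (not necessarily the paper's approach), here is a low-rank factorization sketch that replaces a weight matrix with two thin factors:

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W of shape (out, in) by A @ B with A (out, r) and B (r, in)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]
    B = Vt[:rank]
    return A, B

W = np.random.randn(512, 512)
A, B = low_rank_factorize(W, rank=64)
print(W.size, A.size + B.size)                        # 262144 vs. 65536 parameters (~4x smaller)
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))  # relative approximation error
```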
Related content
- October 28, 2020: Knowledge distillation technique for shrinking neural networks yields relative performance increases of up to 122%. (A distillation-loss sketch follows this list.)
- October 22, 2020: Director of speech recognition Shehzad Mevawalla highlights recent advances in on-device processing, speaker ID, and semi-supervised learning.
- October 21, 2020: Applications in product recommendation and natural-language processing demonstrate the approach’s flexibility and ease of use.
- October 16, 2020: New system is the first to use an attention-based sequence-to-sequence model, dispensing with separate models for features such as vibrato and phoneme durations.
- October 15, 2020: Hear Breen discuss his work leading research teams in speech synthesis and text-to-speech technologies, the science behind Alexa’s enhanced voice styles, and more.
- October 05, 2020: Challenge includes benchmark models from Amazon Alexa, which achieve state-of-the-art performance on five of the challenge tasks.
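As context for the knowledge distillation item above, here is a minimal sketch of the standard soft-target distillation loss (Hinton et al., 2015); the article's specific variant may differ, and the logits below are toy values:

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T**2."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float((T ** 2) * (p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1).mean())

teacher = np.array([[4.0, 1.0, 0.2]])  # large teacher model's logits
student = np.array([[2.5, 1.2, 0.4]])  # small student model's logits
print(distillation_loss(student, teacher))
```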