- 2024: Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which are essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore propose Paralinguistics-enhanced …
- AAAI 2024 Workshop on Responsible Language Models, 2024: In the In-Context Learning (ICL) setup, various forms of label bias can manifest. One such manifestation is majority label bias, which arises when the distribution of labeled examples in the in-context samples is skewed towards one or more specific classes, making Large Language Models (LLMs) more prone to predict those labels. Such discrepancies can arise from various factors, including logistical constraints … (a small illustrative sketch of this skewed in-context setup appears after this list).
- NeurIPS 2023 Workshop on Robustness of Zero/Few-shot Learning in Foundation Models (R0-FoMo), 2024: Dealing with background noise is a challenging task in audio signal processing, negatively impacting algorithm performance and system robustness. In this paper, we propose a simple solution that combines recording hardware modification and algorithm improvement to tackle the challenge. The proposed solution can produce clean, noise-free, high-quality audio recordings even in noisy recording environments …
- WACV 2024, 2024: Vision-language models have been widely explored across a wide range of tasks and achieve satisfactory performance. However, it remains under-explored how to consolidate entity understanding across a varying number of images and how to align it with pre-trained language models for generative tasks. In this paper, we propose MIVC, a general multiple instance visual component to bridge the gap between various …
- NeurIPS 2023, 2023: Transformers are central to modern natural language processing and computer vision applications. Despite recent work devoted to reducing the quadratic cost of such models (as a function of the sequence length), dealing with ultra-long sequences (e.g., with more than 16K tokens) remains challenging. Applications such as answering questions based on a book or summarizing a scientific article are inefficient … (a brief sketch of this quadratic cost appears after this list).
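For the majority-label-bias entry above, here is a minimal, hypothetical sketch (not code from that paper) of the setup it describes: a few-shot prompt whose in-context demonstrations are skewed toward one class, which can nudge an LLM toward predicting that class. The task, reviews, and labels below are invented for illustration.

```python
# Illustrative sketch only: build an in-context prompt whose labeled
# demonstrations are skewed toward one class (3 "positive" vs 1 "negative"),
# the condition under which majority label bias can appear.
from collections import Counter

# Hypothetical labeled demonstrations for a binary sentiment task.
demos = [
    ("the plot was gripping from start to finish", "positive"),
    ("a warm, beautifully acted film", "positive"),
    ("easily the best thing I watched this year", "positive"),
    ("the pacing dragged and the jokes fell flat", "negative"),
]

def build_prompt(examples, query):
    """Format labeled in-context examples followed by the unlabeled query."""
    shots = "\n".join(f"Review: {text}\nLabel: {label}" for text, label in examples)
    return f"{shots}\nReview: {query}\nLabel:"

# The in-context label distribution is skewed; an LLM prompted this way may
# over-predict "positive" regardless of the query's actual content.
print(Counter(label for _, label in demos))  # Counter({'positive': 3, 'negative': 1})
print(build_prompt(demos, "the dialogue felt stiff and unconvincing"))
```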
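And for the NeurIPS 2023 entry on long sequences, a short sketch of why the cost in question is quadratic: scaled dot-product attention forms one score per pair of token positions, so the score matrix for a length-N input has N × N entries. The shapes and numbers below are illustrative, not taken from the paper.

```python
import numpy as np

def attention_scores(q, k):
    """Scaled dot-product attention scores; q and k have shape (N, d)."""
    d = q.shape[-1]
    return (q @ k.T) / np.sqrt(d)  # one score per pair of positions -> shape (N, N)

# Small concrete example: 8 tokens with 4-dimensional queries/keys.
q = np.random.randn(8, 4)
k = np.random.randn(8, 4)
print(attention_scores(q, k).shape)  # (8, 8)

# The (N, N) matrix is what makes long inputs expensive: at 16K tokens it has
# 16_384 ** 2 = 268,435,456 entries, i.e. about 1 GiB in float32 per attention head.
n = 16_384
print(f"{n * n:,} entries, {n * n * 4 / 2**30:.1f} GiB in float32")
```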
Related content
- June 26, 2023: How phonetically blended results (PBR) help ensure customers find the content they were actually asking for.
- June 09, 2023: In a top-3% paper at ICASSP, Amazon researchers adapt graph-based label propagation to improve speech recognition on underrepresented pronunciations.
- June 07, 2023: Team earned $500,000 for its performance in a challenge focused on advancing next-generation virtual assistants that help humans complete real-world tasks by continuously learning.
- June 07, 2023: Combining semi-supervised learning, data augmentation, and reinforcement learning using rewards based on implicit customer feedback and natural-language-understanding semantics reduces word error rate by more than 10%.
- June 02, 2023: Topics such as code generation, commonsense reasoning, and self-learning complement the usual focus on speech recognition and acoustic-event classification.
- May 23, 2023: Enforcing a hierarchical clustering of semantically related labels improves performance on rare “long-tail” classification categories.