- NeurIPS 2023 Workshop on Robustness of Zero/Few-shot Learning in Foundation Models (R0-FoMo), 2024: Dealing with background noise is a challenging task in audio signal processing, negatively impacting algorithm performance and system robustness. In this paper, we propose a simple solution that combines recording hardware modification and algorithm improvement to tackle the challenge. The proposed solution can produce clean, noise-free, high-quality audio recordings even in noisy recording environments.
- WACV 2024: Vision-language models have been widely explored across a wide range of tasks and achieve satisfactory performance. However, it's under-explored how to consolidate entity understanding through a varying number of images and to align it with the pre-trained language models for generative tasks. In this paper, we propose MIVC, a general multiple instance visual component to bridge the gap between various …
- NeurIPS 2023: Transformers are central to modern natural language processing and computer vision applications. Despite recent works devoted to reducing the quadratic cost of such models (as a function of the sequence length), dealing with ultra long sequences (e.g., with more than 16K tokens) remains challenging. Applications such as answering questions based on a book or summarizing a scientific article are inefficient …
- EMNLP 2023: While most task-oriented dialogues assume conversations between the agent and one user at a time, dialogue systems are increasingly expected to communicate simultaneously with multiple users who make decisions collaboratively. To facilitate development of such systems, we release the Multi-User MultiWOZ dataset: task-oriented dialogues among two users and one agent. To collect this dataset, each user utterance …
- EMNLP 2023: Studies in bias and fairness in natural language processing have primarily examined social biases within a single language and/or across a few attributes (e.g., gender, race). However, biases can manifest differently across various languages for individual attributes. As a result, it is critical to examine biases within each language and attribute. Of equal importance is to study how these biases compare across …
Related content
- October 02, 2018: On September 20, Amazon unveiled a host of new products and features, including Alexa Guard, a smart-home feature available on select Echo devices later this year. When activated, Alexa Guard can send a customer alerts if it detects the sound of glass breaking or of smoke or carbon monoxide alarms in the home.
- September 28, 2018: Last week, Amazon announced the release of both a redesigned Echo Show with a bigger screen and the Alexa Presentation Language, which enables third-party developers to build “multimodal” skills that coordinate Alexa’s natural-language-understanding systems with on-screen graphics.
- September 26, 2018: If you’re in a room where a child has just fallen asleep, and someone else walks in, you might start speaking in a whisper, to indicate that you’re trying to keep the room quiet. The other person will probably start whispering, too.
- September 04, 2018: A central task of natural-language-understanding systems, like the ones that power Alexa, is domain classification, or determining the general subject of a user’s utterances. Voice services must make finer-grained determinations, too, such as the particular actions that a customer wants executed. But domain classification makes those determinations much more efficient, by narrowing the range of possible interpretations.
- August 31, 2018: Echo devices have already attracted tens of millions of customers, but in the Alexa AI group, we’re constantly working to make Alexa’s speech recognition systems even more accurate.
- August 29, 2018: Alexa’s ability to act on spoken requests depends on statistical models that translate speech to text and text to actions. Historically, the models’ decisions were one-size-fits-all: the same utterance would produce the same action, regardless of context.