-
ACL 20232023We present the MASSIVE dataset— Multilingual Amazon Slu resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation. MASSIVE contains 1M realistic, parallel, labeled virtual assistant utterances spanning 51 languages, 18 domains, 60 intents, and 55 slots. MASSIVE was created by tasking professional translators to localize the English-only SLURP dataset into 50 typologically
-
ICML 2023 Workshop on Sampling and Optimization in Discrete Spaces2023Recent developments in natural language processing (NLP) have highlighted the need for substantial amounts of data for models to capture textual information accurately. This raises concerns regarding the computational resources and time required for training such models. This paper introduces SEmantics for data SAliency in Model performance Estimation (SeSaME). It is an efficient data sampling mechanism
-
ACL Findings 20232023Given a data lake of tabular data as well as a query table, how can we retrieve all the tables in the data lake that can be unioned with the query table? Table union search constitutes an essential task in data discovery and preparation as it enables data scientists to navigate massive open data repositories. Existing methods identify uniability based on column representations (word surface forms or token
-
ACL 20232023Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance. However, libraries are upgraded or deprecated very frequently and re-training large-scale language models is computationally expensive. Therefore, Continual Learning (CL) is an important aspect that remains under-explored in the code domain. In this paper, we introduce a benchmark called CODETASK-CL that covers
-
ACL Findings 20232023Sentiment analysis (SA) systems are used in many products and hundreds of languages. Gender and racial biases are well-studied in English SA systems, but understudied in other languages, with few resources for such studies. To remedy this, we build a counterfactual evaluation corpus for gender and racial/migrant bias in four languages. We demonstrate its usefulness by answering a simple but important question
Related content
-
September 04, 2019Earlier this year, at Amazon’s re:MARS conference, Alexa head scientist Rohit Prasad unveiled Alexa Conversations, a new service that allows Alexa skill developers to more easily integrate conversational elements into their skills. The announcement is an indicator of the next stage in Alexa’s evolution: more-natural, dialogue-based engagements that enable Alexa to aggregate data and refine requests to better meet customer needs.
-
August 29, 2019An automatic-speech-recognition system — such as Alexa’s — converts speech into text, and one of its key components is its language model. Given a sequence of words, the language model computes the probability that any given word is the next one. For instance, a language model would predict that a sentence that begins “Toni Morrison won the Nobel” is more likely to conclude “Prize” than “dries”. Language models can thus help decide between competing interpretations of the same acoustic information.
-
August 22, 2019A text-to-speech system, which converts written text into synthesized speech, is what allows Alexa to respond verbally to requests or commands...
-
Animation by Nick LittleAugust 15, 2019Embedding entity names from diverse skills in a shared representations space enables system to suggest neglected entity names with 88.5% accuracy.
-
August 13, 2019Neural networks are responsible for most recent advances in artificial intelligence, including many of Alexa’s latest capabilities. But neural networks tend to be large and unwieldy, and in recent years, the Alexa team has been investigating techniques for making them efficient enough to run on-device.
-
August 08, 2019Alexa currently has more than 90,000 skills, or abilities contributed by third-party developers — the Uber ride-sharing skill, the Jeopardy! trivia game skill, the Starbucks drink-ordering skill, and so on.