-
ACL 20232023We present a new task setting for attribute mining on e-commerce products, serving as a practical solution to extract open-world attributes without extensive human intervention. Our supervision comes from a high-quality seed attribute set bootstrapped from existing resources, and we aim to expand the attribute vocabulary of existing seed types, and also to discover any new attribute types automatically.
-
ACL 20232023We present the MASSIVE dataset— Multilingual Amazon Slu resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation. MASSIVE contains 1M realistic, parallel, labeled virtual assistant utterances spanning 51 languages, 18 domains, 60 intents, and 55 slots. MASSIVE was created by tasking professional translators to localize the English-only SLURP dataset into 50 typologically
-
ICML 2023 Workshop on Sampling and Optimization in Discrete Spaces2023Recent developments in natural language processing (NLP) have highlighted the need for substantial amounts of data for models to capture textual information accurately. This raises concerns regarding the computational resources and time required for training such models. This paper introduces SEmantics for data SAliency in Model performance Estimation (SeSaME). It is an efficient data sampling mechanism
-
ACL Findings 20232023Given a data lake of tabular data as well as a query table, how can we retrieve all the tables in the data lake that can be unioned with the query table? Table union search constitutes an essential task in data discovery and preparation as it enables data scientists to navigate massive open data repositories. Existing methods identify uniability based on column representations (word surface forms or token
-
ACL 20232023Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance. However, libraries are upgraded or deprecated very frequently and re-training large-scale language models is computationally expensive. Therefore, Continual Learning (CL) is an important aspect that remains under-explored in the code domain. In this paper, we introduce a benchmark called CODETASK-CL that covers
Related content
-
Projection image adapted from Michael Horvath under the CC BY-SA 4.0 licenseJanuary 15, 2019Neural networks have been responsible for most of the top-performing AI systems of the past decade, but they tend to be big, which means they tend to be slow. That’s a problem for systems like Alexa, which depend on neural networks to process spoken requests in real time.
-
December 21, 2018In May 2018, Amazon launched Alexa’s Remember This feature, which enables customers to store “memories” (“Alexa, remember that I took Ben’s watch to the repair store”) and recall them later by asking open-ended questions (“Alexa, where is Ben’s watch?”).
-
December 18, 2018At a recent press event on Alexa's latest features, Alexa’s head scientist, Rohit Prasad, mentioned multistep requests in one shot, a capability that allows you to ask Alexa to do multiple things at once. For example, you might say, “Alexa, add bananas, peanut butter, and paper towels to my shopping list.” Alexa should intelligently figure out that “peanut butter” and “paper towels” name two items, not four, and that bananas are a separate item.
-
December 17, 2018In recent years, data representation has emerged as an important research topic within machine learning.
-
December 13, 2018Language models are a key component of automatic speech recognition systems, which convert speech into text. A language model captures the statistical likelihood of any particular string of words, so it can help decide between different interpretations of the same sequence of sounds.
-
December 11, 2018Suppose that you say to Alexa, “Alexa, play Mary Poppins.” Alexa must decide whether you mean the book, the video, or the soundtrack. How should she do it?