- EMNLP 2020: Leveraging large amounts of unlabeled data with Transformer-based architectures such as BERT has gained popularity in recent times, owing to their effectiveness in learning general representations that can then be fine-tuned for downstream tasks with considerable success. However, training these models can be costly from both an economic and an environmental standpoint. In this work, we investigate how to effectively …
- EMNLP 2020: Adversarial training (AT) has shown strong regularization effects on deep learning algorithms by introducing small input perturbations to improve model robustness. In language tasks, adversarial training brings word-level robustness by adding input noise, which is beneficial for text classification. However, it lacks sufficient contextual information enhancement and is thus less useful for sequence labelling … (an illustrative embedding-perturbation sketch follows this list)
- EMNLP 2020: We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single Transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token by token. The likelihood produced in the generation process is used as a filtering score … (an illustrative generate-and-score sketch follows this list)
- EMNLP 2020: Neural machine translation achieves impressive results in high-resource conditions, but performance often suffers when the input domain is low-resource. The standard practice of adapting a separate model for each domain of interest does not scale well in practice, from both a quality perspective (brittleness under domain shift) and a cost perspective (added maintenance and inference complexity) …
- AAAI 2021; ISMIR 2020 Workshop on NLP for Music and Audio: Transformers have emerged as the dominant approach in the music literature for generating minute-long compositions with compelling musical structure. These models are trained by minimizing the negative log-likelihood (NLL) of the observed sequence autoregressively. Unfortunately, the quality of samples from these models tends to degrade significantly for long sequences, a phenomenon attributed to exposure bias … (an illustrative sketch of the NLL objective follows this list)
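The adversarial-training entry above describes regularization through small input perturbations. The sketch below is a minimal illustration of one common way to realize that idea for text classifiers, an FGSM-style perturbation applied to word embeddings; it is not the paper's method, and `model` (assumed to map embeddings directly to class logits), `epsilon`, and the combined objective are illustrative assumptions.

```python
# Illustrative sketch only: FGSM-style adversarial training on word embeddings,
# one common realization of the input-perturbation idea described above.
# `model` is assumed to map embeddings directly to class logits (an assumption).
import torch
import torch.nn.functional as F

def adversarial_loss(model, embeddings, labels, epsilon=1e-2):
    """Combined clean + adversarial objective on input word embeddings."""
    embeddings = embeddings.detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(embeddings), labels)

    # Gradient of the clean loss w.r.t. the embeddings only.
    (grad,) = torch.autograd.grad(clean_loss, embeddings, retain_graph=True)

    # Perturb the embeddings in the direction that increases the loss.
    adv_embeddings = embeddings + epsilon * grad.sign()
    adv_loss = F.cross_entropy(model(adv_embeddings), labels)

    return clean_loss + adv_loss  # backpropagate through both terms
```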
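The synthetic-QA entry describes a single encoder-decoder that generates a question and an answer from a passage and uses the generation likelihood as a filtering score. The rough sketch below illustrates that recipe with a generic Hugging Face seq2seq model; the checkpoint name, prompt format, and scoring details are placeholders rather than the paper's released system.

```python
# Rough sketch, not the paper's system: generate a question-answer pair from a
# passage with a generic seq2seq model, then use the sequence log-likelihood
# as a filtering score. Checkpoint and prompt format are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # placeholder; a purpose-trained checkpoint would be used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name).eval()

passage = "Amazon was founded by Jeff Bezos in 1994 in Bellevue, Washington."
inputs = tokenizer("generate question and answer: " + passage, return_tensors="pt")

# Decode a question-answer pair token by token (beam search here).
with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=64, num_beams=4)
qa_text = tokenizer.decode(generated[0], skip_special_tokens=True)

# Filtering score: average log-likelihood of the generated tokens under the model.
labels = generated.clone()
labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
with torch.no_grad():
    score = -model(**inputs, labels=labels).loss.item()

print(qa_text, score)  # keep only pairs whose score clears a threshold
```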
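The music-generation entry mentions training by minimizing the negative log-likelihood of the observed sequence autoregressively. As a minimal illustration of that standard objective (not the paper's training code), the sketch below computes it for a generic left-to-right token model; `model` and the tensor shapes are assumptions.

```python
# Minimal illustration of the standard autoregressive NLL objective mentioned
# above; `model` is any left-to-right sequence model returning per-step logits.
import torch
import torch.nn.functional as F

def autoregressive_nll(model, tokens):
    """NLL of `tokens` (batch, seq_len): predict token t from tokens < t."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = model(inputs)  # assumed shape: (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten batch and time steps
        targets.reshape(-1),
    )
```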
Related content
- April 28, 2022: The team's latest research on privacy-preserving machine learning, federated learning, and bias mitigation.
- April 21, 2022: Amazon Scholar Eugene Agichtein on incorporating knowledge into natural-language-processing models, multimodal interactions, and more.
- April 20, 2022: The MASSIVE dataset and the Massively Multilingual NLU (MMNLU-22) competition and workshop will help researchers scale natural-language-understanding technology to every language on Earth.
- April 07, 2022: The JHU + Amazon Initiative for Interactive AI (AI2AI) will be housed in the Whiting School of Engineering.
- April 04, 2022: Thanks to a set of simple abstractions, models with different architectures can be integrated and optimized for particular hardware accelerators.
- March 23, 2022: Amazon researchers optimize the distributed-training tool to run efficiently on the Elastic Fabric Adapter network interface.