- 2024: Large Language Models (LLMs) are powerful models for generation tasks, but they may not generate good-quality outputs on their first attempt. Apart from model fine-tuning, existing approaches to improving prediction accuracy and quality typically involve LLM self-improvement / self-reflection, which incorporates feedback from the models themselves. Despite their effectiveness, these methods are hindered by their…
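Such a self-reflection loop can be sketched in a few lines. The `llm` callable and the prompts below are illustrative stand-ins for any real model API, not the method described in the paper:

```python
def self_refine(llm, task_prompt, max_rounds=3):
    """Generate, critique, and revise until the critic is satisfied.

    `llm` is any callable mapping a prompt string to a response string;
    it stands in for a real model API (hypothetical interface).
    """
    draft = llm(task_prompt)
    for _ in range(max_rounds):
        critique = llm(f"Critique this answer:\n{draft}")
        if "no issues" in critique.lower():
            break  # the model judges its own output acceptable
        draft = llm(f"Task: {task_prompt}\nAnswer: {draft}\n"
                    f"Feedback: {critique}\nRevise the answer.")
    return draft

# Toy stand-in model: flags the first draft, then approves the revision.
calls = {"n": 0}
def toy_llm(prompt):
    calls["n"] += 1
    if prompt.startswith("Critique"):
        return "Too short." if calls["n"] <= 2 else "No issues."
    return f"answer v{calls['n']}"
```

The loop terminates either when the self-critique passes or after a fixed round budget, which is the usual guard against the model critiquing forever.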
- ACL Findings 2024: Visual Question Answering (VQA) often involves diverse reasoning scenarios across Vision and Language (V&L). Most prior VQA studies, however, have merely focused on assessing the model’s overall accuracy without evaluating it on different reasoning cases. Furthermore, some recent works observe that conventional Chain-of-Thought (CoT) prompting fails to generate effective reasoning for VQA, especially for…
- 2024: Machine translation is used in e-commerce to translate second-language queries into the primary language of the store, to be matched by the search system against the product catalog. However, many queries contain spelling mistakes. We first present an analysis of the spelling-robustness of a population of MT systems, quantifying how spelling variations affect MT output, the list of returned products, and…
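One way to quantify how a spelling variation perturbs the returned product list is to run both spellings through the pipeline and measure set overlap. The `translate` and `search` functions and the toy catalog below are hypothetical placeholders, not the systems analyzed in the paper:

```python
def jaccard(a, b):
    """Overlap between two product-ID collections (1.0 = identical results)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def spelling_robustness(translate, search, query, misspelled_query):
    """Score how much one spelling variation changes the search results."""
    clean = search(translate(query))
    noisy = search(translate(misspelled_query))
    return jaccard(clean, noisy)

# Toy pipeline: identity "translation" and a keyword-match "catalog".
catalog = {"p1": "red shoes", "p2": "red shirt", "p3": "blue shoes"}
toy_translate = lambda q: q
toy_search = lambda q: [pid for pid, name in catalog.items()
                        if any(w in name for w in q.split())]
```

Averaging this score over a sample of (query, misspelling) pairs gives a single robustness number per MT system, which makes populations of systems comparable.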
- The issue of popularity bias—where popular items are disproportionately recommended, overshadowing less popular but potentially relevant items—remains a significant challenge in recommender systems. Recent advancements have seen the integration of general-purpose Large Language Models (LLMs) into the architecture of such systems. This integration raises concerns that it might exacerbate popularity bias…
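Popularity bias is commonly measured by comparing the average popularity of recommended items against the catalog average. A minimal sketch, with illustrative item names and interaction counts (not data from the paper):

```python
from statistics import mean

def avg_rec_popularity(recommendations, interaction_counts):
    """Mean popularity (interaction count) of recommended items,
    pooled across all users' recommendation lists."""
    return mean(interaction_counts[item]
                for recs in recommendations.values()
                for item in recs)

# Toy data: two users and a catalog dominated by one very popular item.
counts = {"hit_song": 1000, "niche_album": 10, "indie_track": 5}
recs = {"user_a": ["hit_song", "niche_album"],
        "user_b": ["hit_song", "indie_track"]}
```

If this figure sits well above the catalog-wide mean popularity (here about 338), the recommender is over-serving popular items, which is the effect the snippet above worries LLM integration may amplify.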
- 2024: We introduce a new, extensive multidimensional quality metrics (MQM)-annotated dataset covering 11 language pairs in the biomedical domain. We use this dataset to investigate whether machine translation (MT) metrics which are fine-tuned on human-generated MT quality judgements are robust to domain shifts between training and inference. We find that fine-tuned metrics exhibit a substantial performance drop…