A quick guide to Amazon’s 50-plus papers at EMNLP 2024

Large language models predominate, both as a research subject themselves and as tools for researching topics of particular interest to Amazon, such as speech, recommendations, and information retrieval.

Large language models (LLMs) have come to dominate the field of natural-language processing, so it’s no surprise that they also dominate the research that Amazon scientists are presenting at this year’s Conference on Empirical Methods in Natural-Language Processing (EMNLP). LLM training is the topic with the greatest number of Amazon papers, followed closely by strategies for mitigating misinformation in LLMs’ outputs — including but not limited to hallucinations. At the same time, a number of papers apply LLMs to topics of traditional interest at Amazon, such as speech, recommender systems, and information retrieval. (Papers marked with asterisks were accepted to Findings of EMNLP.)

AI agents

MARCO: Multi-agent real-time chat orchestration
Anubhav Shrimal, Shervin Malmasi, Kriti Biswas, Swarnalatha Raghuraman, Anish Nediyanchath, Yi Zhang, Promod Yenigalla

Code generation

CodeFort: Robust training for code generation models
Yuhao Zhang, Shiqi Wang, Haifeng Qian, Zijian Wang, Mingyue Shang, Linbo Liu, Sanjay Krishna Gouda, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras

Socratic human feedback (SoHF): Expert steering strategies for LLM code generation
Subramanian Chidambaram, Erran Li, Min Bai, Xiaopeng Li, Kaixiang Lin, Xiong Zhou, Alex C. Williams

Structured object language modeling (SoLM): Native structured objects generation conforming to complex schemas with self-supervised denoising
Amir Tavanaei, Kee Kiat Koo, Hayreddin Ceker, Shaobai Jiang, Qi Li, Julien Han, Karim Bouyarmane

Contrastive decoding

Explaining and improving contrastive decoding by extrapolating the probabilities of a huge and hypothetical LM
Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, Tagyoung Chung

Given a simple question with clues, contrastive decoding could have an “obvious blindness” (e.g., assigning higher probability to an uncommon answer, such as "invertebrate", than to the most obvious answer, "bees"). In contrast, the asymptotic probability decoding proposed in "Explaining and improving contrastive decoding by extrapolating the probabilities of a huge and hypothetical LM" correctly assigns the highest probability to "bees" by leveraging the probabilities from multiple LMs of different sizes.
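To make the "obvious blindness" concrete, here is a minimal sketch of the standard contrastive-decoding score: the log-probability gap between a larger "expert" LM and a smaller "amateur" LM, restricted to tokens the expert itself finds plausible. The function name, the alpha threshold, and all probabilities are illustrative assumptions, not values from the paper, and the sketch does not implement the paper's asymptotic probability decoding.

```python
import math
from typing import Dict


def contrastive_decode(expert: Dict[str, float],
                       amateur: Dict[str, float],
                       alpha: float = 0.1) -> str:
    """Pick the token with the largest expert-minus-amateur log-probability.

    Only tokens whose expert probability is at least alpha times the
    expert's maximum probability are considered (plausibility constraint).
    """
    threshold = alpha * max(expert.values())
    plausible = [tok for tok, p in expert.items() if p >= threshold]
    return max(plausible,
               key=lambda tok: math.log(expert[tok]) - math.log(amateur[tok]))


# Toy next-token distributions (made up for illustration only).
expert = {"bees": 0.6, "invertebrate": 0.2, "wasps": 0.2}
amateur = {"bees": 0.5, "invertebrate": 0.05, "wasps": 0.45}

print(max(expert, key=expert.get))          # prints: bees
print(contrastive_decode(expert, amateur))  # prints: invertebrate
```

In this toy case both models rank "bees" first, so the expert-to-amateur ratio for "bees" is small, while the rarer "invertebrate" gets a large ratio and wins under contrastive decoding.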

Data integration

ASTRA: Automatic schema matching using machine translation
Tarang Chugh, Deepak Zambre

Learning from natural language explanations for generalizable entity matching
Somin Wadhwa, Adit Krishnan, Runhui Wang, Byron C. Wallace, Chris (Luyang) Kong

Pretraining and finetuning language models on geospatial networks for accurate address matching
Saket Maheshwary, Arpan Paul, Saurabh Sohoney

Retrieval augmented spelling correction for e-commerce applications
Xuan Guo, Rohit Patki, Dante Everaert, Christopher Potts

Dataset distillation

Textual dataset distillation via language model embedding
Yefan Tao, Chris (Luyang) Kong, Andrey Kan, Laurent Callot

The DaLLME framework proposed in "Textual dataset distillation via language model embedding" begins by using a language model to transform raw textual data into embedding vectors. A set of distilled vectors is then derived in the embedding space, through a process designed to encapsulate maximum informational content. Finally, the vec2text model translates these distilled vectors back into textual form.

Document understanding

DocKD: Knowledge distillation from LLMs for open-world document understanding models
Sungnyun Kim, Haofu Liao, Srikar Appalaraju, Peng Tang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan, Stefano Soatto

Information retrieval

Evaluating D-MERIT of partial-annotation on information retrieval
Royi Rassin, Yaron Fairstein, Oren Kalinsky, Guy Kushilevitz, Nachshon Cohen, Alexander Libov, Yoav Goldberg

Identifying high consideration e-commerce search queries
Zhiyu Chen, Jason Choi, Besnik Fetahu, Shervin Malmasi

Learning when to retrieve, what to rewrite, and how to respond in conversational QA*
Nirmal Roy, Leonardo Ribeiro, Rexhina Blloshmi, Kevin Small

Natural-language understanding

Intent detection in the age of LLMs
Gaurav Arora, Shreya Jain, Srujana Merugu

"Intent detection in the age of LLMs" proposes a methodology for adaptive in-context learning and chain-of-thought-based intent detection using LLMs.

Predicting entity salience in extremely short documents
Ben Bullough, Harrison Lundberg, Chen Hu, Weihang Xiao

LLM evaluation

AXCEL: Automated eXplainable consistency evaluation using LLMs*
P Aditya Sreekar, Sahil Verma, Suransh Chopra, Sarik Ghazarian, Abhishek Persad, Narayanan Sadagopan

Precise model benchmarking with only a few observations
Riccardo Fogliato, Pratik Patil, Nil-Jana Akpinar, Mathew Monfort

LLM fine tuning

AdaZeta: Adaptive zeroth-order tensor-train adaption for memory-efficient large language models fine-tuning
Yifan Yang, Kai Zhen, Ershad Banijamali, Thanasis Mouchtaris, Zheng Zhang

RoseLoRA: Row and column-wise sparse low-rank adaptation of pre-trained language model for knowledge editing and fine-tuning
Haoyu Wang, Tianci Liu, Ruirui Li, Monica Cheng, Tuo Zhao, Jing Gao

The row- and column-wise sparse low-rank adaptation (RoseLoRA) framework proposed in "RoseLoRA: Row and column-wise sparse low-rank adaptation of pre-trained language model for knowledge editing and fine-tuning".

LLMs for speech

Speechworthy instruction-tuned language models
Hyundong Cho, Nicolaas Jedema, Leonardo Ribeiro, Karishma Sharma, Pedro Szekely, Alessandro Moschitti, Ruben Janssen, Jonathan May

LLM misinformation mitigation

ECON: On the detection and resolution of evidence conflicts
Cheng Jiayang, Chunkit Chan, Qianqian Zhuang, Lin Qiu, Tianhang Zhang, Tengxiao Liu, Yangqiu Song, Yue Zhang, Pengfei Liu, Zheng Zhang

Generative subgraph retrieval for knowledge graph–grounded dialog generation
Jinyoung Park, Minseok Joo, Joo-Kyung Kim, Hyunwoo J. Kim

HalluMeasure: Fine-grained hallucination measurement using chain-of-thought reasoning
Shayan Ali Akbar, Md Mosharaf Hossain, Tess Wood, Si-Chi Chin, Erica Salinas, Victor Alvarez, Erwin Cornejo

Knowledge-centric hallucination detection
Xiangkun Hu, Dongyu Ru, Lin Qiu, Qipeng Guo, Tianhang Zhang, Yang Xu, Yun Luo, Pengfei Liu, Zheng Zhang, Yue Zhang

LLM reasoning

Auto-evolve: Enhancing large language model’s performance via self-reasoning framework*
Krishna Aswani, Alex Lu, Pranav Patankar, Priya Dhalwani, Iris Tan, Jayant Ganeshmohan, Simon Lacasse

LLM self-correction

LLM self-correction with DeCRIM: Decompose, critique, and refine for enhanced following of instructions with multiple constraints
Thomas Palmeira Ferraz, Kartik Mehta, Yu-Hsiang Lin, Haw-Shiuan Chang, Shereen Oraby, Sijia Liu, Vivek Subramanian, Tagyoung Chung, Mohit Bansal, Nanyun Peng

In the DeCRIM pipeline proposed in "LLM self-correction with DeCRIM: Decompose, critique, and refine for enhanced following of instructions with multiple constraints", an LLM first generates a response to a user request. The Decomposer then breaks down the request into granular constraints, and the Critic model gives feedback on whether the response meets those constraints. If it does, the response is output; if not, the LLM uses the feedback to refine the response.
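The control flow described in this caption can be sketched as a short loop. This is only an illustration of the decompose, critique, refine cycle: the four callables are hypothetical stand-ins for the LLM, Decomposer, and Critic, and the `max_rounds` cap is an assumption, not something the caption specifies.

```python
from typing import Callable, List


def decrim(request: str,
           generate: Callable[[str], str],
           decompose: Callable[[str], List[str]],
           critique: Callable[[str, List[str]], List[str]],
           refine: Callable[[str, str, List[str]], str],
           max_rounds: int = 3) -> str:
    """Generate a response, then critique and refine it against constraints.

    `critique` returns feedback on unmet constraints; an empty list means
    the response satisfies every constraint and can be output.
    """
    response = generate(request)          # initial LLM response
    constraints = decompose(request)      # granular constraints from the request
    for _ in range(max_rounds):           # assumed cap on refinement rounds
        feedback = critique(response, constraints)
        if not feedback:                  # all constraints met: output as is
            return response
        response = refine(request, response, feedback)
    return response                       # best effort after max_rounds
```

Passing the components in as functions keeps the sketch independent of any particular model or API.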

LLM training

Dancing in chains: Reconciling instruction following and faithfulness in language models
Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, Zhiheng Huang

DEM: Distribution edited model for training with mixed data distributions
Dhananjay Ram, Aditya Rawal, Momchil Hardalov, Nikolaos Pappas, Sheng Zha

The distribution-edited model (ΘD) described in "DEM: Distribution edited model for training with mixed data distributions" results from fine-tuning a pretrained model (Θ) on n individual data distributions (Di) and combining the resulting models with basic element-wise vector operations. Here, the extracted distribution vectors (∆ΘDi) are multiplied by weight coefficients, and the weighted sum is added to the base model.
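The element-wise combination in this caption can be written out in a few lines. The sketch below assumes each distribution vector is the difference between a fine-tuned model and the base model, and it represents parameters as plain dictionaries of float lists; the function names and the toy numbers are illustrative, not taken from the paper.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flat list of weights


def distribution_vector(finetuned: Params, base: Params) -> Params:
    """Element-wise difference between a fine-tuned model and the base model."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])]
            for name in base}


def distribution_edited_model(base: Params,
                              finetuned_models: List[Params],
                              weights: List[float]) -> Params:
    """Add the weighted sum of distribution vectors to the base model."""
    if len(finetuned_models) != len(weights):
        raise ValueError("need exactly one weight per fine-tuned model")
    edited = {name: list(values) for name, values in base.items()}
    for model, w in zip(finetuned_models, weights):
        delta = distribution_vector(model, base)
        for name in edited:
            edited[name] = [e + w * d for e, d in zip(edited[name], delta[name])]
    return edited


# Toy example: one two-element parameter, two fine-tuned models.
base = {"w": [1.0, 2.0]}
tuned_a = {"w": [2.0, 2.0]}
tuned_b = {"w": [1.0, 4.0]}
print(distribution_edited_model(base, [tuned_a, tuned_b], [0.5, 0.25]))
# prints: {'w': [1.5, 2.5]}
```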

Evolutionary contrastive distillation for language model alignment
Julian Katz-Samuels, Zheng Li, Hyokun Yun, Priyanka Nigam, Yi Xu, Vaclav Petricek, Bing Yin, Trishul Chilimbi

Hop, skip, jump to convergence: Dynamics of learning rate transitions for improved training of large language models
Shreyas Subramanian, Vignesh Ganapathiraman, Corey Barrett

Learning from relevant subgoals in successful dialogs using iterative training for task-oriented dialog systems
Magdalena Kaiser, Patrick Ernst, Gyuri Szarvas

Quality matters: Evaluating synthetic data for tool-using LLMs
Shadi Iskander, Nachshon Cohen, Zohar Karnin, Ori Shapira, Sofia Tolmach

Query autocompletion

AmazonQAC: A large-scale, naturalistic query autocomplete dataset
Dante Everaert, Rohit Patki, Tianqi Zheng, Christopher Potts

DiAL: Diversity aware listwise ranking for query auto-complete
Sonali Singh, Sachin Farfade, Prakash Mandayam Comar

Question answering

RAG-QA arena: Evaluating domain robustness for long-form retrieval-augmented question answering
Rujun Han, Yuhao Zhang, Peng Qi, Yumo Xu, Jenyuan Wang, Lan Liu, William Yang Wang, Bonan Min, Vittorio Castelli

Retrieving contextual information for long-form question answering using weak supervision
Philipp Christmann, Svitlana Vakulenko, Ionut Teodor Sorodoc, Bill Byrne, Adrià de Gispert

Recommender systems

Efficient pointwise-pairwise learning-to-rank for news recommendation
Nithish Kannen Senthilkumar, Yao Ma, Gerrit van den Burg, Jean Baptiste Faddoul

An illustration of the GLIMPSE framework proposed in "Efficient pointwise-pairwise learning-to-rank for news recommendation". GLIMPSE adopts a multitask approach in which a pretrained language model is fine-tuned on both the relevance prediction task and the pairwise-preference task. During inference, the relevance predictions are used to produce an initial pointwise ranking, which is subsequently improved by one or more right-to-left (RTL) passes using pairwise comparisons.
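One plausible reading of this inference procedure is sketched below: sort candidates by predicted relevance, then sweep from the bottom of the list toward the top, swapping adjacent items whenever the pairwise comparison prefers the lower-ranked one. The `relevance` and `prefers` callables are hypothetical stand-ins for the fine-tuned model's two outputs, and the adjacent-swap sweep is an assumption about what a right-to-left pass does.

```python
from typing import Callable, List, Sequence


def rtl_pass(ranking: Sequence[str],
             prefers: Callable[[str, str], bool]) -> List[str]:
    """One right-to-left sweep of adjacent pairwise comparisons.

    `prefers(a, b)` should return True when item a is preferred over item b.
    """
    items = list(ranking)
    for i in range(len(items) - 1, 0, -1):
        if prefers(items[i], items[i - 1]):   # lower-ranked item wins
            items[i - 1], items[i] = items[i], items[i - 1]
    return items


def rerank(candidates: Sequence[str],
           relevance: Callable[[str], float],
           prefers: Callable[[str, str], bool],
           passes: int = 1) -> List[str]:
    """Pointwise ranking by relevance, refined by `passes` RTL sweeps."""
    ranking = sorted(candidates, key=relevance, reverse=True)
    for _ in range(passes):
        ranking = rtl_pass(ranking, prefers)
    return ranking


# Toy scores: pointwise relevance underrates "c", pairwise comparison does not.
pointwise = {"a": 0.9, "b": 0.5, "c": 0.4}
quality = {"a": 2, "b": 1, "c": 3}
print(rerank(["a", "b", "c"], pointwise.get,
             lambda x, y: quality[x] > quality[y]))
# prints: ['c', 'a', 'b']
```

Under this reading, a single sweep is enough to carry the most preferred item to the top, which costs far fewer comparisons than scoring every pair.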

PEARL: Preference extraction with exemplar augmentation and retrieval with LLM agents
Vijit Malik, Akshay Jagatap, Vinayak Puranik, Anirban Majumder

Sequential LLM framework for fashion recommendation
Han Liu, Xianfeng Tang, Tianlang Chen, Jiapeng Liu, Indu Indu, Henry Peng Zou, Peng Dai, Roberto Fernandez Galan, Mike Porter, Dongmei Jia, Ning Zhang, Lian Xiong

Responsible AI

Attribute controlled fine-tuning for large language models: A case study on detoxification
Tao Meng, Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Aram Galstyan, Richard Zemel, Kai-Wei Chang, Rahul Gupta, Charith Peris

FLIRT: Feedback loop in-context red teaming
Ninareh Mehrabi, Palash Goyal, Christophe Dupuy, Qian Hu, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta

Order of magnitude speedups for LLM membership inference
Rongting Zhang, Martin Bertran Lopez, Aaron Roth

Synthetic data generation

CorrSynth: A correlated sampling method for diverse dataset generation from LLMs
Suhas Kowshik, Abhishek Divekar, Vijit Malik

"CorrSynth: A correlated sampling method for diverse dataset generation from LLMs" introduces a sampling method that uses anti-correlation between examples rather than few-shot generation.

DATA ADVISOR: Dynamic data curation for safety alignment of large language models
Fei Wang, Ninareh Mehrabi, Palash Goyal, Rahul Gupta, Kai-Wei Chang, Aram Galstyan

Evaluating differentially private synthetic data generation in high-stakes domains
Krithika Ramesh, Nupoor Gandhi, Pulkit Madaan, Lisa Bauer, Charith Peris, Anjalie Field

SYNTHESIZRR: Generating diverse datasets with retrieval augmentation
Abhishek Divekar, Greg Durrett

Abstract depiction of the procedure proposed in "SYNTHESIZRR: Generating diverse datasets with retrieval augmentation". The content sourcing stage retrieves K unique documents {r1,...,rK} from a large corpus for each in-context covariate xICL. The task-inversion stage uses a parameterized context refinement prompt, Pτ, which takes parameters Rinv (inversion instruction), rk (a retrieved document), and V(yICL) (the verbalized target label). A generalist teacher LLM autoregressively generates a synthetic covariate. Each in-context example thus produces K unique synthetic examples {x̃1,..., x̃K}, which we include in the dataset with target yICL.
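The two stages in this caption map onto a small loop. In the sketch below, `retrieve`, `teacher`, and `verbalize` are hypothetical stand-ins for the retriever, the teacher LLM, and the label verbalizer V, and the prompt template is an invented placeholder for Pτ rather than the paper's wording.

```python
from typing import Callable, List, Sequence, Tuple


def build_prompt(inversion_instruction: str, document: str,
                 verbalized_label: str) -> str:
    """Placeholder for the context refinement prompt (template is assumed)."""
    return (f"{inversion_instruction}\n"
            f"Label: {verbalized_label}\n"
            f"Document: {document}\n"
            f"Rewritten text:")


def synthesize(icl_examples: Sequence[Tuple[str, str]],
               retrieve: Callable[[str, int], List[str]],
               teacher: Callable[[str], str],
               verbalize: Callable[[str], str],
               inversion_instruction: str,
               k: int) -> List[Tuple[str, str]]:
    """Produce up to k synthetic (covariate, label) pairs per in-context example."""
    dataset: List[Tuple[str, str]] = []
    for x_icl, y_icl in icl_examples:
        # Content sourcing: keep up to k unique retrieved documents, in order.
        documents = list(dict.fromkeys(retrieve(x_icl, k)))[:k]
        for document in documents:
            # Task inversion: the teacher rewrites the document for label y_icl.
            prompt = build_prompt(inversion_instruction, document,
                                  verbalize(y_icl))
            dataset.append((teacher(prompt), y_icl))
    return dataset
```

Because each synthetic example is grounded in a different retrieved document, diversity comes from the corpus rather than from sampling the teacher repeatedly on the same prompt.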

Text classification

Distance-aware calibration for pre-trained language models*
Alberto Gasparin, Gianluca Detommaso

Performance-guided LLM knowledge distillation for efficient text classification at scale
Flavio Di Palo, Prateek Singhi, Bilal Fadlallah

Prompt-tuned multi-task taxonomic transformer (PTMTTaxoFormer)
Rajashekar Vasantha, Nhan Nguyen, Yue Zhang

Text summarization

Salient information prompting to steer content in prompt-based abstractive summarization
Lei Xu, Asad Karim, Saket Dingliwal, Aparna Elangovan

Research areas

Related content

US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, WA, Bellevue
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
GB, London
As a STRUC Economist Intern, you'll specialize in structural econometric analysis to estimate fundamental preferences and strategic effects in complex business environments. Your responsibilities include: Analyze large-scale datasets using structural econometric techniques to solve complex business challenges Applying discrete choice models and methods, including logistic regression family models (such as BLP, nested logit) and models with alternative distributional assumptions Utilizing advanced structural methods including dynamic models of customer or firm decisions over time, applied game theory (entry and exit of firms), auction models, and labor market models Building datasets and performing data analysis at scale Collaborating with economists, scientists, and business leaders to develop data-driven insights and strategic recommendations Tackling diverse challenges including pricing analysis, competition modeling, strategic behavior estimation, contract design, and marketing strategy optimization Helping business partners formalize and estimate business objectives to drive optimal decision-making and customer value Build and refine comprehensive datasets for in-depth structural economic analysis Present complex analytical findings to business leaders and stakeholders