EMNLP: Mitigating bias and "getting closer to the user"

Amazon's Georgiana Dinu on current challenges in machine translation.

In recent years, mitigating bias in machine learning models has become a major topic of research, and that’s as true in natural-language processing as in any other field.

Georgiana Dinu, an applied scientist with Amazon Web Services and an area chair at this year’s Conference on Empirical Methods in Natural Language Processing.

“I think it's finally becoming obvious how important it is to deal with this, and I'm very happy to see it,” says Georgiana Dinu, an applied scientist with Amazon Web Services and an area chair at this year’s Conference on Empirical Methods in Natural Language Processing (EMNLP), which starts next week. “Leaving aside the fact that we have a duty to quantify and address bias, the problems are really difficult and fascinating in themselves.”

Dinu’s own area of research is machine translation, where, she says, the problem of quantifying bias is particularly acute. “It's incredibly difficult, because there are multiple acceptable translations for an input, and it's not easy to identify when a translation is biased or it's a variation,” she says.

One clear area of bias in machine translation, however, is gender stereotyping when translating from a language with ungendered nouns to one with gendered nouns. 

“An example of this is ‘My friend is a nurse,’” Dinu explains. “Stereotyping comes in when ‘nurse’ gets translated as female, but on the other hand, in ‘My friend is a doctor,’ ‘doctor’ is translated as male.

“At least one of the causes of this is imbalance in the training data. In machine translation, we use parallel sentences as training data, and that training data is very imbalanced with respect to gender. In Europarl, which is one of the most-used parallel corpora, only 30% of the data has female speakers. Other public datasets have close to three times as much masculine-specific data as feminine-specific data.”

But while gender bias in translation is a known problem, it can be difficult to resolve in specific cases, Dinu explains.

“We sometimes have input that is ambiguous in the source language,” she says. “For these scenarios, without any additional information, we simply can't know the correct translation. In these cases, our space of solutions becomes different. We could try to rephrase the translation such that it underspecifies the gender, but that might just be impossible in some cases. Other options are to disambiguate such a sentence by context, if the customer provides it. If it's a conversation, we might be able to infer the gender of the person. Or we could expand translation to allow customers to tell us the desired gender in ambiguous cases.”

Anti-stereotypical translation

Even in unambiguous cases, however, translation models may be so biased that they still produce erroneous translations.

“Models go to great lengths just to avoid generating anti-stereotypical outputs,” Dinu says. “It's really unbelievable, sometimes, what we see. If you try to translate a sentence such as ‘My sister takes pride in being a great surgeon,’ in certain languages, the model will change the meaning of the sentence to basically mean, ‘My sister takes great pride in me, a man, being a great surgeon.’ In other cases, it simply generates ungrammatical output, where ‘surgeon’ is male.”

At EMNLP, Dinu and her colleagues are presenting a paper that tackles exactly this problem.

A schematic of Dinu and her colleagues' procedure for mitigating gender bias in neural machine translation models, from "GFST: Gender-filtered self-training for more accurate gender in translation."

“We basically proposed data augmentation as a solution to address imbalances in the training data,” she says. “What is particularly nice about our approach is that we are using only monolingual data. It's a self-training approach, where the models themselves translate more feminine-gendered data. We have a step to remove sentences that are translated wrongly, and the resulting data is added to the training data to create more balance. In a couple of public datasets, this improved accuracy in feminine-referring sentences without degradation in masculine gender accuracy.”
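The loop Dinu describes (translate feminine-gendered monolingual text, drop the bad translations, keep the rest as extra training pairs) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the word lists, the `translate` callable, and the rule "discard any output containing a masculine-marked word" are all simplifying assumptions chosen only to make the idea concrete.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical English cue words; a real system would detect gender far more carefully.
FEMININE_SOURCE_WORDS = {"she", "her", "hers", "herself"}


def tokens(sentence: str) -> Set[str]:
    """Lowercased word tokens of a sentence."""
    return set(re.findall(r"\w+", sentence.lower()))


def is_feminine_source(sentence: str) -> bool:
    """True if the source sentence contains a feminine cue word."""
    return bool(tokens(sentence) & FEMININE_SOURCE_WORDS)


def gender_filtered_self_training_data(
    monolingual: Iterable[str],
    translate: Callable[[str], str],
    masculine_target_words: Set[str],
) -> List[Tuple[str, str]]:
    """Build extra (source, target) pairs from monolingual source text.

    Only feminine-referring sources are translated, and a pair is kept
    only if the model's output contains no masculine-marked target word.
    """
    augmented = []
    for source in monolingual:
        if not is_feminine_source(source):
            continue  # augment only the under-represented gender
        target = translate(source)  # the model translates its own training data
        if tokens(target) & masculine_target_words:
            continue  # filter step: gender was translated wrongly
        augmented.append((source, target))
    return augmented


# Toy stand-in for a translation model (English to French), for illustration only.
toy_model = {
    "She is a surgeon.": "Il est chirurgien.",  # stereotyped, wrong gender
    "She is a nurse.": "Elle est infirmière.",
    "He is a doctor.": "Il est médecin.",
}
new_pairs = gender_filtered_self_training_data(
    toy_model, toy_model.__getitem__, masculine_target_words={"il"}
)
print(new_pairs)  # only the correctly gendered feminine pair survives
```

In this toy run, the surgeon sentence is discarded because its translation came out masculine, and the masculine source is skipped entirely, so the nurse pair is the only one that would be added back to the training data.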

But even for this relatively narrow subset of machine translation problems, Dinu says, much work remains to be done.

“In this work, we consider only two genders, masculine and feminine,” she says. “Obviously, we need to expand this to other underrepresented genders. And there are other types of bias that can occur in machine translation. There are many forms of representational bias, where, basically, you have lower quality for one group, in a protected class, versus another group. For example, if we have a sentence such as ‘She met her spouse while giving her French lessons,’ you want that to be translated just as accurately as ‘She met her spouse while giving him French lessons.’

“Another topic is just generating denigrating and offensive language in translation. In general, we have a long way to go, because biases are expressed so diversely in language.”

Getting closer to the user

For the Conference on Machine Translation, which is a two-day EMNLP workshop, Dinu helped organize a shared task on “translation using terminologies,” in which a machine translation engine has access to a database of preferred translations for particular terms.

“To give an example, if you have in English something like ‘order’ in the retail domain, you would have a customer saying that they want that to be translated as ‘commande’ in French.

“These might vary from customer to customer,” Dinu says. “They might change every year, so they have a very dynamic nature. An established task in translation is how to make machine translation models comply with these terminologies. 

“We released data for five language pairs in the medical domain, and we invited participants to submit models for this task. We received 43 submissions from 19 teams total.

“The solution space for this task has changed in recent years. We're finally making use of the power of machine learning models and have models that don't just translate but can also apply 'instructions' on how to translate certain phrases. So just to give an example, normally in machine translation, the input is just a sentence in English that needs to be translated in French. But now the input is a sentence with an annotation, indicating how to translate a certain term in that sentence, which is something you can retrieve automatically from your terminology database. And neural networks are so powerful that they can just learn this behavior. They learn to translate but also to apply terminology constraints.
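As a rough illustration of the annotated input Dinu describes, the Python sketch below looks up terms from a terminology database in a source sentence and writes the preferred translation inline. The `<term>…</term>` tag format and the function name `annotate_with_terminology` are assumptions made for this example, not the format used in the shared task or in any particular system.

```python
import re
from typing import Dict


def annotate_with_terminology(sentence: str, terminology: Dict[str, str]) -> str:
    """Append the preferred target term after each matching source term.

    `terminology` maps source-language terms to the customer's preferred
    translations. The inline tag format is hypothetical.
    """
    if not terminology:
        return sentence

    # Try longer terms first so multi-word entries win over their sub-terms.
    terms = sorted(terminology, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {source.lower(): target for source, target in terminology.items()}

    def add_tag(match: re.Match) -> str:
        source_term = match.group(0)
        return f"{source_term} <term>{lookup[source_term.lower()]}</term>"

    return pattern.sub(add_tag, sentence)


# Retail-domain entry taken from the example in the text.
retail_terms = {"order": "commande"}
print(annotate_with_terminology("Your order has shipped.", retail_terms))
# prints: Your order <term>commande</term> has shipped.
```

A model trained on inputs annotated this way would see both the source term and the requested translation, which is what lets it learn to apply the constraint while still translating the rest of the sentence freely.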

“In machine translation, we're seeing more and more the need to do things beyond translation. For example, text that contains HTML markup. Say you have an input sentence with, let's say, bold markup from an HTML page. What you have here is not a simple translation task but the task of translating and correctly transferring the markup from the source into the target. Or maybe you're translating a table in a document, and you want the translated text to fit in the table.

“Ultimately, it's just getting closer to the user of the translation technology. It's really just bridging the gap between translation in its simplest form, which is what we had to address first, and what the users actually need, which is often translation integrated with something else.”
