Bringing the power of deep learning to data in tables

Amazon’s TabTransformer model is now available through SageMaker JumpStart and the official release of the Keras open-source library.

In recent years, deep neural networks have been responsible for most top-performing AI systems. In particular, natural-language processing (NLP) applications are generally built atop Transformer-based language models such as BERT.

One exception to the deep-learning revolution has been applications that rely on data stored in tables, where machine learning approaches based on decision trees have tended to work better.

At Amazon Web Services, we have been working to extend Transformers from NLP to table data with TabTransformer, a novel deep architecture for modeling tabular data in both supervised and semi-supervised learning.

Starting today, TabTransformer is available through Amazon SageMaker JumpStart, where it can be used for both classification and regression tasks. TabTransformer can be accessed through the SageMaker JumpStart UI inside SageMaker Studio or through Python code using the SageMaker Python SDK. To get started with TabTransformer on SageMaker JumpStart, please refer to the program documentation.

We are also thrilled to see that TabTransformer has gained attention from people across industries: it has been incorporated into the official repository of Keras, a popular open-source software library for working with deep neural networks, and it has been featured in posts on Towards Data Science and Medium. We also presented a paper on the work at the ICLR 2021 Workshop on Weakly Supervised Learning.

The TabTransformer solution

TabTransformer uses Transformers to generate robust data representations — embeddings — for categorical variables, or variables that take on a finite set of discrete values, such as months of the year. Continuous variables (such as numerical values) are processed in a parallel stream.
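The two streams can be sketched in a few lines of plain Python. This is a toy illustration, not the released model: the embedding size, the random initialization, the single attention head with identity projections, and the helper names (`make_embedding_table`, `self_attention`, `encode_row`) are all assumptions made for the example, as is the final concatenation that a downstream predictor would consume.

```python
import math
import random

random.seed(0)
EMBED_DIM = 4  # hypothetical embedding size, chosen only for illustration

def make_embedding_table(values, dim=EMBED_DIM):
    """One vector per category value (randomly initialized, not trained)."""
    return {v: [random.gauss(0.0, 1.0) for _ in range(dim)] for v in values}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Single-head attention with identity projections: each column embedding
    becomes a weighted average of all column embeddings in the same row."""
    dim = len(embeddings[0])
    contextual = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        weights = softmax(scores)
        contextual.append([sum(w * value[d] for w, value in zip(weights, embeddings))
                           for d in range(dim)])
    return contextual

def encode_row(row, tables, continuous_columns):
    # Categorical stream: look up one embedding per categorical column,
    # then let the columns attend to one another.
    column_embeddings = [tables[col][row[col]] for col in tables]
    contextual = self_attention(column_embeddings)
    flat = [x for emb in contextual for x in emb]
    # Continuous stream: passed through unchanged (normalization omitted).
    continuous = [float(row[col]) for col in continuous_columns]
    # Assumed here: the two streams are concatenated for a downstream predictor.
    return flat + continuous

tables = {
    "month": make_embedding_table(["jan", "feb", "mar"]),
    "job": make_embedding_table(["student", "technician"]),
}
row = {"month": "feb", "job": "student", "age": 23, "balance": 1500.0}
features = encode_row(row, tables, continuous_columns=["age", "balance"])
print(len(features))  # 2 categorical columns * 4 dims + 2 continuous values = 10
```

The point of the sketch is that the embedding of a categorical value depends on the other values in the same row, which is what makes it contextual.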

We exploit a successful methodology from NLP in which a model is pretrained on unlabeled data, to learn a general embedding scheme, then fine-tuned on labeled data, to learn a particular task. We find that this approach increases the accuracy of TabTransformer, too.

In experiments on 15 publicly available datasets, we show that TabTransformer outperforms the state-of-the-art deep-learning methods for tabular data by at least 1.0% on mean AUC, the area under the receiver-operating-characteristic curve, which plots true-positive rate against false-positive rate. We also show that it matches the performance of tree-based ensemble models.
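For readers less familiar with the metric, AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way for a binary task; the function name `roc_auc` and the toy labels and scores are illustrative and are not taken from our experiments.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking formulation:
    the fraction of (positive, negative) pairs in which the positive
    is scored higher, counting ties as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.6]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 4 of the 6 positive/negative pairs are ordered correctly: 0.667
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why differences of a percentage point or two are meaningful at the levels reported here.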

In the semi-supervised setting, when labeled data is scarce, deep neural networks (DNNs) generally outperform decision-tree-based models, because they are better able to take advantage of unlabeled data. In our semi-supervised experiments, all of the DNNs outperformed decision trees, but with our novel unsupervised pretraining procedure, TabTransformer demonstrated an average 2.1% AUC lift over the strongest DNN benchmark.

Finally, we also demonstrate that the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features and provide better interpretability.

Tabular data

To get a sense of the problem our method addresses, consider a table where the rows represent different samples and the columns represent both sample features (predictor variables) and the sample label (the target variable). TabTransformer takes the features of each sample as input and generates an output to best approximate the corresponding label.

In a practical industry setting, where the labels are partially available (i.e., semi-supervised learning scenarios), TabTransformer can be pre-trained on all the samples without any labels and fine-tuned on the labeled samples.

Additionally, companies usually have one large table (e.g., describing customers/products) that contains multiple target variables, and they are interested in analyzing this data in multiple ways. TabTransformer can be pre-trained on the large number of unlabeled samples once and fine-tuned multiple times for multiple target variables.

The architecture of TabTransformer is shown below. In our experiments, we use standard feature-engineering techniques to transform data types such as text, zip codes, and IP addresses into either numeric or categorical features.

The architecture of TabTransformer.

Pretraining procedures

We explore two different types of pretraining procedures: masked language modeling (MLM) and replaced-token detection (RTD). In MLM, for each sample, we randomly select a certain portion of features to be masked and use the embeddings of the other features to reconstruct the masked features. In RTD, for each sample, instead of masking features, we replace them with random values chosen from the same columns, and the model is trained to detect which features have been replaced.
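The difference between the two objectives is easiest to see in how a training sample is corrupted and what the model is asked to predict. The sketch below covers only that data-preparation step; the `[MASK]` placeholder, the 50% corruption rate, the helper names, and the toy rows are assumptions for illustration, not the settings used in our experiments.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a masked feature

def mlm_corrupt(row, fraction, rng):
    """MLM: mask a random subset of features.
    Targets are the original values of the masked features."""
    columns = list(row)
    k = max(1, round(fraction * len(columns)))
    masked = rng.sample(columns, k)
    corrupted = {c: (MASK if c in masked else v) for c, v in row.items()}
    targets = {c: row[c] for c in masked}
    return corrupted, targets

def rtd_corrupt(row, table, fraction, rng):
    """RTD: replace a random subset of features with values drawn from the
    same column. Targets are binary flags, one per feature, marking which
    features were selected for replacement."""
    columns = list(row)
    k = max(1, round(fraction * len(columns)))
    replaced = rng.sample(columns, k)
    corrupted = dict(row)
    for c in replaced:
        # The draw may coincide with the original value; this sketch still
        # flags the feature as replaced.
        corrupted[c] = rng.choice([other[c] for other in table])
    targets = {c: int(c in replaced) for c in columns}
    return corrupted, targets

table = [
    {"job": "student", "marital": "single", "housing": "no", "month": "may"},
    {"job": "technician", "marital": "married", "housing": "yes", "month": "jul"},
    {"job": "retired", "marital": "divorced", "housing": "no", "month": "nov"},
]
rng = random.Random(0)
mlm_row, mlm_targets = mlm_corrupt(table[0], fraction=0.5, rng=rng)
rtd_row, rtd_targets = rtd_corrupt(table[0], table, fraction=0.5, rng=rng)
print(mlm_row, mlm_targets)
print(rtd_row, rtd_targets)
```

MLM turns pretraining into a multiclass prediction per masked column, whereas RTD only needs a binary decision per feature.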

In addition to comparing TabTransformer to baseline models, we conducted a study to demonstrate the interpretability of the embeddings produced by our contextual-embedding component.

In that study, we took contextual embeddings from different layers of the Transformer and computed a t-distributed stochastic neighbor embedding (t-SNE) to visualize their similarity in function space. More precisely, after training TabTransformer, we pass the categorical features in the test data through our trained model and extract all contextual embeddings (across all columns) from a certain layer of the Transformer. The t-SNE algorithm is then used to reduce each embedding to a 2-D point in the t-SNE plot.

T-SNE plots of learned embeddings for categorical features in the dataset BankMarketing. Left: The embeddings generated from the last layer of the Transformer. Center: The embeddings before being passed into the Transformer. Right: The embeddings learned by the model without the Transformer layers.

The left panel of the figure above shows the 2-D visualization of embeddings from the last layer of the Transformer for the BankMarketing dataset. We can see that semantically similar classes are close to each other and form clusters (annotated by a set of labels) in the embedding space.

For example, all of the client-based features (colored markers), such as job, education level, and marital status, stay close to the center, and non-client-based features (gray markers), such as month (last contact month of the year) and day (last contact day of the week), lie outside the central area. In the bottom cluster, the embedding of having a housing loan stays close to that of having defaulted, while the embeddings of being a student, single marital status, not having a housing loan, and tertiary education level are close to each other.

The center figure is the t-SNE plot of embeddings before being passed through the Transformer (i.e., from layer 0). The right figure is the t-SNE plot of the embeddings the model produces when the Transformer layers are removed, converting it into an ordinary multilayer perceptron (MLP). In those plots, we do not observe the types of patterns seen in the left plot.

Finally, we conduct extensive experiments on 15 publicly available datasets, using both supervised and semi-supervised learning. In the supervised-learning experiment, TabTransformer matched the performance of the state-of-the-art gradient-boosted decision-tree (GBDT) model and significantly outperformed the prior DNN models TabNet and Deep VIB.

| Model name | Mean AUC (%) |
| --- | --- |
| TabTransformer | 82.8 ± 0.4 |
| MLP | 81.8 ± 0.4 |
| Gradient-boosted decision trees | 82.9 ± 0.4 |
| Sparse MLP | 81.4 ± 0.4 |
| Logistic regression | 80.4 ± 0.4 |
| TabNet | 77.1 ± 0.5 |
| Deep VIB | 80.5 ± 0.4 |

Model performance with supervised learning. The evaluation metric is the mean ± standard deviation of the AUC score over the 15 datasets for each model. The larger the number, the better the result. The top two results are those of gradient-boosted decision trees and TabTransformer.

In the semi-supervised-learning experiment, we pretrain two TabTransformer models on the entire unlabeled set of training data, using the MLM and RTD methods respectively; then we fine-tune both models on labeled data.

As baselines, we use the semi-supervised learning methods pseudo-labeling (PL) and entropy regularization (ER) to train both a TabTransformer network and an ordinary MLP. We also train a gradient-boosted-decision-tree model using pseudo-labeling and an MLP using a pretraining method called the swap-noise denoising autoencoder (DAE).
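In their textbook forms, the two baseline techniques work as follows: pseudo-labeling treats a model's confident predictions on unlabeled samples as if they were labels, and entropy regularization adds the entropy of the predictions on unlabeled samples to the training loss, pushing the model toward confident outputs. The sketch below illustrates both ideas; the confidence threshold of 0.9 and the regularization weight of 0.1 are hypothetical values chosen for the example, not the settings used in our experiments.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pseudo_label(unlabeled_probs, threshold=0.9):
    """Return (sample_index, predicted_class) for every unlabeled sample
    whose top predicted probability reaches the threshold."""
    labeled = []
    for i, probs in enumerate(unlabeled_probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            labeled.append((i, best))
    return labeled

def entropy_regularized_loss(supervised_loss, unlabeled_probs, weight=0.1):
    """Supervised loss plus the weighted mean prediction entropy on
    unlabeled samples; low entropy means confident predictions."""
    mean_entropy = sum(entropy(p) for p in unlabeled_probs) / len(unlabeled_probs)
    return supervised_loss + weight * mean_entropy

probs = [[0.95, 0.05], [0.55, 0.45], [0.08, 0.92]]
print(pseudo_label(probs))  # [(0, 0), (2, 1)]
print(entropy_regularized_loss(0.40, probs))
```

Neither technique involves a separate pretraining phase, which is the main contrast with the MLM and RTD procedures.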

| # Labeled data | 50 | 200 | 500 |
| --- | --- | --- | --- |
| TabTransformer-RTD | 66.6 ± 0.6 | 70.9 ± 0.6 | 73.1 ± 0.6 |
| TabTransformer-MLM | 66.8 ± 0.6 | 71.0 ± 0.6 | 72.9 ± 0.6 |
| ER-MLP | 65.6 ± 0.6 | 69.0 ± 0.6 | 71.0 ± 0.6 |
| PL-MLP | 65.4 ± 0.6 | 68.8 ± 0.6 | 71.0 ± 0.6 |
| ER-TabTransformer | 62.7 ± 0.6 | 67.1 ± 0.6 | 69.3 ± 0.6 |
| PL-TabTransformer | 63.6 ± 0.6 | 67.3 ± 0.7 | 69.3 ± 0.6 |
| DAE | 65.2 ± 0.5 | 68.5 ± 0.6 | 71.0 ± 0.6 |
| PL-GBDT | 56.5 ± 0.5 | 63.1 ± 0.6 | 66.5 ± 0.7 |

Semi-supervised-learning results on six datasets, each with more than 30,000 unlabeled data points, and different numbers of labeled data points. Evaluation metric is mean AUC in percentage.

| # Labeled data | 50 | 200 | 500 |
| --- | --- | --- | --- |
| TabTransformer-RTD | 78.6 ± 0.6 | 81.6 ± 0.5 | 83.4 ± 0.5 |
| TabTransformer-MLM | 78.5 ± 0.6 | 81.0 ± 0.6 | 82.4 ± 0.5 |
| ER-MLP | 79.4 ± 0.6 | 81.1 ± 0.6 | 82.3 ± 0.6 |
| PL-MLP | 79.1 ± 0.6 | 81.1 ± 0.6 | 82.0 ± 0.6 |
| ER-TabTransformer | 77.9 ± 0.6 | 81.2 ± 0.6 | 82.1 ± 0.6 |
| PL-TabTransformer | 77.8 ± 0.6 | 81.0 ± 0.6 | 82.1 ± 0.6 |
| DAE | 78.5 ± 0.7 | 80.7 ± 0.6 | 82.2 ± 0.6 |
| PL-GBDT | 73.4 ± 0.7 | 78.8 ± 0.6 | 81.3 ± 0.6 |

Semi-supervised learning results on nine datasets, each with fewer than 30,000 data points, and different numbers of labeled data points. Evaluation metric is mean AUC in percentage.

To gauge relative performance with different amounts of unlabeled data, we split the set of 15 datasets into two subsets. The first set consists of the six datasets that contain more than 30,000 data points. The second set includes the remaining nine datasets.

When the amount of unlabeled data is large, TabTransformer-RTD and TabTransformer-MLM significantly outperform all their competitors. In particular, the improvements from TabTransformer-RTD/MLM are at least 1.2%, 2.0%, and 2.1% on mean AUC for the scenarios of 50, 200, and 500 labeled data points, respectively. When the amount of unlabeled data is smaller, as shown in the table for the nine smaller datasets, TabTransformer-RTD still outperforms most of its competitors, but only marginally.
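One reading that reproduces the margins quoted for the six larger datasets is to compare, for each amount of labeled data, the better of the two pretrained TabTransformer variants with the strongest baseline. The short check below uses the mean AUC values from the table for those datasets; the variable names are illustrative.

```python
# Mean AUC (%) on the six datasets with more than 30,000 unlabeled data
# points; the three entries correspond to 50, 200, and 500 labeled points.
pretrained = {
    "TabTransformer-RTD": [66.6, 70.9, 73.1],
    "TabTransformer-MLM": [66.8, 71.0, 72.9],
}
baselines = {
    "ER-MLP": [65.6, 69.0, 71.0],
    "PL-MLP": [65.4, 68.8, 71.0],
    "ER-TabTransformer": [62.7, 67.1, 69.3],
    "PL-TabTransformer": [63.6, 67.3, 69.3],
    "DAE": [65.2, 68.5, 71.0],
    "PL-GBDT": [56.5, 63.1, 66.5],
}

margins = []
for col in range(3):
    best_pretrained = max(values[col] for values in pretrained.values())
    best_baseline = max(values[col] for values in baselines.values())
    margins.append(round(best_pretrained - best_baseline, 1))

print(margins)  # [1.2, 2.0, 2.1]
```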

Acknowledgments: Ashish Khetan, Milan Cvitkovic, Zohar Karnin

Related content

IN, KA, Bengaluru
Interested to build the next generation Financial systems that can handle billions of dollars in transactions? Interested to build highly scalable next generation systems that could utilize Amazon Cloud? Massive data volume + complex business rules in a highly distributed and service oriented architecture, a world class information collection and delivery challenge. Our challenge is to deliver the software systems which accurately capture, process, and report on the huge volume of financial transactions that are generated each day as millions of customers make purchases, as thousands of Vendors and Partners are paid, as inventory moves in and out of warehouses, as commissions are calculated, and as taxes are collected in hundreds of jurisdictions worldwide. Key job responsibilities • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. • Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. A day in the life • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. 
• Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. About the team The FinAuto TFAW(theft, fraud, abuse, waste) team is part of FGBS Org and focuses on building applications utilizing machine learning models to identify and prevent theft, fraud, abusive and wasteful(TFAW) financial transactions across Amazon. Our mission is to prevent every single TFAW transaction. As a Machine Learning Scientist in the team, you will be driving the TFAW Sciences roadmap, conduct research to develop state-of-the-art solutions through a combination of data mining, statistical and machine learning techniques, and coordinate with Engineering team to put these models into production. You will need to collaborate effectively with internal stakeholders, cross-functional teams to solve problems, create operational efficiencies, and deliver successfully against high organizational standards.
IN, KA, Bengaluru
Interested to build the next generation Financial systems that can handle billions of dollars in transactions? Interested to build highly scalable next generation systems that could utilize Amazon Cloud? Massive data volume + complex business rules in a highly distributed and service oriented architecture, a world class information collection and delivery challenge. Our challenge is to deliver the software systems which accurately capture, process, and report on the huge volume of financial transactions that are generated each day as millions of customers make purchases, as thousands of Vendors and Partners are paid, as inventory moves in and out of warehouses, as commissions are calculated, and as taxes are collected in hundreds of jurisdictions worldwide. Key job responsibilities • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. • Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. A day in the life • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. 
• Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. About the team The FinAuto TFAW(theft, fraud, abuse, waste) team is part of FGBS Org and focuses on building applications utilizing machine learning models to identify and prevent theft, fraud, abusive and wasteful(TFAW) financial transactions across Amazon. Our mission is to prevent every single TFAW transaction. As a Machine Learning Scientist in the team, you will be driving the TFAW Sciences roadmap, conduct research to develop state-of-the-art solutions through a combination of data mining, statistical and machine learning techniques, and coordinate with Engineering team to put these models into production. You will need to collaborate effectively with internal stakeholders, cross-functional teams to solve problems, create operational efficiencies, and deliver successfully against high organizational standards.
IN, KA, Bengaluru
Amazon Health Services (One Medical) About Us: At Health AI, we're revolutionizing healthcare delivery through innovative AI-enabled solutions. As part of Amazon Health Services and One Medical, we're on a mission to make quality healthcare more accessible while improving patient outcomes. Our work directly impacts millions of lives by empowering patients and enabling healthcare providers to deliver more meaningful care. Role Overview: We're seeking an Applied Scientist to join our dynamic team in building state of the art AI/ML solutions for healthcare. This role offers a unique opportunity to work at the intersection of artificial intelligence and healthcare, developing solutions that will shape the future of medical services delivery. Key job responsibilities • Lead end-to-end development of AI/ML solutions for Amazon Health organization, including Amazon Pharmacy and One Medical • Research, design, and implement state-of-the-art machine learning models, with a focus on Large Language Models (LLMs) and Visual Language Models (VLMs) • Optimize and fine-tune models for production deployment, including model distillation for improved latency • Drive scientific innovation while maintaining a strong focus on practical business outcomes • Collaborate with cross-functional teams to translate complex technical solutions into tangible customer benefits • Contribute to the broader Amazon Health scientific community and help shape our technical roadmap
US, CA, Pasadena
The Amazon Center for Quantum Computing in Pasadena, CA, is looking to hire an Applied Scientist specializing in Mixed-Signal Design. Working alongside other scientists and engineers, you will design and validate hardware performing the control and readout functions for AWS quantum processors. Candidates must have a solid background in mixed-signal design at the printed circuit board (PCB) level. Working effectively within a cross-functional team environment is critical. The ideal candidate will have demonstrated the capability to contribute to all phases of product life cycle development, from requirements gathering to verification. Diverse Experiences Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Inclusive Team Culture Here at Amazon, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship and Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. 
Key job responsibilities Our scientists and engineers collaborate across diverse teams and projects to offer state of the art, cost effective solutions for the control of Amazon quantum processor systems. You’ll bring a passion for innovation, collaboration, and mentoring to: Solve layered technical problems, often ones not encountered before, across our hardware stack. Develop requirements with key system stakeholders, including quantum device, test and measurement, and cryogenic hardware teams. Design, implement, test, deploy, and maintain innovative solutions that meet both strict performance and cost metrics. Research enabling control system technologies necessary for Amazon to produce commercially viable quantum computers.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, CA, San Francisco
Amazon launched the AGI Lab to develop foundational capabilities for useful AI agents. We built Nova Act - a new AI model trained to perform actions within a web browser. The team builds AI/ML infrastructure that powers our production systems to run performantly at high scale. We’re also enabling practical AI to make our customers more productive, empowered, and fulfilled. In particular, our work combines large language models (LLMs) with reinforcement learning (RL) to solve reasoning, planning, and world modeling in both virtual and physical environments. Our lab is a small, talent-dense team with the resources and scale of Amazon. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research. We’re entering an exciting new era where agents can redefine what AI makes possible. We’d love for you to join our lab and build it from the ground up! Key job responsibilities This role will lead a team of SDEs building AI agents infrastructure from launch to scale. The role requires the ability to span across ML/AI system architecture and infrastructure. You will work closely with application developers and scientists to have a impact on the Agentic AI industry. We're looking for a Software Development Manager who is energized by building high performance systems, making an impact and thrives in fast-paced, collaborative environments. About the team Check out the Nova Act tools our team built on on nova.amazon.com/act
US, CA, Santa Clara
Amazon Quick Suite is an enterprise AI platform that transforms how organizations work with their data and knowledge. Combining generative AI-powered search, deep research capabilities, intelligent agents and automations, and comprehensive business intelligence, Quick Suite serves tens of thousands of users. Our platform processes thousands of queries monthly, helping teams make faster, data-driven decisions while maintaining enterprise-grade security and governance. From natural language interactions with complex datasets to automated workflows and custom AI agents, Quick Suite is redefining workplace productivity at unprecedented scale. We are seeking a Data Scientist II to join our Quick Data team, focusing on evaluation and benchmarking data development for Quick Suite features, with particular emphasis on Research and other generative AI capabilities. Our mission is to engineer high-quality datasets that are essential to the success of Amazon Quick Suite. From human evaluations and Responsible AI safeguards to Retrieval-Augmented Generation and beyond, our work ensures that Generative AI is enterprise-ready, safe, and effective for users at scale. As part of our diverse team—including data scientists, engineers, language engineers, linguists, and program managers—you will collaborate closely with science, engineering, and product teams. We are driven by customer obsession and a commitment to excellence. Key job responsibilities In this role, you will leverage data-centric AI principles to assess the impact of data on model performance and the broader machine learning pipeline. You will apply Generative AI techniques to evaluate how well our data represents human language and conduct experiments to measure downstream interactions. 
Specific responsibilities include: * Design and develop comprehensive evaluation and benchmarking datasets for Quick Suite AI-powered features * Leverage LLMs for synthetic data corpora generation; data evaluation and quality assessment using LLM-as-a-judge settings * Create ground truth datasets with high-quality question-answer pairs across diverse domains and use cases * Lead human annotation initiatives and model evaluation audits to ensure data quality and relevance * Develop and refine annotation guidelines and quality frameworks for evaluation tasks * Conduct statistical analysis to measure model performance, identify failure patterns, and guide improvement strategies * Collaborate with ML scientists and engineers to translate evaluation insights into actionable product improvements * Build scalable data pipelines and tools to support continuous evaluation and benchmarking efforts * Contribute to Responsible AI initiatives by developing safety and fairness evaluation datasets About the team Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. 
Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. 
Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.