Alexa speech science developments at Interspeech 2022

Research from Alexa Speech covers a range of topics related to end-to-end neural speech recognition and fairness.

Interspeech, the world’s largest and most comprehensive conference on the science and technology of spoken-language processing, took place this week in Incheon, Korea, with Amazon as a platinum sponsor. Amazon Science asked three of Alexa AI’s leading scientists — in the fields of speech, spoken-language-understanding, and text-to-speech — to highlight some of Amazon’s contributions to the conference.

In this installment, senior principal scientist Andreas Stolcke selects papers from Alexa AI’s speech science organization, focusing on two overarching themes in recent research on speech-enabled AI: end-to-end neural speech recognition and fairness.

End-to-end neural speech recognition

Traditionally, speech recognition systems have included components specialized for different aspects of linguistic knowledge: acoustic models to capture the correspondence between speech sounds and acoustic waveforms (phonetics), pronunciation models to map those sounds to words, and language models (LMs) to capture higher-order properties such as syntax, semantics, and dialogue context.

All these models are trained on separate data and combined using graph and search algorithms, to infer the most probable sequence of words corresponding to acoustic input. The latest versions of these systems employ neural networks for individual components, typically in the acoustic and language models, while still relying on non-neural methods for model integration; they are therefore known as “hybrid” automatic-speech-recognition (ASR) systems.

While the hybrid ASR approach is structured and modular, it also makes it hard to model the ways in which acoustic, phonetic, and word-level representations interact and to optimize the recognition system end to end. For these reasons, much recent research in ASR has focused on so-called end-to-end or all-neural recognition systems, which infer a sequence of words directly from acoustic inputs.

End-to-end ASR systems use deep multilayered neural architectures that can be optimized end to end for recognition accuracy. While they do require large amounts of data and computation for training, once trained, they offer a simplified computational architecture for inference, as well as superior performance.

Alexa’s ASR employs end-to-end models as its core technology, both in the cloud and on-device. Across the industry and in academic research, end-to-end architectures are still being improved to achieve better accuracy, to reduce computation and/or latency, or to mitigate the lack of modularity that makes it challenging to inject external (e.g., domain-specific) knowledge at run time.

Alexa AI papers at Interspeech address several open problems in end-to-end ASR, and we summarize a few of those papers here.

In “ConvRNN-T: Convolutional augmented recurrent neural network transducers for streaming speech recognition”, Martin Radfar and coauthors propose a new variant of the popular recurrent-neural-network-transducer (RNN-T) end-to-end neural architecture. One of their goals is to preserve the property of causal processing, meaning that the model output depends only on past and current (but not future) inputs, which enables streaming ASR. At the same time, they want to improve the model’s ability to capture long-term contextual information.

A high-level block diagram of ConvRNN-T.

To achieve both goals, they augment the vanilla RNN-T with two distinct convolutional (CNN) front ends: a standard one for encoding correlations localized in time and a novel “global CNN” encoder that is designed to capture long-term correlations by summarizing activations over the entire utterance up to the current time step (while processing utterances incrementally through time).
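To make the causality constraint concrete, here is a minimal Python sketch of one way to summarize activations "up to the current time step": a running mean over frames. The running mean and the function name `causal_global_summary` are illustrative assumptions, not the paper's global CNN encoder, which is more elaborate.

```python
from typing import List


def causal_global_summary(frames: List[List[float]]) -> List[List[float]]:
    """Return, for each time step t, the mean of frames[0..t] (inclusive).

    Hypothetical illustration: the output at step t never looks at frames
    after t, so it can be updated incrementally as audio streams in.
    """
    if not frames:
        return []
    running_sum = [0.0] * len(frames[0])
    summaries = []
    for t, frame in enumerate(frames, start=1):
        # Accumulate the new frame, then normalize by the number of frames seen.
        running_sum = [s + x for s, x in zip(running_sum, frame)]
        summaries.append([s / t for s in running_sum])
    return summaries


frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
summaries = causal_global_summary(frames)
# summaries == [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
```

Because each summary depends only on frames seen so far, changing a later frame leaves all earlier summaries untouched, which is the property that keeps the model usable for streaming.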

The authors show that the resulting ConvRNN-T gives superior accuracy compared to other proposed neural streaming ASR architectures, such as the basic RNN-T, Conformer, and ContextNet.

Another concern with end-to-end ASR models is computational efficiency, especially since the unified neural architecture makes these models very attractive for on-device deployment, where compute cycles and (for mobile devices) power are at a premium.

In their paper “Compute cost amortized Transformer for streaming ASR”, Yi Xie and colleagues exploit the intuitive observation that the amount of computation a model performs should vary as a function of the difficulty of the task; for instance, input in which noise or an accent causes ambiguity may require more computation than a clean input with a mainstream accent. (We may think of this as the ASR model “thinking harder” in places where the words are more difficult to discern.)

The researchers achieve this with a very elegant method that leverages the integrated neural structure of the model. Their starting point is a Transformer-based ASR system, consisting of multiple stacked layers of multiheaded self-attention (MHA) and feed-forward neural blocks. In addition, they train “arbitrator” networks that look at the acoustic input (and, optionally, also at intermediate block outputs) to toggle individual components on or off.

Because these component blocks have “skip connections” that combine their outputs with the outputs of earlier layers, they are effectively optional for the overall computation to proceed. A block that is toggled off for a given input frame saves all the computation normally carried out by that block, producing a zero vector output. The following diagram shows the structure of both the elementary Transformer building block and the arbitrator that controls it:

Illustration of the arbitrator and Transformer backbone of each block. The lightweight arbitrator toggles whether to evaluate subcomponents during the forward pass.
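The role of the skip connection can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `toy_block` stands in for an expensive attention or feed-forward block, and the gate is passed in as a boolean rather than predicted by an arbitrator network.

```python
from typing import Callable, List

Vector = List[float]


def gated_residual_block(
    x: Vector,
    block: Callable[[Vector], Vector],
    gate_open: bool,
) -> Vector:
    """Apply `block` with a skip connection, or bypass it entirely.

    When the gate is closed, the block is never evaluated; its contribution
    is treated as a zero vector, so the skip connection passes x through.
    """
    if not gate_open:
        return list(x)  # x + 0: no block computation is spent
    return [xi + yi for xi, yi in zip(x, block(x))]


stats = {"calls": 0}  # counts how often the (hypothetical) block runs


def toy_block(x: Vector) -> Vector:
    """Stand-in for an expensive Transformer sub-block (illustrative only)."""
    stats["calls"] += 1
    return [2.0 * xi for xi in x]


on = gated_residual_block([1.0, -1.0], toy_block, gate_open=True)
off = gated_residual_block([1.0, -1.0], toy_block, gate_open=False)
# on == [3.0, -3.0]; off == [1.0, -1.0]; toy_block ran only once
```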

The arbitrator networks themselves are small enough that they do not contribute significant additional computation. What makes this scheme workable and effective, however, is that both the Transformer assemblies and the arbitrators that control them can be trained jointly, with dual goals: to perform accurate ASR and to minimize the overall amount of computation. The latter is achieved by adding a term to the training objective function that rewards reducing computation. Dialing a hyperparameter up or down selects the desired balance between accuracy and computation.
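A rough sketch of such a combined objective follows. The exact form of the paper's compute term is not given here, so the penalty below (the average probability that a block is switched on, scaled by a hypothetical `cost_weight` hyperparameter) is an assumption chosen for illustration.

```python
from typing import Sequence


def amortized_objective(
    asr_loss: float,
    gate_probs: Sequence[float],
    cost_weight: float,
) -> float:
    """ASR loss plus a penalty on the expected fraction of blocks evaluated.

    gate_probs[i] is the (assumed) probability that block i is switched on.
    A larger cost_weight pushes training toward cheaper inference.
    """
    expected_compute = sum(gate_probs) / len(gate_probs)
    return asr_loss + cost_weight * expected_compute


gate_probs = [1.0, 0.5, 0.0, 0.5]  # on average, half the blocks run
accuracy_only = amortized_objective(2.0, gate_probs, cost_weight=0.0)  # 2.0
balanced = amortized_objective(2.0, gate_probs, cost_weight=1.0)       # 2.5
```

Setting `cost_weight` to zero recovers plain ASR training; raising it trades accuracy for savings, which is the dial the text describes.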

The authors show that their method can achieve a 60% reduction in computation with only a minor (3%) increase in ASR error. Their cost-amortized Transformer proves much more effective than a benchmark method that constrains the model to attend only to sliding windows over the input, which yields only 13% savings with an error increase almost three times as large.

Finally, in this short review of end-to-end neural ASR advances, we look at ways to recognize speech from more than one speaker, while keeping track of who said what (also known as speaker-attributed ASR).

This has traditionally been done with modular systems that perform ASR and, separately, perform speaker diarization, i.e., labeling stretches of audio according to who is speaking. However, here, too, neural models have recently brought advances and simplification, by integrating these two tasks in a single end-to-end neural model.

In their paper “Separator-transducer-segmenter: Streaming recognition and segmentation of multi-party speech”, Ilya Sklyar and colleagues not only integrate ASR and segmentation-by-speaker but do so while processing inputs incrementally. Streaming multispeaker ASR with low latency is a key technology to enable voice assistants to interact with customers in collaborative settings. Sklyar’s system does this with a generalization of the RNN-T architecture that keeps track of turn-taking between multiple speakers, up to two of whom can be active simultaneously. The researchers’ separator-transducer-segmenter model is depicted below:

Separator-transducer-segmenter. The tokens <sot> and <eot> represent the start of turn and end of turn. Model blocks with the same color have tied parameters, and transcripts in the color-matched boxes belong to the same speaker.

A key element that yields improvements over an earlier approach is the use of dedicated tokens to recognize both starts and ends of speaker turns, for what the authors call “start-pointing” and “end-pointing”. (End-pointing, which predicts when a talker is done, is a standard feature of many interactive ASR systems.) Beyond representing the turn-taking structure in this symbolic way, the model is also penalized during training for taking too long to output these markers, in order to improve the latency and temporal accuracy of the outputs.
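To illustrate how turn markers make segmentation explicit, the following Python sketch groups one output channel's token stream into turns. The function name `split_turns` and the handling of stray tokens are assumptions for illustration; the model itself emits these tokens as part of decoding.

```python
from typing import List, Optional

SOT, EOT = "<sot>", "<eot>"


def split_turns(tokens: List[str]) -> List[List[str]]:
    """Group one channel's tokens into turns delimited by <sot> and <eot>.

    Tokens outside any turn are ignored, and a turn that is still open at
    the end of the stream is not returned (it has not been end-pointed yet).
    """
    turns: List[List[str]] = []
    current: Optional[List[str]] = None  # None means no turn is open
    for token in tokens:
        if token == SOT:
            current = []
        elif token == EOT:
            if current is not None:
                turns.append(current)
            current = None
        elif current is not None:
            current.append(token)
    return turns


stream = [SOT, "hello", "there", EOT, SOT, "hi", EOT, SOT, "how"]
turns = split_turns(stream)
# turns == [["hello", "there"], ["hi"]]; the last turn is still open
```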

Fairness in the performance of speech-enabled AI

The second theme we’d like to highlight, and one that is receiving increasing attention in speech and other areas of AI, is performance fairness: the desire to avert large differences in accuracy across different cohorts of users or on content associated with protected groups. As an example, concerns about this type of fairness gained prominence with demonstrations that certain computer vision algorithms performed poorly for certain skin tones, in part due to underrepresentation in the training data.

There’s a similar concern about speech-based AI, with speech properties varying widely as a function of speaker background and environment. A balanced representation in training sets is hard to achieve, since the speakers using commercial products are largely self-selected, and speaker attributes are often unavailable for many reasons, privacy among them. This topic is also the subject of a special session at Interspeech, Inclusive and Fair Speech Technologies, which several Alexa AI scientists are involved in as co-organizers and presenters.

One of the special-session papers, “Reducing geographic disparities in automatic speech recognition via elastic weight consolidation”, by Viet Anh Trinh and colleagues, looks at how geographic location within the U.S. affects ASR accuracy and how models can be adapted to narrow the gap for the worst-performing regions. Here and elsewhere, a two-step approach is used: first, subsets of speakers with higher-than-average error rates are identified; then a mitigation step attempts to improve performance for those cohorts. Trinh et al.’s method identifies the cohorts by partitioning the speakers according to their geographic longitude and latitude, using a decision-tree-like algorithm that maximizes the word-error-rate (WER) differences between resulting regions:

A map of 126 regions identified by the clustering tree. The color does not indicate a specific word error rate (WER), but regions with the same color do have the same WER.
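One step of such a partitioning can be sketched as follows: try every axis-aligned cut in latitude or longitude and keep the one with the largest WER difference between the two sides. This is a simplified sketch, not the authors' algorithm. The data layout, the function names, and the toy numbers are assumptions, and a full tree would apply the split recursively.

```python
from typing import List, Tuple

# Each speaker: (latitude, longitude, word_errors, reference_words)
Speaker = Tuple[float, float, int, int]


def wer(group: List[Speaker]) -> float:
    """Word error rate pooled over all speakers in the group."""
    return sum(s[2] for s in group) / sum(s[3] for s in group)


def best_split(speakers: List[Speaker]) -> Tuple[int, float, float]:
    """Return (axis, threshold, wer_gap) for the best single cut.

    axis 0 is latitude, axis 1 is longitude. Thresholds are midpoints
    between neighboring distinct coordinates, so both sides are non-empty.
    """
    best = (-1, 0.0, -1.0)
    for axis in (0, 1):
        values = sorted({s[axis] for s in speakers})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [s for s in speakers if s[axis] <= threshold]
            right = [s for s in speakers if s[axis] > threshold]
            gap = abs(wer(left) - wer(right))
            if gap > best[2]:
                best = (axis, threshold, gap)
    return best


# Made-up toy data: two southern speakers with low WER, two northern with high.
speakers = [
    (30.0, -120.0, 5, 100),
    (32.0, -80.0, 6, 100),
    (45.0, -118.0, 20, 100),
    (47.0, -75.0, 19, 100),
]
axis, threshold, gap = best_split(speakers)
# The best cut is on latitude (axis 0) at 38.5, with a WER gap of about 0.14.
```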

Next, the regions are ranked by their average WERs; data from the highest-error regions is identified for performance improvement. To achieve that, the researchers use fine-tuning to optimize the model parameters for the targeted regions, while also employing a technique called elastic weight consolidation (EWC) to minimize performance degradation on the remaining regions.

This is important to prevent a phenomenon known as “catastrophic forgetting”, in which neural models degrade substantially on prior training data during fine-tuning. The idea is to quantify the influence that different dimensions of the parameter space have on the overall performance and then avoid large variations along those dimensions when adapting to a data subset. This approach decreases the WER mean, maximum, and variance across regions and even the overall WER (including the regions not fine-tuned on), beating out several baseline methods for model adaptation.
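In its standard formulation, EWC adds a quadratic penalty to the fine-tuning loss, weighted per parameter by an importance estimate (commonly the diagonal of the Fisher information). The Python sketch below shows only that penalty. The parameter names and toy values are illustrative, and the paper's exact configuration may differ.

```python
from typing import Sequence


def ewc_loss(
    task_loss: float,
    params: Sequence[float],
    anchor_params: Sequence[float],
    importance: Sequence[float],
    strength: float,
) -> float:
    """Fine-tuning loss plus an elastic penalty toward the original model.

    anchor_params are the weights before fine-tuning; importance[i] says how
    strongly parameter i affects overall performance (assumed to be given).
    """
    penalty = sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, importance)
    )
    return task_loss + 0.5 * strength * penalty


anchor = [0.0, 2.0]
importance = [4.0, 0.1]  # the first parameter matters far more

# Moving the important parameter by 1.0 is penalized heavily...
costly = ewc_loss(1.0, [1.0, 2.0], anchor, importance, strength=0.5)  # 2.0
# ...while the same move along the unimportant one is nearly free.
cheap = ewc_loss(1.0, [0.0, 3.0], anchor, importance, strength=0.5)
```

The asymmetry between `costly` and `cheap` is the point: adaptation is steered toward directions that matter little for the regions not being fine-tuned on.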

Pranav Dheram et al., in their paper “Toward fairness in speech recognition: Discovery and mitigation of performance disparities”, look at alternative methods for identifying underperforming speaker cohorts. One approach is to use human-defined geographic regions as given by postal (a.k.a. zip) codes, in combination with demographic information from U.S. census data, to partition U.S. geography.

Zip codes are sorted into binary partitions by majority demographic attributes, so as to maximize WER discrepancies. The partition with higher WER is then targeted for mitigations, an approach similar to that adopted in the Trinh et al. paper. However, this approach is imprecise (since it lumps together speakers by zip code) and limited to available demographic data, so it generalizes poorly to other geographies.

Alternatively, Dheram et al. use speech characteristics learned by a neural speaker identification model to group speakers. These “speaker embedding vectors” are clustered, reflecting the intuition that speakers who sound similar will tend to have similar ASR difficulty.
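The clustering idea can be sketched with a small, dependency-free k-means followed by a per-cluster WER computation. Everything concrete here is an assumption for illustration: the two-dimensional toy embeddings, the error counts, and the deterministic initialization from the first k points. Real speaker embeddings have many more dimensions.

```python
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 10) -> List[int]:
    """Plain Lloyd's k-means; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def cluster_wer(labels: List[int], errors: List[int], words: List[int]) -> Dict[int, float]:
    """Pooled word error rate for each cluster label."""
    result = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        result[c] = sum(errors[i] for i in idx) / sum(words[i] for i in idx)
    return result


embeddings = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
labels = kmeans(embeddings, k=2)                       # [0, 1, 0, 1]
wers = cluster_wer(labels, [2, 9, 3, 11], [50, 50, 50, 50])
hardest_first = sorted(wers, key=wers.get, reverse=True)  # [1, 0]
```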

Subsequently, these virtual speaker regions (not individual identities) can be ranked by difficulty and targeted for mitigation, without relying on human labeling, grouping, or self-identification of speakers or attributes. As shown in the table below, the automatic approach identifies a larger gap in ASR accuracy than the “geo-demographic” approach, while at the same time targeting a larger share of speakers for performance mitigation:

Cohort discovery    WER gap (%)    Bottom-cohort share (%)
Geodemographic      41.7           0.8
Automatic           65.0           10.0

The final fairness-themed paper we highlight explores yet another approach to avoiding performance disparities, known as adversarial reweighting (ARW). Instead of relying on explicit partitioning of the input space, this approach assigns continuous weights to the training instances (as a function of input features), with the idea that harder examples get higher weights and thereby exert more influence on the performance optimization.

ARW also interleaves, and iterates, the (now weighted) cohort identification and mitigation steps more tightly. Mathematically, this is formalized as a min-max optimization algorithm that alternates between maximizing the error by changing the sample weights (hence “adversarial”) and minimizing the weighted verification error by adjusting the target model parameters.
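The alternating structure can be shown on a deliberately tiny problem. In the Python sketch below, the "model" is a single scalar `theta` fitted under squared error, and the adversary owns one logit per training sample. Both are simplifying assumptions: in ARW proper the weights are predicted by an adversary network from input features, and the learner is a full model.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adversarial_reweighting(
    xs: List[float],
    steps: int = 200,
    lr_learner: float = 0.1,
    lr_adversary: float = 0.5,
) -> Tuple[float, List[float]]:
    """Toy min-max loop over the weighted loss sum_i p_i * (x_i - theta)^2."""
    n = len(xs)
    theta = sum(xs) / n   # start from the unweighted solution
    logits = [0.0] * n    # uniform weights to begin with
    for _ in range(steps):
        probs = softmax(logits)
        losses = [(x - theta) ** 2 for x in xs]
        weighted = sum(p * l for p, l in zip(probs, losses))
        # Adversary: gradient ascent, shifting weight toward high-loss samples.
        logits = [
            a + lr_adversary * p * (l - weighted)
            for a, p, l in zip(logits, probs, losses)
        ]
        # Learner: gradient descent on the reweighted loss.
        probs = softmax(logits)
        grad = sum(-2.0 * p * (x - theta) for p, x in zip(probs, xs))
        theta -= lr_learner * grad
    weights = [n * p for p in softmax(logits)]  # weights average to 1
    return theta, weights


# Four "easy" samples at 0 and one minority sample at 1.
theta, weights = adversarial_reweighting([0.0, 0.0, 0.0, 0.0, 1.0])
```

In this toy run the minority sample ends up with a weight above 1 and the majority samples below 1, and `theta` moves away from the plain mean of 0.2 toward the minority sample.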

ARW was designed for group fairness in classification and regression tasks that take individual data points as inputs. “Adversarial reweighting for speaker verification fairness”, by Minho Jin et al., looks at how the concept can be applied to a classification task that depends on pairs of input samples, i.e., checking whether two speech samples come from the same speaker. Solving this problem could help make a voice-based assistant more reliable at personalization and other functions that require knowing who is speaking.

The authors look at several ways to adapt ARW to learning similarity among speaker embeddings. The method that ultimately worked best assigns each pair of input samples an adversarial weight that is the sum of individual sample weights (thereby reducing the dimensionality of the weight prediction). The individual sample weights are also informed by which region of the speaker embedding space a sample falls into (as determined by unsupervised k-means clustering, the same technique used in Dheram et al.’s automatic cohort-identification method).

Computing adversarial-reweighting (ARW) weights.

We omit the details here, but once the pairwise (PW) adversarial weights are formalized in this way, we can insert them into the loss function for metric learning, which is the basis of training a speaker verification model. Min-max optimization can then take turns training the adversary network that predicts the weights and optimizing the speaker embedding extractor that learns speaker similarity.
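The pairwise construction itself is simple to write down. The Python sketch below forms each pair's weight as the sum of its two sample weights and uses it in a weighted average of per-pair losses. The normalization by total weight and the toy numbers are assumptions for illustration, and the per-pair losses stand in for whatever metric-learning loss is being trained.

```python
from typing import List, Sequence, Tuple


def pair_weight(sample_weights: Sequence[float], i: int, j: int) -> float:
    """Pair weight as the sum of the two individual sample weights."""
    return sample_weights[i] + sample_weights[j]


def weighted_pairwise_loss(
    pair_losses: List[Tuple[int, int, float]],
    sample_weights: Sequence[float],
) -> float:
    """Weighted average of per-pair losses given as (i, j, loss) triples."""
    total = 0.0
    total_weight = 0.0
    for i, j, loss in pair_losses:
        w = pair_weight(sample_weights, i, j)
        total += w * loss
        total_weight += w
    return total / total_weight


sample_weights = [0.5, 1.0, 1.5]  # one weight per sample, not per pair
pair_losses = [(0, 1, 0.2), (1, 2, 0.6), (0, 2, 1.0)]
loss = weighted_pairwise_loss(pair_losses, sample_weights)
# Pair weights are 1.5, 2.5, and 2.0, so loss is 3.8 / 6.0 (about 0.633).
```

With n samples the adversary only has to predict n weights rather than one for each of the roughly n²/2 pairs, which is the dimensionality reduction mentioned above.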

On a public speaker verification corpus, the resulting system reduced overall equal-error rate by 7.6%, while also reducing the gap between genders by 17%. It also reduced the error variability across different countries of origin by nearly 10%. Note that, as in the case of the Trinh et al. ASR fairness paper, fairness mitigation both reduces performance disparities and improves overall accuracy.

This concludes our thematic highlights of Alexa Speech Interspeech papers. Note that Interspeech covers much more than speech and speaker recognition. Please check out companion pieces that feature additional work, drawn from technical areas that are no less essential for a functioning speech-enabled AI assistant: natural-language understanding and speech synthesis.

Research areas

Related content

CN, 44, Shenzhen
职位:Applied scientist 应用科学家实习生 毕业时间:2026年10月 - 2027年7月之间毕业的应届毕业生 · 入职日期:2026年6月及之前 · 实习时间:保证一周实习4-5天全职实习,至少持续3个月 · 工作地点:深圳福田区 投递须知: 1 填写简历申请时,请把必填和非必填项都填写完整。提交简历之后就无法修改了哦! 2 学校的英文全称请准确填写。中英文对应表请查这里(无法浏览请登录后浏览)https://docs.qq.com/sheet/DVmdaa1BCV0RBbnlR?tab=BB08J2 关于职位 Amazon Device &Services Asia团队正在寻找一位充满好奇心、善于沟通的应用科学家实习生,成为连接前沿AI研究与现实世界认知的桥梁。这是一个独特的角色——既需要动手参与机器学习项目,又要接受将复杂AI概念转化为通俗易懂内容的创意挑战。D&S Asia是亚马逊设备与服务业务在亚洲的支柱组织,自2009年支持Kindle制造起步,现已发展为横跨软硬件、AI(Alexa)及智能家居(Ring/Blink)的综合性团队,持续驱动区域业务创新与人才发展。 你将做什么 • 解密AI: 将复杂的技术发现转化为直观的解释、博客文章、教程或互动演示,让非技术背景的业务方和更广泛的社区都能理解 • 技术叙事: 与工程团队协作,以清晰、引人入胜的方式记录AI的能力与局限性 • 知识共享: 协助开发内部工作坊或"AI入门"课程,提升跨职能团队(产品、设计、商务)的AI素养 • 保持前沿: 持续学习并整合最新突破(如大语言模型、扩散模型、智能体),为团队输出简明易懂的趋势简报 • 研究与应用: 参与端到端的应用研究项目,从文献综述到原型开发,涵盖自然语言处理、计算机视觉或多模态AI领域
US, MA, N.reading
Amazon Industrial Robotics Group is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic dexterous manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. At Amazon Industrial Robotics Group, we leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at an unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of dexterous manipulation system that: - Enables unprecedented generalization across diverse tasks - Enables contact-rich manipulation in different environments - Seamlessly integrates low-level skills and high-level behaviors - Leverage mechanical intelligence, multi-modal sensor feedback and advanced control techniques. The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. 
A day in the life - Lead design and implementation of methods for Visual SLAM, navigation and spatial reasoning - Leverage simulation and real-world data collection to create large datasets for model development - Develop a hierarchical system that combines low-level control with high-level planning - Collaborate effectively with multi-disciplinary teams to co-design hardware and algorithms for dexterous manipulation
US, WA, Bellevue
Amazon LEO is Amazon's low Earth orbit satellite network. Our mission is to deliver fast, reliable internet connectivity to customers beyond the reach of existing networks. From individual households to schools, hospitals, businesses, and government agencies, Amazon LEO will serve people and organizations operating in locations without reliable connectivity. The Amazon LEO Global Business Operations (GBO) team drives data-driven decision-making across sales, marketing, operations, product, engineering, finance, and legal functions. We build scalable business intelligence solutions and data infrastructure to solve complex, ambiguous problems with LEO-wide impact. We are looking for a talented Research Scientist to contribute to LEO's long-term vision and strategy for capacity simulations and inventory optimization. This effort will be instrumental in helping LEO execute on its business plans globally. As one of our valued team members, you will be obsessed with matching our standards for operational excellence with a relentless focus on delivering results. Key job responsibilities In this role, you will: Collaborate with product, business development, sales, marketing, operations, finance, and various technical teams (engineering, science, R&D, simulations, etc.) to support the implementation of capacity simulations and inventory optimization solutions. Develop and prototype scalable solutions to optimization problems for operating and planning satellite resources. Support technical roadmap definition efforts by building models to predict future inventory availability and key operational and financial metrics across the network. Design experiments and simulations to evaluate optimization improvements and understand how they interact with each other. Analyze large amounts of satellite and business data to identify simulation and optimization opportunities. 
Communicate insights and recommendations to technical and non-technical audiences to support decision-making across LEO. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum.
US, WA, Seattle
Amazon Prime is looking for an ambitious Economist Intern to help create econometric insights for world-wide Prime. Prime is Amazon's premiere membership program, with over 200M members world-wide. This role is at the center of many major company decisions that impact Amazon's customers. These decisions span a variety of industries, each reflecting the diversity of Prime benefits. These range from fast-free e-commerce shipping, digital content (e.g., exclusive streaming video, music, gaming, photos), reading, healthcare, and grocery offerings. Prime Science creates insights that power these decisions. As an economist intern in this role, you will create statistical tools that embed causal interpretations. You will utilize massive data, state-of-the-art scientific computing, econometrics (causal, counterfactual/structural, experimentation), and machine-learning, to do so. Some of the science you create will be publishable in internal or external scientific journals and conferences. You will work closely with a team of economists, applied scientists, data professionals (business analysts, business intelligence engineers), product managers, and software/data engineers. You will create insights from descriptive statistics, as well as from novel statistical and econometric models. You will create internal-to-Amazon-facing automated scientific data products to power company decisions. You will write strategic documents explaining how senior company leaders should utilize these insights to create sustainable value for customers. These leaders will often include the senior-most leaders at Amazon. The team is unique in its exposure to company-wide strategies as well as senior leadership. It operates at the research frontier of utilizing data, econometrics, artificial intelligence, and machine-learning to form business strategies. 
A successful candidate will have demonstrated a capacity for building, estimating, and defending statistical models (e.g., causal, counterfactual, machine-learning) using software such as R, Python, or STATA. They will have a willingness to learn and apply a broad set of statistical and computational techniques to supplement deep training in one area of econometrics. For example, many applications on the team motivate the use of structural econometrics and machine-learning. They rely on building scalable production software, which involves a broad set of world-class software-building skills often learned on-the-job. As a consequence, already-obtained knowledge of SQL, machine learning, and large-scale scientific computing using distributed computing infrastructures such as Spark-Scala or PySpark would be a plus. Additionally, this candidate will show a track-record of delivering projects well and on-time, preferably in collaboration with other team members (e.g. co-authors). Candidates must have very strong writing and emotional intelligence skills (for collaborative teamwork, often with colleagues in different functional roles), a growth mindset, and a capacity for dealing with a high-level of ambiguity. Endowed with these traits and on-the-job-growth, the role will provide the opportunity to have a large strategic, world-wide impact on the customer experiences of Prime members.
US, WA, Bellevue
The Mission Build AI safety systems that protect millions of Alexa customers every day. As conversational AI evolves, you'll solve challenging problems in Responsible AI by ensuring LLMs provide safe, trustworthy responses, building AI systems that understand nuanced human values across cultures, and maintaining customer trust at scale. What You'll Build You'll pioneer breakthrough solutions in Responsible AI at Amazon's scale. Imagine training models that set new safety standards, designing automated testing systems that hunt for vulnerabilities before they surface, and certifying the systems that power millions of daily conversations. You'll create intelligent evaluation systems that judge responses with human-level insight, build models that truly understand what makes interactions safe and delightful, and craft feedback mechanisms that help Alexa+ grasp the nuances of complex customer conversations. Here's where it gets even more exciting: you'll build AI agents that act as your team's safety net—automatically detecting and fixing production issues in real-time, often before anyone notices there was a problem. Your innovations won't just improve Alexa+; they'll fundamentally shape how it learns, evolves, and earns customer trust. As Alexa+ continues to delight customers, your work ensures it becomes more trustworthy, safer, and deeply aligned with customer needs and expectations. Your work directly protects customer trust at Amazon's scale. Every innovation you create—from novel safety mechanisms to sophisticated evaluation techniques—shapes how millions of people interact with AI confidently. You're not just building products; you're defining industry standards for responsible AI. This is frontier research with immediate real-world impact. 
You'll tackle problems that require innovative solutions: training models that remain truthful and grounded across diverse contexts, building reward models that capture the nuanced spectrum of human values across cultures and languages, and creating automated systems that continuously discover and address potential issues before customers encounter them. You'll collaborate with world-class scientists, product managers, and engineers to transform state-of-the-art ideas into production systems serving millions. What We're Looking For * Deep expertise in state-of-the-art NLP and Large Language Models * Track record of building scalable ML systems * Passion for impactful research—where frontier science meets real-world responsibility at scale * Excitement about solving problems that will shape the future of AI Ready to work on AI safety challenges that define the industry? Join us. Key job responsibilities This is where you'll make your mark. You'll architect breakthrough Responsible AI solutions that become industry benchmarks, pioneering algorithms that eliminate false information, designing frameworks that hunt down vulnerabilities before bad actors find them, and developing models that understand human values across every culture we serve. Working with world-class engineers and scientists, you'll push the boundaries of model training—transforming bold research into production systems that protect millions of customers daily while withstanding attacks and delivering exceptional experiences. But here's what makes this role truly special: you'll shape the future. You'll lead certification processes, advance optimization techniques, build evaluation systems that reason like humans, and mentor the next generation of AI safety experts. Every innovation you drive will set new standards for trustworthy AI at the world's largest scale. 
A day in the life As a Responsible AI Scientist, you're at the frontier of AI safety—experimenting with breakthrough techniques that push the boundaries of what's possible. You partner with engineering to transform research into production-ready solutions, tackling complex optimization challenges. You brainstorm with Product teams, translating ambitious visions into concrete objectives that drive real impact. Your expertise shapes critical deployment decisions as you review impactful work and guide go/no-go calls. You mentor the next generation of AI safety leaders, watching ideas spark and capabilities grow. This is where science meets impact—building AI that's not just intelligent, but trustworthy and aligned with human values. About the team Our team pioneers Responsible AI for conversational assistants. We ensure Alexa delivers safe, trustworthy experiences across all devices, modalities, and languages worldwide. We work on frontier AI safety challenges—and we're looking for scientists who want to help shape the future of trustworthy AI.