Alexa speech science developments at Interspeech 2022

Research from Alexa Speech covers a range of topics related to end-to-end neural speech recognition and fairness.

Interspeech, the world’s largest and most comprehensive conference on the science and technology of spoken-language processing, took place this week in Incheon, Korea, with Amazon as a platinum sponsor. Amazon Science asked three of Alexa AI’s leading scientists, in the fields of speech, spoken-language understanding, and text-to-speech, to highlight some of Amazon’s contributions to the conference.

In this installment, senior principal scientist Andreas Stolcke selects papers from Alexa AI’s speech science organization, focusing on two overarching themes in recent research on speech-enabled AI: end-to-end neural speech recognition and fairness.

End-to-end neural speech recognition

Traditionally, speech recognition systems have included components specialized for different aspects of linguistic knowledge: acoustic models to capture the correspondence between speech sounds and acoustic waveforms (phonetics), pronunciation models to map those sounds to words, and language models (LMs) to capture higher-order properties such as syntax, semantics, and dialogue context.

All these models are trained on separate data and combined using graph and search algorithms, to infer the most probable sequence of words corresponding to acoustic input. The latest versions of these systems employ neural networks for individual components, typically in the acoustic and language models, while still relying on non-neural methods for model integration; they are therefore known as “hybrid” automatic-speech-recognition (ASR) systems.

While the hybrid ASR approach is structured and modular, it also makes it hard to model the ways in which acoustic, phonetic, and word-level representations interact and to optimize the recognition system end to end. For these reasons, much recent research in ASR has focused on so-called end-to-end or all-neural recognition systems, which infer a sequence of words directly from acoustic inputs.

End-to-end ASR systems use deep multilayered neural architectures that can be optimized end to end for recognition accuracy. While they do require large amounts of data and computation for training, once trained, they offer a simplified computational architecture for inference, as well as superior performance.

Alexa’s ASR employs end-to-end models at its core, both in the cloud and on-device. Across the industry and in academic research, end-to-end architectures are still being improved to achieve better accuracy, to reduce computation and/or latency, or to mitigate the lack of modularity that makes it challenging to inject external (e.g., domain-specific) knowledge at run time.

Alexa AI papers at Interspeech address several open problems in end-to-end ASR, and we summarize a few of those papers here.

In “ConvRNN-T: Convolutional augmented recurrent neural network transducers for streaming speech recognition”, Martin Radfar and coauthors propose a new variant of the popular recurrent-neural-network-transducer (RNN-T) end-to-end neural architecture. One of their goals is to preserve the property of causal processing, meaning that the model output depends only on past and current (but not future) inputs, which enables streaming ASR. At the same time, they want to improve the model’s ability to capture long-term contextual information.

Figure: A high-level block diagram of ConvRNN-T.

To achieve both goals, they augment the vanilla RNN-T with two distinct convolutional (CNN) front ends: a standard one for encoding correlations localized in time and a novel “global CNN” encoder that is designed to capture long-term correlations by summarizing activations over the entire utterance up to the current time step (while processing utterances incrementally through time).
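
The idea of summarizing everything seen so far without looking ahead can be illustrated with a toy example. The sketch below is not the paper's global CNN encoder; it assumes a plain running average as a stand-in summary, and the function name `causal_running_mean` is hypothetical. It only shows how a per-frame summary of the utterance so far can be updated incrementally.

```python
def causal_running_mean(frames):
    """For each time step t, average the feature vectors of frames 0..t.

    The output at step t uses only past and current frames, so it can be
    computed incrementally while audio is still streaming in.
    """
    summaries = []
    running_sum = [0.0] * len(frames[0])
    for t, frame in enumerate(frames, start=1):
        # Update the running sum with the newest frame only.
        running_sum = [s + x for s, x in zip(running_sum, frame)]
        summaries.append([s / t for s in running_sum])
    return summaries


# Toy 2-dimensional "activations" for three frames (made-up numbers).
frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
summaries = causal_running_mean(frames)
```

Changing a later frame leaves all earlier summaries untouched, which is the causality property that streaming recognition depends on.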

The authors show that the resulting ConvRNN-T gives superior accuracy compared to other proposed neural streaming ASR architectures, such as the basic RNN-T, Conformer, and ContextNet.

Another concern with end-to-end ASR models is computational efficiency, especially since the unified neural architecture makes these models very attractive for on-device deployment, where compute cycles and (for mobile devices) power are at a premium.

In their paper “Compute cost amortized Transformer for streaming ASR”, Yi Xie and colleagues exploit the intuitive observation that the amount of computation a model performs should vary as a function of the difficulty of the task; for instance, input in which noise or an accent causes ambiguity may require more computation than a clean input with a mainstream accent. (We may think of this as the ASR model “thinking harder” in places where the words are more difficult to discern.)

The researchers achieve this with a very elegant method that leverages the integrated neural structure of the model. Their starting point is a Transformer-based ASR system, consisting of multiple stacked layers of multiheaded self-attention (MHA) and feed-forward neural blocks. In addition, they train “arbitrator” networks that look at the acoustic input (and, optionally, also at intermediate block outputs) to toggle individual components on or off.

Because these component blocks have “skip connections” that combine their outputs with the outputs of earlier layers, they are effectively optional for the overall computation to proceed. A block that is toggled off for a given input frame saves all the computation normally carried out by that block, producing a zero vector output. The following diagram shows the structure of both the elementary Transformer building block and the arbitrator that controls it:

Figure: Illustration of the arbitrator and Transformer backbone of each block. The lightweight arbitrator toggles whether to evaluate subcomponents during the forward pass.
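
A minimal sketch of why skip connections make a block optional follows. In the paper the arbitrator is a small trained network; here, purely as an assumption for illustration, it is replaced by a fixed threshold rule, and `toy_block`, `toy_arbitrator`, and `gated_residual` are hypothetical names.

```python
def gated_residual(x, block, gate_open):
    """Apply `block` on top of a skip connection, or skip it entirely.

    When the gate is closed the block is never called, so its cost is saved
    and its contribution is treated as a zero vector: the output equals x.
    """
    if not gate_open:
        return list(x)
    return [xi + bi for xi, bi in zip(x, block(x))]


def toy_block(x):
    # Hypothetical stand-in for a self-attention or feed-forward block.
    return [2.0 * xi for xi in x]


def toy_arbitrator(x, threshold=1.0):
    # Hypothetical rule standing in for the learned arbitrator network.
    return sum(abs(xi) for xi in x) > threshold


frame_easy = [0.1, 0.2]
frame_hard = [1.0, 2.0]
out_easy = gated_residual(frame_easy, toy_block, toy_arbitrator(frame_easy))
out_hard = gated_residual(frame_hard, toy_block, toy_arbitrator(frame_hard))
```

For the "easy" frame the block is skipped and the input passes through unchanged; for the "hard" frame the block output is added to the input as usual.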

The arbitrator networks themselves are small enough that they do not contribute significant additional computation. What makes this scheme workable and effective, however, is that both the Transformer assemblies and the arbitrators that control them can be trained jointly, with dual goals: to perform accurate ASR and to minimize the overall amount of computation. The latter is achieved by adding a term to the training objective function that rewards reducing computation. Dialing a hyperparameter up or down selects the desired balance between accuracy and computation.
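
One way to picture the dual training goal is as a sum of two terms. The exact form of the paper's compute term is not given here, so the sketch below assumes a simple expected-cost penalty; `gate_open_probs`, `block_costs`, and `tradeoff` are hypothetical names for the arbitrators' gate probabilities, per-block compute costs, and the balancing hyperparameter.

```python
def joint_objective(asr_loss, gate_open_probs, block_costs, tradeoff):
    """ASR loss plus a penalty on the expected compute of the gated blocks."""
    # Expected cost: each block's cost weighted by how often its gate is open.
    expected_cost = sum(p * c for p, c in zip(gate_open_probs, block_costs))
    return asr_loss + tradeoff * expected_cost


# Same model state, two settings of the accuracy/compute trade-off.
loss_accuracy_first = joint_objective(2.0, [0.5, 1.0], [4.0, 2.0], tradeoff=0.0)
loss_compute_aware = joint_objective(2.0, [0.5, 1.0], [4.0, 2.0], tradeoff=0.5)
```

With `tradeoff=0.0` only recognition accuracy matters; larger values make open gates more expensive, so training is pushed toward closing them.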

The authors show that their method can achieve a 60% reduction in computation with only a minor (3%) increase in ASR error. Their cost-amortized Transformer proves much more effective than a benchmark method that constrains the model to attend only to sliding windows over the input, which yields only 13% savings and an error increase almost three times as large.

Finally, in this short review of end-to-end neural ASR advances, we look at ways to recognize speech from more than one speaker, while keeping track of who said what (also known as speaker-attributed ASR).

This has traditionally been done with modular systems that perform ASR and, separately, perform speaker diarization, i.e., labeling stretches of audio according to who is speaking. However, here, too, neural models have recently brought advances and simplification, by integrating these two tasks in a single end-to-end neural model.

In their paper “Separator-transducer-segmenter: Streaming recognition and segmentation of multi-party speech”, Ilya Sklyar and colleagues not only integrate ASR and segmentation-by-speaker but do so while processing inputs incrementally. Streaming multispeaker ASR with low latency is a key technology to enable voice assistants to interact with customers in collaborative settings. Sklyar’s system does this with a generalization of the RNN-T architecture that keeps track of turn-taking between multiple speakers, up to two of whom can be active simultaneously. The researchers’ separator-transducer-segmenter model is depicted below:

Figure: Separator-transducer-segmenter. The tokens <sot> and <eot> represent the start of turn and end of turn. Model blocks with the same color have tied parameters, and transcripts in the color-matched boxes belong to the same speaker.

A key element that yields improvements over an earlier approach is the use of dedicated tokens to recognize both starts and ends of speaker turns, for what the authors call “start-pointing” and “end-pointing”. (End-pointing, which predicts when a talker is done, is a standard feature of many interactive ASR systems.) Beyond representing the turn-taking structure in this symbolic way, the model is also penalized during training for taking too long to output these markers, in order to improve the latency and temporal accuracy of the outputs.

Fairness in the performance of speech-enabled AI

The second theme we’d like to highlight, and one that is receiving increasing attention in speech and other areas of AI, is performance fairness: the desire to avert large differences in accuracy across different cohorts of users or on content associated with protected groups. As an example, concerns about this type of fairness gained prominence with demonstrations that certain computer vision algorithms performed poorly for certain skin tones, in part due to underrepresentation in the training data.

There’s a similar concern about speech-based AI, with speech properties varying widely as a function of speaker background and environment. A balanced representation in training sets is hard to achieve, since the speakers using commercial products are largely self-selected, and speaker attributes are often unavailable for many reasons, privacy among them. This topic is also the subject of a special session at Interspeech, Inclusive and Fair Speech Technologies, which several Alexa AI scientists are involved in as co-organizers and presenters.

One of the special-session papers, “Reducing geographic disparities in automatic speech recognition via elastic weight consolidation”, by Viet Anh Trinh and colleagues, looks at how geographic location within the U.S. affects ASR accuracy and how models can be adapted to narrow the gap for the worst-performing regions. Here and elsewhere, a two-step approach is used: first, subsets of speakers with higher-than-average error rates are identified; then a mitigation step attempts to improve performance for those cohorts. Trinh et al.’s method identifies the cohorts by partitioning the speakers according to their geographic longitude and latitude, using a decision-tree-like algorithm that maximizes the word-error-rate (WER) differences between resulting regions:

Figure: A map of 126 regions identified by the clustering tree. The color does not indicate a specific word error rate (WER), but regions with the same color do have the same WER.
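
The partitioning step can be sketched as a single split of a decision tree. The code below is an illustration under assumptions, not the authors' algorithm: it searches one latitude or longitude threshold that maximizes the absolute WER difference between the two sides, and it leaves out details such as stopping criteria or minimum region sizes, which are not described here. The speaker records and field names are made up.

```python
def wer(group):
    """Word error rate of a group: total word errors / total reference words."""
    errors = sum(s["errors"] for s in group)
    words = sum(s["words"] for s in group)
    return errors / words


def best_split(speakers):
    """Find the single latitude or longitude threshold that maximizes the
    absolute WER difference between the two resulting regions."""
    best = None
    for axis in ("lat", "lon"):
        values = sorted({s[axis] for s in speakers})
        # Candidate thresholds lie midway between neighboring coordinates.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [s for s in speakers if s[axis] <= threshold]
            right = [s for s in speakers if s[axis] > threshold]
            gap = abs(wer(left) - wer(right))
            if best is None or gap > best[0]:
                best = (gap, axis, threshold)
    return best


# Made-up speakers: location plus word errors and reference word counts.
speakers = [
    {"lat": 30.0, "lon": -100.0, "errors": 10, "words": 100},
    {"lat": 31.0, "lon": -80.0, "errors": 12, "words": 100},
    {"lat": 45.0, "lon": -99.0, "errors": 5, "words": 100},
    {"lat": 46.0, "lon": -81.0, "errors": 5, "words": 100},
]
gap, axis, threshold = best_split(speakers)
```

Applying the same search recursively to each side would grow a tree whose leaves are the geographic regions.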

Next, the regions are ranked by their average WERs; data from the highest-error regions is identified for performance improvement. To achieve that, the researchers use fine-tuning to optimize the model parameters for the targeted regions, while also employing a technique called elastic weight consolidation (EWC) to minimize performance degradation on the remaining regions.

This is important to prevent a phenomenon known as “catastrophic forgetting”, in which neural models degrade substantially on prior training data during fine-tuning. The idea is to quantify the influence that different dimensions of the parameter space have on the overall performance and then avoid large variations along those dimensions when adapting to a data subset. This approach decreases the WER mean, maximum, and variance across regions and even the overall WER (including the regions not fine-tuned on), beating out several baseline methods for model adaptation.
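
In the standard EWC formulation, this idea takes the form of a quadratic penalty added to the fine-tuning loss, with each parameter's deviation from its pre-fine-tuning value scaled by an importance estimate (typically derived from Fisher information). The sketch below uses hypothetical names and made-up numbers and does not reproduce the paper's specific setup.

```python
def ewc_loss(task_loss, params, anchor_params, importance, strength):
    """Fine-tuning loss plus a quadratic penalty that anchors important weights.

    params        -- current parameter values during fine-tuning
    anchor_params -- parameter values before fine-tuning
    importance    -- per-parameter importance estimates (assumed given)
    strength      -- hyperparameter scaling the penalty
    """
    penalty = sum(
        f * (p - p0) ** 2
        for f, p, p0 in zip(importance, params, anchor_params)
    )
    return task_loss + 0.5 * strength * penalty


# Both parameters moved away from 0.0, but only the first one is "important".
loss = ewc_loss(
    task_loss=1.0,
    params=[1.0, 2.0],
    anchor_params=[0.0, 0.0],
    importance=[4.0, 0.0],
    strength=1.0,
)
```

Here the second parameter moves freely at no cost, while the movement of the first adds 2.0 to the loss, which is how the model is discouraged from drifting along dimensions that matter for the remaining regions.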

Pranav Dheram et al., in their paper “Toward fairness in speech recognition: Discovery and mitigation of performance disparities”, look at alternative methods for identifying underperforming speaker cohorts. One approach is to use human-defined geographic regions as given by postal (a.k.a. zip) codes, in combination with demographic information from U.S. census data, to partition U.S. geography.

Zip codes are sorted into binary partitions by majority demographic attributes, so as to maximize WER discrepancies. The partition with higher WER is then targeted for mitigations, an approach similar to that adopted in the Trinh et al. paper. However, this approach is imprecise (since it lumps together speakers by zip code) and limited to available demographic data, so it generalizes poorly to other geographies.

Alternatively, Dheram et al. use speech characteristics learned by a neural speaker identification model to group speakers. These “speaker embedding vectors” are clustered, reflecting the intuition that speakers who sound similar will tend to have similar ASR difficulty.
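
Clustering of embedding vectors can be sketched with plain k-means (Lloyd's algorithm). The example below is a toy: the two-dimensional "embeddings" are made up, real speaker embeddings have far more dimensions, and the deterministic initialization from the first k points is a simplification assumed here for reproducibility.

```python
from math import dist


def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm; initial centroids are the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


# Made-up 2-D "speaker embeddings" forming two obvious groups.
embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assignment, centroids = kmeans(embeddings, k=2)
```

Each resulting cluster plays the role of a cohort of similar-sounding speakers.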

Subsequently, these virtual speaker regions (not individual identities) can be ranked by difficulty and targeted for mitigation, without relying on human labeling, grouping, or self-identification of speakers or attributes. As shown in the table below, the automatic approach identifies a larger gap in ASR accuracy than the “geo-demographic” approach, while at the same time targeting a larger share of speakers for performance mitigation:

Cohort discovery    WER gap (%)    Bottom-cohort share (%)
Geodemographic      41.7           0.8
Automatic           65.0           10.0
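
Ranking cohorts by difficulty, as described above, amounts to aggregating errors per cohort and sorting. The sketch below assumes each utterance has already been assigned a cohort label (for example, a cluster index) and uses made-up counts; the tuple layout and the function name `rank_cohorts` are hypothetical.

```python
from collections import defaultdict


def rank_cohorts(utterances):
    """Return (cohort, WER) pairs sorted from highest to lowest WER.

    Each utterance is a tuple (cohort, word_errors, reference_words).
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for cohort, n_errors, n_words in utterances:
        errors[cohort] += n_errors
        words[cohort] += n_words
    rates = {c: errors[c] / words[c] for c in words}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


utterances = [("a", 1, 10), ("b", 3, 10), ("a", 1, 10), ("c", 2, 10), ("b", 5, 10)]
ranking = rank_cohorts(utterances)
worst_cohort = ranking[0][0]
```

The cohort at the top of the ranking is the one that would be targeted for mitigation.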

The final fairness-themed paper we highlight explores yet another approach to avoiding performance disparities, known as adversarial reweighting (ARW). Instead of relying on explicit partitioning of the input space, this approach assigns continuous weights to the training instances (as a function of input features), with the idea that harder examples get higher weights and thereby exert more influence on the performance optimization.

ARW also more tightly interleaves, and iterates, the (now weighted) cohort identification and mitigation steps. Mathematically, this is formalized as a min-max optimization algorithm that alternates between maximizing the error by changing the sample weights (hence “adversarial”) and minimizing the weighted verification error by adjusting the target model parameters.

ARW was designed for group fairness in classification and regression tasks that take individual data points as inputs. “Adversarial reweighting for speaker verification fairness”, by Minho Jin et al., looks at how the concept can be applied to a classification task that depends on pairs of input samples, i.e., checking whether two speech samples come from the same speaker. Solving this problem could help make a voice-based assistant more reliable at personalization and other functions that require knowing who is speaking.

The authors look at several ways to adapt ARW to learning similarity among speaker embeddings. The method that ultimately worked best assigns each pair of input samples an adversarial weight that is the sum of individual sample weights (thereby reducing the dimensionality of the weight prediction). The individual sample weights are also informed by which region of the speaker embedding space a sample falls into (as determined by unsupervised k-means clustering, the same technique used in Dheram et al.’s automatic cohort-identification method).

Figure: Computing adversarial-reweighting (ARW) weights.

I omit the details, but once the pairwise (PW) adversarial weights are formalized in this way, we can insert them into the loss function for metric learning, which is the basis of training a speaker verification model. Min-max optimization can then take turns training the adversary network that predicts the weights and optimizing the speaker embedding extractor that learns speaker similarity.
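
As a rough illustration of how pairwise weights enter a loss, the sketch below forms each pair's weight as the sum of its two sample weights and then takes a weighted average of per-pair losses. Normalizing by the sum of the weights is an assumption made here, and the per-pair losses and sample weights are made-up numbers rather than outputs of a real adversary network or metric-learning loss.

```python
def pair_weight(sample_weights, i, j):
    """Adversarial weight of the pair (i, j): sum of the two sample weights."""
    return sample_weights[i] + sample_weights[j]


def weighted_pair_loss(pair_losses, pairs, sample_weights):
    """Weighted average of per-pair losses using the pairwise weights."""
    weights = [pair_weight(sample_weights, i, j) for i, j in pairs]
    total = sum(w * loss for w, loss in zip(weights, pair_losses))
    return total / sum(weights)


# Sample 2 carries a high weight, so pairs containing it count for more.
sample_weights = [1.0, 1.0, 3.0]
pairs = [(0, 1), (0, 2)]
pair_losses = [0.2, 0.6]
loss = weighted_pair_loss(pair_losses, pairs, sample_weights)
```

Because the harder pair has the larger weight, the weighted loss ends up above the unweighted mean of 0.4, so the embedding extractor is pushed to improve on exactly those pairs.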

On a public speaker verification corpus, the resulting system reduced overall equal-error rate by 7.6%, while also reducing the gap between genders by 17%. It also reduced the error variability across different countries of origin, by nearly 10%. Note that, as in the case of the Trinh et al. ASR fairness paper, fairness mitigation improves both performance disparities and overall accuracy.

This concludes our thematic highlights of Alexa Speech Interspeech papers. Note that Interspeech covers much more than speech and speaker recognition. Please check out companion pieces that feature additional work, drawn from technical areas that are no less essential for a functioning speech-enabled AI assistant: natural-language understanding and speech synthesis.

Research areas

Related content

US, TX, Austin
Amazon Leo is an initiative to launch a constellation of Low Earth Orbit satellites that will provide low-latency, high-speed broadband connectivity to unserved and underserved communities around the world. As a Systems Engineer, this role is primarily responsible for the design, development and integration of Ka band and S/C band communication payload and ground terminal systems. The Role: Be part of the team defining the overall communication system and architecture of Amazon’s broadband wireless network. This is a unique opportunity to innovate and define groundbreaking wireless technology with few legacy constraints. The team develops and designs the communication system of Amazon Leo and analyzes its overall system level performance such as for overall throughput, latency, system availability, packet loss etc. This role in particular will be responsible for leading the effort in designing and developing advanced technology and solutions for communication system. This role will also be responsible developing advanced L1/L2 proof of concept HW/SW systems to improve the performance and reliability of the Amazon Leo network. In particular this role will be responsible for using concepts from digital signal processing, information theory, wireless communications to develop novel solutions for achieving ultra-high performance LEO network. This role will also be part of a team and develop simulation tools with particular emphasis on modeling the physical layer aspects such as advanced receiver modeling and abstraction, interference cancellation techniques, FEC abstraction models etc. This role will also play a critical role in the design, integration and verification of various HW and SW sub-systems as a part of system integration and link bring-up and verification. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. 
permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum. Key job responsibilities • Design advanced L1/L2 algorithms and solutions for the Amazon Leo communication system, particularly Multi-User MIMO techniques. • Develop proof-of-concepts for critical communication payload components using SDR platforms consisting of FPGAs and general-purpose processors. • Work with ASIC development teams to build power/area efficient L1/L2 HW accelerators to be integrated into Amazon Leo SoCs. • Provide specifications and work with implementation teams on the development of embedded L1/L2 HW/SW architectures. • Work with multi-disciplinary teams to develop advanced solutions for time, frequency and spatial acquisition/tracking in LEO systems, particularly under large uncertainties. • Develop link-level and system-level simulators and work closely with implementation teams to evaluate expected performance and provide quick feedback on potential improvements. • Develop testbeds consisting of digital, IF and RF components while accounting for link-budgets and RF/IF line-ups. Previous experiences with VSAs/VSGs, channel emulators, antennas (particularly phased-arrays) and anechoic chamber instrumentation are a plus. • Work with development teams on system integration and debugging from PHY to network layer, including interfacing with flight computer and SDN control subsystems. • Willing to work in fast-paced environment and take ownership that goes from algorithm specification, to HW/SW architecture definition, to proof-of-concept development, to testbed bring-up, to integration into the Amazon Leo system. • Be a team player and provide support when requested while being able to unblock themselves by reaching out to RF, ASIC, SW, Comsys and Testbed supporting teams to move forward in development, testing and integration activities. 
• Ability to adapt design and test activities based on current HW/SW capabilities delivered by the development teams.
US, TX, Austin
Project Leo (former Kuiper) is an initiative to launch a constellation of Low Earth Orbit satellites that will provide low-latency, high-speed broadband connectivity to unserved and underserved communities around the world. As a Systems Engineer, this role is primarily responsible for the design, development and integration of Ka band and FR1 band communication payload and customer terminal systems. The Role: Be part of the team defining the overall communication system and architecture of Amazon Leo’s broadband wireless network. This is a unique opportunity to innovate and define groundbreaking wireless technology at global scale. The team develops and designs the communication system for project Leo and analyzes its overall system level performance such as for overall throughput, latency, system availability, packet loss etc. This role in particular will be responsible for leading the effort in designing and developing advanced technology and solutions for communication system. This role will also be responsible developing advanced physical layer + protocol stacks systems as proof of concept and reference implementation to improve the performance and reliability of the LEO network. In particular this role will be responsible for using concepts from digital signal processing, information theory, wireless communications to develop novel solutions for achieving ultra-high performance LEO network. This role will also be part of a team and develop simulation tools with particular emphasis on modeling the physical layer aspects such as advanced receiver modeling and abstraction, interference cancellation techniques, FEC abstraction models etc. This role will also play a critical role in the integration and verification of various HW and SW sub-systems as a part of system integration and link bring-up and verification. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. 
permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum.
US, WA, Bellevue
What does it take to build a foundation model that can forecast demand for hundreds of millions of products — including ones that have never been sold before? At Amazon, our Demand Forecasting team is tackling one of the most ambitious challenges in applied time series research: designing and building large-scale foundation models that generalize across an enormous and diverse catalog of products, geographies, and business contexts. This is not incremental modeling work. We are redefining what's possible in demand forecasting through novel architectures, training strategies, and data generation techniques. Our team operates at a scale that is unmatched in industry or academia. You'll design experiments across millions of products simultaneously, developing new model architectures and training methodologies that push the boundaries of what foundation models can learn from vast, heterogeneous time series data. You'll explore techniques in transfer learning, zero-shot forecasting, and synthetic data generation. The models you design here will ship to production and directly influence hundreds of millions of dollars in automated inventory decisions every week. Beyond operational impact, you'll publish your work at top-tier conferences and contribute to advancing the state of the art in time series foundation models for the broader scientific community. If you are a scientist who wants to work at the frontier of time series research, design novel solutions to problems no one else has solved at this scale, and see your research deployed to real-world impact — this is the team for you. Key job responsibilities 1. Design and implement novel deep learning architectures (e.g., Transformers, SSMs, or Graph Neural Networks) for time-series foundation models that generalize across hundreds of millions of products and diverse global contexts. 2. Drive the full development cycle - from whiteboarding new algorithmic approaches to overseeing production-scale deployments. 3. 
Collaborate with SDEs to build high-performance, distributed training and inference pipelines; translate complex scientific concepts into scalable, production-grade code in Python and Scala. 4. Leverage and develop agentic GenAI workflows to automate the end-to-end research cycle from synthesizing state-of-the-art literature and auto-generating experimental code to rapidly iterating on model architectures across millions of products. 5. Maintain a high bar for scientific excellence by publishing novel research in top-tier venues (e.g., NeurIPS, ICLR, KDD) and contributing to Amazon’s internal patent and science community. A day in the life No two days look the same, but most will involve a high-velocity blend of deep architectural work, distributed system design, and frontier scientific thinking at a scale you won’t find anywhere else. You might start the morning by designing a synthetic data pipeline to stress-test your foundation model. You’ll use generative techniques to simulate rare "black swan" supply chain events, ensuring your model remains robust where historical data is thin. You'll then lead a Scientific Design Review, walking senior leaders through your model’s architecture, defending your choice of loss functions with data-driven rigor. You’ll write high-performance code often paired with AI-coding assistants to handle the heavy lifting of boilerplate and unit testing. You’ll collaborate across a "Two-Pizza Team" of scientists and engineers, pushing the boundaries of research with a clear goal: contributing to work that will be published at top-tier venues (ICLR, NeurIPS) while simultaneously driving multi-million dollar automated decisions. The work is hard, the math is complex, and the tools are state-of-the-art. If you want to build the models that actually ship—this is where you do it. 
About the team The Demand Forecasting team sits at the heart of Amazon's supply chain, building the science that determines what products are available, when, and at what cost — for hundreds of millions of customers around the world. Our mission is to push the frontier of what's possible in large-scale time series forecasting, and to deploy that science where it creates real, measurable impact. We are a team of scientists who care deeply about both research rigor and real-world outcomes. We don't just publish — we ship. And we don't just ship — we measure, iterate, and raise the bar. Our work spans the full lifecycle: from foundational research and large-scale experimentation to production deployment and downstream impact measurement across supply chain, inventory, and financial planning.
US, WA, Bellevue
Do you enjoy solving challenging problems and driving innovations in research? Are you seeking for an environment with a group of motivated and talented scientists like yourself? Do you want to create scalable optimization models and apply machine learning techniques to guide real-world decisions? Do you want to play a key role in the future of Amazon transportation and operations? Come and join us at Amazon's Modeling and Optimization team (MOP). Key job responsibilities A Research Scientist in the Modeling and Optimization (MOP) team - provides analytical decision support to Amazon planning teams via applying advanced mathematical and statistical techniques. - collaborates effectively with Amazon internal business customers, and is their trusted partner - is proactive and autonomous in discovering and resolving business pain-points within a given scope - is able to identify a suitable level of sophistication in resolving the different business needs - is confident in leveraging existing solutions to new problems where appropriate and is independent in designing and implementing new solutions where needed - is aware of the limitations of their proposed solutions and is proactive in communicating them to the business, and advances the application of sciences towards Amazon business problems by bringing new methods, ideas, and practices to the team and scientific community. A day in the life - Your will be developing model-based optimization, simulation, and/or predictive tools to identify and evaluate opportunities to improve customer experience, network speed, cost, and efficiency of capital investment. - You will quantify the improvements resulting from the application of these tools and you will evaluate the trade-offs between potentially competing objectives. 
- You will develop good communication skills and ability to speak at a level appropriate for the audience, will collaborate effectively with fellow scientists, software development engineers, and product managers, and will deliver business value in a close partnership with many stakeholders from operations, finance, IT, and business leadership. About the team - At the Modeling and Optimization (MOP) team, we use mathematical optimization, algorithm design, statistics, and machine learning to improve decision-making capabilities across WW Operations and Amazon Logistics. - We focus on transportation topology, labor and resource planning for fulfillment facilities, routing science, visualization research, data science and development, and process optimization. - We create models to simulate, optimize, and control the fulfillment network with the objective of reducing cost while improving speed and reliability. - We support multiple business lanes, therefore maintain a comprehensive and objective view, coordinating solutions across organizational lines where possible.