On-device speech processing makes Alexa faster, lower-bandwidth

Innovative training methods and model compression techniques combine with clever engineering to keep speech processing local.

At Amazon, we always look to invent new technology for improving customer experience. One technology we have been working on at Alexa is on-device speech processing, which has multiple benefits: a reduction in latency, or the time it takes Alexa to respond to queries; lowered bandwidth consumption, which is important on portable devices; and increased availability in in-car units and other applications where Internet connectivity is intermittent. On-device processing also enables the fusion of the speech signal with other modalities, like vision, for features such as Alexa’s natural turn-taking.

In the last year, we’ve continued to build upon Alexa’s on-device speech-processing capabilities. As a result of these inventions, we are launching a new setting that gives customers the option of having the audio of their Alexa voice requests processed locally, without being sent to the cloud.

In the cloud, storage space and computational capacity are effectively unconstrained. To ensure accuracy, our cloud models can be large and computationally demanding. Executing the same functions on-device means compressing our models into less than 1% as much space — with minimal loss in accuracy.

Moreover, in the cloud, the separate components of Alexa’s speech-processing stack — automatic speech recognition (ASR), whisper detection, and speaker identification — run on separate server nodes with their own powerful processors. On-device, those functions have to share hardware not only with each other but with Alexa’s other core device functions, such as music playback.

Re-creating Alexa’s speech-processing stack on-device was a massive undertaking. New methods for training small-footprint ASR models were part of the solution, but so were innovations in system design and hardware-software codesign. It was a joint effort across science and engineering teams over a span of years. Here’s a quick overview of how it works.

System architecture

Our on-device ASR model takes in an acoustic speech signal and outputs a set of hypotheses about what the speaker said, ranked according to probability. We represent those hypotheses as a lattice — a graph whose edges represent recognized words and the probability that a given word follows from the previous one.

Sample lattice.cropped.png
An example of a lattice representing ASR hypotheses.

With cloud-based ASR, encrypted audio streams to the cloud in small snippets called “frames”. With on-device ASR, only the lattice is sent to the cloud, where a large and powerful neural language model reranks the hypotheses. The lattice can’t be sent until the customer has finished speaking, as words later in a sequence can dramatically change the overall probability of a hypothesis.

The model that determines when the customer has finished speaking is called an end-pointer. End-pointers offer a natural trade-off between accuracy and latency: an aggressive end-pointer will initiate speech processing earlier, but it might cut the speaker off prematurely, resulting in a poor customer experience.

On the device, we in fact run two end-pointers: One is a speculative end-pointer that we have tuned to be about 200 milliseconds faster than the final end-pointer, so we can initiate downstream processing — such as natural-language understanding (NLU) — ahead of the final end-pointed ASR result. In exchange for speed, however, we trade off a little accuracy.

The final end-pointer takes longer to make a decision but is more accurate. In cases in which the first end-pointer cuts speech off too early, the final end-pointer sends a revised lattice and the instruction to reset downstream processing. In the large majority of cases, however, the aggressive end-pointer is correct, which reduces user-perceived latency, since downstream tasks are initiated earlier.

Another aspect of ASR that had to move on-device is context awareness. When computing the probabilities in a lattice, the ASR model should, for instance, give added weight to otherwise uncommon names that happen to be in the customer’s address book or the names the customer has assigned to household devices.

AmazonScience_StaticGraphic
A diagram of the on-device ASR network, with a closeup of the biasing mechanism that allows the network to ingest dynamic content. (Based on figures in "Context-aware Transformer transducer for speech recognition")
Attention map.png
This attention map indicates that the trained network is attending to the correct entry in a list of Alexa-linked home appliances. (From "Context-aware Transformer transducer for speech recognition")

Context awareness can’t wait for the cloud because the lattice, though it encodes multiple hypotheses, doesn’t come close to encoding all possible hypotheses. When constructing the lattice, the ASR system has to prune a lot of low-probability hypotheses. If context awareness isn’t built into the on-device model, names of contacts or linked skills might end up getting pruned.

Initially, we use a so-called shallow-fusion model to add context and personalize content on-device. When the system is building the lattice, it boosts the probabilities of contextually relevant words such as contact or appliance names.

The probability boosts are heuristic, however — they’re not learned jointly with the core ASR model. To achieve even better accuracy on personalized and long-tail content, we have developed a multihead attention-based context-biasing mechanism that is jointly trained with the rest of the ASR subnetworks.

Model training

On-device ASR required us to build a new model from the ground up, an end-to-end recurrent neural network-transducer (RNN-T) model that directly maps the input speech signal to an output sequence of words. Using a single neural network results in a significantly reduced memory footprint. But we had to develop new techniques, both for inference and for training, to achieve the degree of accuracy and compression that would let this technology handle utterances on-device.

Previously on Amazon Science, we’ve discussed some of the techniques we used to increase the accuracy of small-footprint end-to-end ASR models. With teacher-student training, for instance, we teach a small, lean model to match the outputs of a more-powerful but slower model. We developed a training methodology that made it possible to do teacher-student training efficiently with a million hours of unannotated speech.

Stream-level context.png
During the training of a context-aware ASR model, a long-short-term-memory (LSTM) encoder encodes both unlabeled and labeled segments of the audio stream, so the model can use the entire input audio to improve ASR accuracy. (From "Improving RNN-T ASR accuracy using context audio")

To further boost the accuracy of on-device RNN-T ASR, we developed techniques that allow the neural network to learn and exploit audio context within a stream. For example, for a stream comprising two utterances, “Alexa” and “Play a song”, the audio context from the keyword segment (“Alexa”) helps the model focus on the foreground speech and speaker. Separately, we implemented a novel discriminative-loss and training algorithm that aims at directly minimizing the word error rate (WER) of RNN-T ASR.

On top of these innovations, however, we still had to develop some new compression techniques to get the RNN-T to run efficiently on-device. A neural network consists of simple processing nodes each of which is connected to several others. The connections between nodes have associated weights, which determine how much one node’s output contributes to the computation performed by the next node.

One way to shrink a neural network’s memory footprint is to quantize its weights — to divide the total range of weights into a small set of intervals and use a single value to represent all the weights in each interval. So, for instance, the weights 0.70, 0.76, and 0.79 might all get quantized to the single value 0.75. Specifying an interval requires fewer bits than specifying several different floating-point values.

If quantization is done after a network has been trained, performance can suffer. We developed a method of <i class="rte2-style-italic">quantization-aware</i> training that imposes a probability distribution on the network weights during training, so that they can be easily quantized with little effect on performance. Unlike previous quantization-aware training methods, which mostly take quantization into account in the forward pass, ours accounts for quantization in the backward direction, during weight updates, through network loss regularization. And it does that efficiently.

A way to make neural networks run more efficiently — also a vital concern on resource-constrained devices — is to reduce low weights to zero. Computations involving zero weights can be discarded, reducing the computational burden.

Sparsification.png
Over successive training epochs, sparsification gradually drops low weights in a weight matrix.

But again, doing that reduction after the network is trained can compromise performance. We developed a <i class="rte2-style-italic">sparsification</i> method that enables the gradual reduction of low-value weights during training, so the network learns a model amenable to weight pruning.

Neural networks are typically trained on multiple passes through the same set of training data, or epochs. During each epoch, we force the network weights to diverge more and more, so that at the end of the final epoch, a fixed number of weights — say, half — are effectively zero. They can be safely discarded.

AmazonScience_AmnetDemo_V1.gif
A demonstration of the branching encoder network.

To improve on-device efficiency, we also developed a branching encoder network that uses two different neural networks to convert speech inputs into numeric representations suitable for speech classification. One network is complex, one simple, and the ASR model decides on the fly whether it can get away with passing an input frame to the simple model, saving computational cost and time. We described this work in more detail in an earlier Amazon Science blog post.

Hardware-software codesign

Quantization and sparsification make no difference to performance if the underlying hardware can’t take advantage of them. Another key to getting ASR to run on-device was the design of Amazon’s AZ family of neural edge processors, which are optimized for our specific approach to compression.

For one thing, where a typical processor might represent data using 16 or 32 bits, for certain core operations, the AZ processors accelerate computation by using an 8-bit or even lower-bit representation, because that’s all we need to handle quantized values.

The weights of a neural network are typically represented using a matrix — a big grid of numbers. A matrix half of whose values are zeroes takes up as much space as a matrix that’s all nonzero.

On computer chips, transferring data tends to be much more time consuming than executing computations. So when we load our matrix into memory, we use a compression scheme that takes advantage of low-bit quantization and zero values. The circuitry for decoding the compressed representation is built into the chip.

In the neural processor’s memory, the matrix is reconstituted: the zeroes are filled back in. But the processor’s circuitry is designed to recognize zero values and discard computations involving them. So the time savings from sparsification are realized in the hardware itself.

Moving speech recognition on device entails a number of innovations in other areas, such as reduction in the bandwidth required for model updates and compression of NLU models, to ensure basic functionality on devices with intermittent Internet connectivity. And we’re also hard at work on multilingual on-device ASR models for dynamic language switching, or automatically recognizing which of two languages a customer is speaking and responding in kind.

The launch of on-device speech processing is a huge step in bringing the benefits of “processing on the edge” to our customers, and we will continue to invent on their behalf in this area.

Research areas

Related content

US, WA, Bellevue
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
GB, London
As a STRUC Economist Intern, you'll specialize in structural econometric analysis to estimate fundamental preferences and strategic effects in complex business environments. Your responsibilities include: Analyze large-scale datasets using structural econometric techniques to solve complex business challenges Applying discrete choice models and methods, including logistic regression family models (such as BLP, nested logit) and models with alternative distributional assumptions Utilizing advanced structural methods including dynamic models of customer or firm decisions over time, applied game theory (entry and exit of firms), auction models, and labor market models Building datasets and performing data analysis at scale Collaborating with economists, scientists, and business leaders to develop data-driven insights and strategic recommendations Tackling diverse challenges including pricing analysis, competition modeling, strategic behavior estimation, contract design, and marketing strategy optimization Helping business partners formalize and estimate business objectives to drive optimal decision-making and customer value Build and refine comprehensive datasets for in-depth structural economic analysis Present complex analytical findings to business leaders and stakeholders
US, WA, Seattle
At Amazon Selection and Catalog Systems (ASCS), our mission is to power the online buying experience for customers worldwide so they can find, discover, and buy any product they want. We innovate on behalf of our customers to ensure uniqueness and consistency of product identity and to infer relationships between products in Amazon Catalog to drive the selection gateway for the search and browse experiences on the website. We're solving a fundamental AI challenge: establishing product identity and relationships at unprecedented scale. Using Generative AI, Visual Language Models (VLMs), and multimodal reasoning, we determine what makes each product unique and how products relate to one another across Amazon's catalog. The scale is staggering: billions of products, petabytes of multimodal data, millions of sellers, dozens of languages, and infinite product diversity—from electronics to groceries to digital content. The research challenges are immense. GenAI and VLMs hold transformative promise for catalog understanding, but we operate where traditional methods fail: ambiguous problem spaces, incomplete and noisy data, inherent uncertainty, reasoning across both images and textual data, and explaining decisions at scale. Establishing product identities and groupings requires sophisticated models that reason across text, images, and structured data—while maintaining accuracy and trust for high-stakes business decisions affecting millions of customers daily. Amazon's Item and Relationship Platform group is looking for an innovative and customer-focused applied scientist to help us make the world's best product catalog even better. In this role, you will partner with technology and business leaders to build new state-of-the-art algorithms, models, and services to infer product-to-product relationships that matter to our customers. You will pioneer advanced GenAI solutions that power next-generation agentic shopping experiences, working in a collaborative environment where you can experiment with massive data from the world's largest product catalog, tackle problems at the frontier of AI research, rapidly implement and deploy your algorithmic ideas at scale, across millions of customers. Key job responsibilities Key job responsibilities include: * Formulate open research problems at the intersection of GenAI, multimodal reasoning, and large-scale information retrieval—defining the scientific questions that transform ambiguous, real-world catalog challenges into publishable, high-impact research * Push the boundaries of VLMs, foundation models, and agentic architectures by designing novel approaches to product identity, relationship inference, and catalog understanding—where the problem complexity (billions of products, multimodal signals, inherent ambiguity) demands methods that don't yet exist * Advance the science of efficient model deployment—developing distillation, compression, and LLM/VLM serving optimization strategies that preserve frontier-level multimodal reasoning in compact, production-grade architectures while dramatically reducing latency, cost, and infrastructure footprint at billion-product scale * Make frontier models reliable—advancing uncertainty calibration, confidence estimation, and interpretability methods so that frontier-scale GenAI systems can be trusted for autonomous catalog decisions impacting millions of customers daily * Own the full research lifecycle from problem formulation through production deployment—designing rigorous experiments over petabytes of multimodal data, iterating on ideas rapidly, and seeing your research directly improve the shopping experience for hundreds of millions of customers * Shape the team's research vision by defining technical roadmaps that balance foundational scientific inquiry with measurable product impact * Mentor scientists and engineers on advanced ML techniques, experimental design, and scientific rigor—building deep organizational capability in GenAI and multimodal AI * Represent the team in the broader science community—publishing findings, delivering tech talks, and staying at the forefront of GenAI, VLM, and agentic system research