Fitzgerald keynote.png
Amazon senior applied scientist Jack FitzGerald, delivering a keynote talk at the joint Language Intelligence @ Work and SEMANTiCS conference in Vienna, Austria.

Scaling multilingual virtual assistants to 1,000 languages

Self-supervised training, distributed training, and knowledge distillation have delivered remarkable results, but they’re just the tip of the iceberg.

Yesterday at the joint Language Intelligence @ Work and SEMANTiCS conference in Vienna, Austria, Amazon senior applied scientist Jack FitzGerald delivered a keynote talk on multilingual virtual assistants and the path toward a massively multilingual future. This is an edited version of his talk.

The evolution of human-computer interaction paradigms

In the past 50 years, computing technology has progressed from text-based terminal inputs, to graphical user interfaces, to predominantly web-based applications, through the mobile era, and finally into the era of a voice user interface and ambient computing.

Interface timeline.png
A brief history of computing interfaces.

Each of these paradigms has its own challenges with respect to multilingualism, whether it was the migration from ASCII to Unicode or proper character rendering on a website. However, I would argue that a voice AI system is the most difficult paradigm yet with respect to massive multilingualism.

The first reason is that the input space for voice interface commands is unbounded: the user can phrase each command in hundreds of different ways, all of which are valid. Another reason is that even within a single language, there can be many different dialects and accents.

Most important, the coupling between language and culture is inescapable. Whether it’s the level of formality used, preferred activities, or religious differences, there isn’t a one-size-fits-all solution. Instead, we must adapt the virtual assistant to understand cultural context and say only things that are appropriate for a given locale.

Voice AI systems today

A typical voice AI system includes automatic-speech-recognition models, which convert raw audio into text; natural-language understanding models, which determine the user’s intent and recognize named entities; a central service for arbitration and dialogue management, which routes commands to the proper services or skills; and finally, a text-to-speech model, which issues the output. Additional tasks might include expansion of the underlying knowledge graph and semantic parsing, localization of touch screen content, or local information services.
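The flow of a request through these components can be sketched as a chain of functions. This is a toy illustration only: every function name, the keyword rule, and the `GetWeather` intent are hypothetical stand-ins for trained models and real services, not Alexa's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class NluResult:
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def recognize_speech(audio: bytes) -> str:
    """Stand-in for ASR: pretend the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8")


def understand(text: str) -> NluResult:
    """Stand-in for NLU: a keyword rule instead of a trained model."""
    words = text.lower().split()
    if "weather" in words:
        # Treat the word after "in" as the city slot, if there is one.
        city = words[words.index("in") + 1] if "in" in words[:-1] else "here"
        return NluResult("GetWeather", {"city": city})
    return NluResult("Unknown")


# Arbitration table (hypothetical): maps each intent to the skill handling it.
SKILLS: Dict[str, Callable[[NluResult], str]] = {
    "GetWeather": lambda r: f"Looking up the weather in {r.slots['city']}.",
}


def arbitrate(result: NluResult) -> str:
    """Route the NLU result to a skill, or fall back to a default reply."""
    handler = SKILLS.get(result.intent)
    return handler(result) if handler else "Sorry, I did not understand that."


def synthesize(text: str) -> bytes:
    """Stand-in for TTS: return the response text as bytes."""
    return text.encode("utf-8")


def handle_request(audio: bytes) -> bytes:
    """ASR -> NLU -> arbitration -> TTS, in that order."""
    return synthesize(arbitrate(understand(recognize_speech(audio))))
```

Each stage here is language-specific in a real system, which is why adding a language touches every box in the pipeline rather than a single model.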

Alexa overview.png
An overview of Alexa’s design.

Let’s look at some of the operational considerations for supporting multiple languages in such models. One is the training data: they must be topically exhaustive, meaning that they cover the full spectrum of possible user utterances, and they must be culturally exhaustive — for instance, covering all of the holidays a user might celebrate. They must also remain up-to-date, and it’s not always easy to add something new to the model without regression on existing functionalities.

A second consideration is in-house testing. Though in many cases one can get away with synthetic or otherwise artificial data for model training, for testing it’s important to have realistic utterances. Those typically need to come from humans, and collecting them can be a major expense. It’s also useful to perform live, interactive testing, which requires people who can speak and understand each language that the system supports.

Finally, it’s important to have the ability to support users and process their feedback. In most cases, this again requires staff who understand each of the supported languages.

Ultimately, human-based processes are not very scalable if our goal is to support thousands of languages. Instead, we must turn to technology to the greatest extent possible.

Multilingual modeling today

One of the leading reasons for the current success of multilingual text models is self-supervision.

In traditional supervised learning, a model would be trained from scratch on the desired task. If we wanted a model that would classify the sentiment of a product review, for example, we would manually annotate a bunch of product reviews, and we would use that dataset to train the model.

Today, however, we make use of transfer learning, in which text models are pretrained on terabytes of text data that don’t require manual annotation. Instead, the training procedure leverages the structure inherent to the text itself.

Self-supervision signals.png
Self-supervised-training objectives.

We’ll call this self-supervised pretraining. With the masked-language-modeling training objective, for instance, the model is fed the input “for [MASK] out loud!”, and it must predict that “[MASK]” should be filled with the word “crying”. Other objectives, such as causal language modeling, span filling, deshuffling, and denoising, can also be used.
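To make the masked-language-modeling objective concrete, here is a minimal sketch of how training examples could be derived from raw text with no human labels. The function name, the whitespace tokenization, and the uniform masking rule are simplifying assumptions; production recipes use subword tokenizers and more elaborate masking schemes.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str], mask_prob: float, rng: random.Random
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Replace each token with [MASK] with probability mask_prob.

    Returns the masked sequence plus the (position, original token) pairs
    that the model would be trained to predict.
    """
    masked: List[str] = []
    targets: List[Tuple[int, str]] = []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets.append((i, tok))
        else:
            masked.append(tok)
    return masked, targets


# Example call; which tokens get masked depends on the random seed.
masked, targets = mask_tokens("for crying out loud !".split(), 0.3, random.Random(0))
```

The supervision signal (the `targets` list) comes entirely from the text itself, which is what lets this kind of pretraining scale to unlabeled corpora in any language.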

Because the datasets required for self-supervised pretraining are unlabeled and monolingual, we can leverage troves of data, such as Common Crawl web scrapes, every Wikipedia page in existence, thousands of books and news articles, and more. Couple these large datasets with highly parallelizable architectures such as transformers, which can be trained on over a thousand GPUs with near-linear scaling, and we can build models with tens or hundreds of billions of dense parameters. This has been the focus for many people in the field for the past few years, including the Alexa Teacher Model team.

One incredible consequence of the transfer learning paradigm is called zero-shot learning. In the context of multilingual modeling, it works like this: the modeler begins by pretraining the model on some set of languages, using self-supervision. As an example, suppose that the modeler trains a model on English, French, and Japanese using every Wikipedia article in those three languages.

The next step is to adapt the model to a particular task using labeled data. Suppose that the modeler has a labeled dataset for intent classification, but only in English. The modeler can go ahead and fine-tune the model on the English data, then run it on the remaining languages.

Despite the fact that the model was never trained to do intent classification with French or Japanese data, it can still classify intents in those languages, by leveraging what it learned about those languages during pretraining. Given that the acquisition of labeled data is often a bottleneck, this property of language models is highly valuable for language expansion. Of course, zero-shot learning is just the extreme end of a continuum: transfer learning helps even out performance when the labeled data in different languages is imbalanced.

Zero-shot multilingual.png
Zero-shot learning for multilingual adaptation.
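The intuition behind zero-shot cross-lingual transfer can be shown with a toy example. The sketch below is not fine-tuning: it swaps in a nearest-centroid classifier over fixed, hand-written two-dimensional vectors that stand in for a pretrained multilingual encoder's embeddings. The vectors, intent names, and function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def fit_centroids(
    examples: Sequence[Tuple[Sequence[float], str]]
) -> Dict[str, List[float]]:
    """Average the embeddings of each intent label (labeled data in one language)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(vec: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
    )


# Hypothetical embeddings of labeled ENGLISH utterances only.
english = [
    ((0.9, 0.1), "PlayMusic"),   # "play music"
    ((0.8, 0.2), "PlayMusic"),   # "put on a song"
    ((0.1, 0.9), "SetAlarm"),    # "set an alarm"
]
centroids = fit_centroids(english)

# Hypothetical embeddings of unlabeled FRENCH utterances. Because the encoder
# is assumed to place translations near each other, English labels transfer.
french_music = (0.85, 0.15)   # "joue de la musique"
french_alarm = (0.2, 0.8)     # "mets une alarme"
```

Calling `classify(french_music, centroids)` picks the music intent even though no French example was ever labeled. The transfer works only to the extent that the shared representation space aligns the two languages.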

The next step up the data efficiency ladder is performing tasks without any additional training or fine-tuning, using only a couple of labeled records or none at all. This is possible through “in-context learning,” which was popularized in the GPT-3 paper.

To perform in-context learning, simply take a pretrained model and feed it the appropriate prompts. Think of a prompt as a hint to the model about the task it should perform. Suppose that we want the model to summarize a passage. We might prefix the passage with the word “Passage” and a colon and follow it with the word “Summary” and a colon. The model would then generate a summary of the passage.

This is the zero-shot in-context learning case, meaning that no fine-tuning is performed, and no labeled data are needed. To improve task performance, we can feed a few examples to the model before asking it to perform the task. Though this does require some labeled data, the amount is small, usually in the tens of examples only.
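A minimal sketch of how such prompts might be assembled follows. The `build_prompt` helper and the blank-line separator between examples are assumptions for illustration; the "Passage:" and "Summary:" markers follow the description above.

```python
from typing import Sequence, Tuple


def build_prompt(passage: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a summarization prompt; examples are (passage, summary) pairs.

    With no examples this is the zero-shot case; with a few, the few-shot case.
    """
    blocks = [f"Passage: {p}\nSummary: {s}" for p, s in examples]
    # The final block leaves the summary empty for the model to complete.
    blocks.append(f"Passage: {passage}\nSummary:")
    return "\n\n".join(blocks)


zero_shot = build_prompt("The committee met on Tuesday and approved the budget.")
few_shot = build_prompt(
    "The committee met on Tuesday and approved the budget.",
    examples=[("It rained all week, so the match was postponed.", "Match postponed by rain.")],
)
```

The model's weights are identical in both cases; only the text it is asked to continue changes.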

Our Alexa Teacher Model team recently trained and tested a 20-billion-parameter sequence-to-sequence model that was multilingual and showed nice performance for in-context learning. For example, we showed state-of-the-art performance on machine translation with in-context learning. The model can achieve competitive BLEU scores even for some low-resource languages, which is incredible given that no parallel data was used during pretraining, and no labeled data besides a single example was used at any step in the process.

We were particularly proud of the relatively small size of this model, which could compete with much larger models because it was trained on more data. (The Chinchilla model from DeepMind showed a similar result.) Though a large model trained on a smaller dataset and a smaller model trained on a larger dataset may use the same total compute at training time, the smaller model will require less compute and memory during inference, which is a key factor in real applications.
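The trade-off between model size and dataset size can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes the widely used rule of thumb that training a dense transformer costs roughly 6 FLOPs per parameter per token and that inference costs roughly 2 FLOPs per parameter per token. Both constants are approximations, and the two model configurations are hypothetical.

```python
def training_flops(params: int, tokens: int) -> int:
    """Approximate training cost: ~6 FLOPs per parameter per training token."""
    return 6 * params * tokens


def inference_flops_per_token(params: int) -> int:
    """Approximate inference cost: ~2 FLOPs per parameter per generated token."""
    return 2 * params


# Two hypothetical configurations with the same training budget.
small_params, small_tokens = 20 * 10**9, 10**12        # 20B params, 1T tokens
large_params, large_tokens = 100 * 10**9, 2 * 10**11   # 100B params, 200B tokens

same_budget = training_flops(small_params, small_tokens) == training_flops(
    large_params, large_tokens
)
inference_ratio = inference_flops_per_token(large_params) // inference_flops_per_token(
    small_params
)
```

Under these assumptions both configurations cost about 1.2e23 training FLOPs, but the larger model needs five times the compute for every token it generates, and correspondingly more memory to hold its weights.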

Given that models demonstrate multilingual understanding even without labeled data or parallel data, you may be wondering what’s happening inside of the model. Since the days of word2vec and earlier, we’ve represented characters, words, sentences, documents, and other inputs as vectors of floats, also known as embeddings, hidden states, and representations. Concepts cluster in certain areas of the representational space.

As humans, we can think only in three dimensions, whereas these representations are high-dimensional, but you can visualize this clustering in two or three dimensions as a reductive approximation. All the languages the model supports would cluster the concept of sitting in a chair in one region of the representational space; the concept of the ocean would inhabit a different cluster; and so forth.

Indeed, Pires et al. have shown that synonymous words across languages cluster together in the mBERT model. When examining 5,000 sentence pairs from the WMT16 dataset, they found that, given a sentence and its embedding in one language, the correct translation from another language is the closest embedding to the source embedding up to 75% of the time.
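A nearest-neighbor retrieval test of this kind can be sketched in a few lines. This is a simplified stand-in using cosine similarity over toy vectors, not the exact procedure from Pires et al.; the function names and data are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieval_accuracy(
    src: List[Sequence[float]], tgt: List[Sequence[float]]
) -> float:
    """Fraction of source sentences whose nearest target is the true translation.

    src[i] and tgt[i] are assumed to be embeddings of a translation pair.
    """
    hits = 0
    for i, s in enumerate(src):
        nearest = max(range(len(tgt)), key=lambda j: cosine(s, tgt[j]))
        if nearest == i:
            hits += 1
    return hits / len(src)


# Toy embeddings: each target vector lies close to its paired source vector.
source_embeddings = [[1.0, 0.0], [0.0, 1.0]]
target_embeddings = [[0.9, 0.1], [0.2, 0.8]]
accuracy = retrieval_accuracy(source_embeddings, target_embeddings)
```

A high score on real sentence pairs indicates that the model places translations near each other even though it was never told which sentences are translations.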

This manner of clustering can also be manipulated by changing the objective function. In their work on speech-to-text modeling, Adams et al., from Johns Hopkins, were seeing undesirable clustering by language, rather than by phonemes, in the representational space. They were able to correct this by adding training objectives around phoneme prediction and language identification.

The Alexa Teacher Model distillation pipeline

Once we have multilingual models, how do we adapt them to a real system? At the recent KDD conference, we presented a paper describing the Alexa Teacher Model pipeline, consisting of the following steps.

First, a multilingual model with billions of parameters is trained on up to a trillion tokens taken from Common Crawl web scrapes, Wikipedia articles, and more. Second, the model is further trained on in-domain, unlabeled data from a real system. Third, the model is distilled into smaller sizes that can be used in production. The final models can then be fine-tuned using labeled data and deployed.

ATM pipeline.png
The Alexa Teacher Model (AlexaTM) pipeline. The Alexa Teacher Model is trained on a large set of GPUs (left), then distilled into smaller variants (center), whose size depends on their uses. The end user adapts a distilled model to its particular use by fine-tuning it on in-domain data (right).
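For readers unfamiliar with distillation, the sketch below shows one classic formulation: the student is trained to match the teacher's temperature-softened output distribution, measured by KL divergence. This is a generic illustration. The talk does not specify which distillation objective the AlexaTM pipeline uses, and the function names and temperature value are assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# The loss is zero when the student reproduces the teacher's logits exactly
# and grows as the student's distribution drifts away from the teacher's.
matched = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
mismatched = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

Because the target is a full distribution rather than a single label, the student can learn from unlabeled in-domain text: the teacher supplies the supervision.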

In tests, we found that our model was more accurate than a publicly available pretrained model fine-tuned on labeled data, and it significantly reduced customer dissatisfaction relative to a model trained by a smaller teacher model (85 million parameters, say, instead of billions). In short, we’ve verified that we can leverage the additional learning capacity of large, multilingual models for production systems requiring low latency and low memory consumption.

Scaling to 1,000 languages

I mentioned the fascinating ability of language models to learn joint representations of multiple languages without labeled or parallel data. This ability is crucial for us to scale to many languages. However, as we scale, we need test data that we can trust so that we can evaluate our progress.

Toward this end, my team at Amazon recently released a new benchmark for multilingual natural-language understanding called MASSIVE, which is composed of one million labeled records spanning 51 languages, 18 domains, 60 intents, and 55 slots. All of the data were created by native speakers of the languages. We also released a GitHub repository with code that can be used as a baseline for creating multilingual NLU models, as well as leaderboards on eval.ai.

Now, you may retort that 51 languages is still a long way from 1,000 languages. This is true, but we purposefully chose our languages in order to maximize typological diversity while staying within our budget. Our languages span 29 language genera, 14 language families, and 21 distinct scripts or alphabets. The diversity of the chosen languages allows a modeler to test technology that should scale to many more languages within each represented genus, family, and script.

That said, we certainly have some major gaps in language coverage, including across native North and South American languages, African languages, and Australian languages. Yet we are optimistic that our fellow researchers across the field will continue to produce new labeled benchmark datasets for the world’s thousands of low-resource languages.

Massive languages.cropped.png
The 51 languages of MASSIVE, including scripts and genera.

Another difficulty with our current modeling approaches is that they rely on data sources such as web scrapes, encyclopedic articles, and news articles, which are highly skewed toward a small set of languages. Wang, Ruder, and Neubig recently presented some fascinating work leveraging bilingual lexicons — corpora consisting of word-level translations — to improve language model performance for low-resource languages. Lexicons cover a far greater portion of the world’s languages than our typical data sources for language modeling, making this an exciting approach.

Researchers, missionaries, and businesspeople have been creating fundamental linguistic resources for decades, from Bible translations to the Unimorph corpus. The Unimorph datasets are used for the SIGMORPHON shared task, in which a model must predict the correct form of a word given that word’s root and certain morphological features, such as part of speech, tense, and person. We must find more ways to leverage such resources when creating massively multilingual voice AI systems.

As a final technique for scaling to many more languages, we can consider what we in Alexa call “self-learning.” Some of my Alexa colleagues published a paper showing that we can mine past utterances to improve overall system performance. For example, if a user rephrases a request as part of a multiturn interaction, as shown on the left in the figure below, or if different users provide variations for the same desired goal, as shown on the right, then we can make soft assumptions that the different formulations are synonymous.

All of these cases can be statistically aggregated to form new training sets to update the system, without the need to manually annotate utterances. In a multilingual system, such technology is particularly valuable after the initial launch of a language, both to improve performance generally and to adapt to changes in the lexicon.
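The statistical aggregation step can be illustrated with a toy sketch. It assumes that rephrase events have already been extracted as (failed utterance, successful rephrase) pairs; the function name, the thresholds, and the example utterances are hypothetical and are not taken from the published system.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def mine_rewrites(
    pairs: Iterable[Tuple[str, str]], min_count: int = 2, min_share: float = 0.6
) -> Dict[str, str]:
    """Aggregate (failed utterance, successful rephrase) pairs into rewrites.

    A rewrite is kept only if it was observed at least min_count times and
    accounts for at least min_share of all rephrases of that utterance.
    """
    by_source: Dict[str, Counter] = {}
    for failed, rephrase in pairs:
        by_source.setdefault(failed, Counter())[rephrase] += 1

    rewrites: Dict[str, str] = {}
    for failed, counts in by_source.items():
        best, n = counts.most_common(1)[0]
        if n >= min_count and n / sum(counts.values()) >= min_share:
            rewrites[failed] = best
    return rewrites


# Toy interaction log: three users converge on the same rephrase, one does not,
# and a second utterance has too little evidence to act on.
log = (
    [("play maj and dragons", "play imagine dragons")] * 3
    + [("play maj and dragons", "play magic dragons")]
    + [("turn on lite", "turn on the light")]
)
rewrites = mine_rewrites(log)
```

The thresholds encode the "soft assumption": a single rephrase is weak evidence, but many users converging on the same reformulation is strong enough to generate training data without manual annotation.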

Self-learning.png
Alexa’s self-learning mechanism.

The road ahead

I hope that you share my wonder at the current state of the art — the scale of language-model training, the magic of zero-shot learning, and the distillation of knowledge into compact models that can run in latency-sensitive systems. All of this is incredible, but we’ve only scratched the surface of supporting the world’s 7,000 languages.

To move into the next era of massive multilingualism, we must build new and increasingly powerful models that can take advantage of low-cost data, particularly unlabeled monolingual data. We must also build models that can leverage existing and upcoming linguistic resources, such as bilingual lexicons and morphological-transformation databases. And finally, we must expand available language resources across more languages and domains, including more unlabeled monolingual corpora, more parallel resources, and more realistic, labeled, task-specific datasets.

Increased multilingualism is a win for all people everywhere. Each language provides a unique perspective on the world in which we live. A rich plurality of perspectives leads to a deeper understanding of our fellow people and of all creation.

Keep building.

Research areas

Related content

US, CA, San Francisco
Amazon has launched a new research lab in San Francisco to develop foundational capabilities for useful AI agents. We’re enabling practical AI to make our customers more productive, empowered, and fulfilled. In particular, our work combines large language models (LLMs) with reinforcement learning (RL) to solve reasoning, planning, and world modeling in both virtual and physical environments. Our research builds on that of Amazon’s broader AGI organization, which recently introduced Amazon Nova, a new generation of state-of-the-art foundation models (FMs). Our lab is a small, talent-dense team with the resources and scale of Amazon. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research. We’re entering an exciting new era where agents can redefine what AI makes possible. We’d love for you to join our lab and build it from the ground up! Key job responsibilities You will contribute directly to AI agent development in a research engineering role: running experiments, building tools to accelerate scientific workflows, and scaling up AI systems. Key responsibilities include: * Design, maintain, and enhance tools and workflows that support cutting-edge research * Adapt quickly to evolving research priorities and team needs * Stay informed on the latest advancements in large language models and related research * Collaborate closely with researchers to develop new techniques and tools around emerging agent capabilities * Drive project execution, including scoping, prioritization, timeline management, and stakeholder communication * Thrive in a fast-paced, iterative environment, delivering high-quality software on tight schedules * Apply strong software engineering fundamentals to produce clean, reliable, and maintainable code About the team The Amazon AGI SF Lab is focused on developing new foundational capabilities for enabling useful AI agents that can take actions in the digital and physical worlds. 
In other words, we’re enabling practical AI that can actually do things for us and make our customers more productive, empowered, and fulfilled. The lab is designed to empower AI researchers and engineers to make major breakthroughs with speed and focus toward this goal. Our philosophy combines the agility of a startup with the resources of Amazon. By keeping the team lean, we’re able to maximize the amount of compute per person. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research.
US, CA, Sunnyvale
Prime Video is a first-stop entertainment destination offering customers a vast collection of premium programming in one app available across thousands of devices. Prime members can customize their viewing experience and find their favorite movies, series, documentaries, and live sports – including Amazon MGM Studios-produced series and movies; licensed fan favorites; and programming from Prime Video subscriptions such as Apple TV+, HBO Max, Peacock, Crunchyroll and MGM+. All customers, regardless of whether they have a Prime membership or not, can rent or buy titles via the Prime Video Store, and can enjoy even more content for free with ads. Are you interested in shaping the future of entertainment? Prime Video's technology teams are creating best-in-class digital video experience. As a Prime Video team member, you’ll have end-to-end ownership of the product, user experience, design, and technology required to deliver state-of-the-art experiences for our customers. You’ll get to work on projects that are fast-paced, challenging, and varied. You’ll also be able to experiment with new possibilities, take risks, and collaborate with remarkable people. We’ll look for you to bring your diverse perspectives, ideas, and skill-sets to make Prime Video even better for our customers. With global opportunities for talented technologists, you can decide where a career Prime Video Tech takes you! Key job responsibilities As an Applied Scientist at Prime Video, you will have end-to-end ownership of the product, related research and experimentation, applying advanced machine learning techniques in computer vision (CV), Generative AI, multimedia understanding and so on. You’ll work on diverse projects that enhance Prime Video’s content localization, image/video understanding, and content personalization, driving impactful innovations for our global audience. 
Other responsibilities include: - Research and develop generative models for controllable synthesis across images, video, vector graphics, and multimedia - Innovate in advanced diffusion and flow-based methods (e.g., inverse flow matching, parameter efficient training, guided sampling, test-time adaptation) to improve efficiency, controllability, and scalability. - Advance visual grounding, depth and 3D estimation, segmentation, and matting for integration into pre-visualization, compositing, VFX, and post-production pipelines. - Design multimodal GenAI workflows including visual-language model tooling, structured prompt orchestration, agentic pipelines. A day in the life Prime Video is pioneering the use of Generative AI to empower the next generation of creatives. Our mission is to make world-class media creation accessible, scalable, and efficient. We are seeking an Applied Scientist to advance the state of the art in Generative AI and to deliver these innovations as production-ready systems at Amazon scale. Your work will give creators unprecedented freedom and control while driving new efficiencies across Prime Video’s global content and marketing pipelines. This is a newly formed team within Prime Video Science!
US, CA, Sunnyvale
Prime Video is a first-stop entertainment destination offering customers a vast collection of premium programming in one app available across thousands of devices. Prime members can customize their viewing experience and find their favorite movies, series, documentaries, and live sports – including Amazon MGM Studios-produced series and movies; licensed fan favorites; and programming from Prime Video subscriptions such as Apple TV+, HBO Max, Peacock, Crunchyroll and MGM+. All customers, regardless of whether they have a Prime membership or not, can rent or buy titles via the Prime Video Store, and can enjoy even more content for free with ads. Are you interested in shaping the future of entertainment? Prime Video's technology teams are creating best-in-class digital video experience. As a Prime Video team member, you’ll have end-to-end ownership of the product, user experience, design, and technology required to deliver state-of-the-art experiences for our customers. You’ll get to work on projects that are fast-paced, challenging, and varied. You’ll also be able to experiment with new possibilities, take risks, and collaborate with remarkable people. We’ll look for you to bring your diverse perspectives, ideas, and skill-sets to make Prime Video even better for our customers. With global opportunities for talented technologists, you can decide where a career Prime Video Tech takes you! Key job responsibilities As an Applied Scientist at Prime Video, you will have end-to-end ownership of the product, related research and experimentation, applying advanced machine learning techniques in computer vision (CV), Generative AI, multimedia understanding and so on. You’ll work on diverse projects that enhance Prime Video’s content localization, image/video understanding, and content personalization, driving impactful innovations for our global audience. 
Other responsibilities include: - Research and develop generative models for controllable synthesis across images, video, vector graphics, and multimedia - Innovate in advanced diffusion and flow-based methods (e.g., inverse flow matching, parameter efficient training, guided sampling, test-time adaptation) to improve efficiency, controllability, and scalability. - Advance visual grounding, depth and 3D estimation, segmentation, and matting for integration into pre-visualization, compositing, VFX, and post-production pipelines. - Design multimodal GenAI workflows including visual-language model tooling, structured prompt orchestration, agentic pipelines. A day in the life Prime Video is pioneering the use of Generative AI to empower the next generation of creatives. Our mission is to make world-class media creation accessible, scalable, and efficient. We are seeking an Applied Scientist to advance the state of the art in Generative AI and to deliver these innovations as production-ready systems at Amazon scale. Your work will give creators unprecedented freedom and control while driving new efficiencies across Prime Video’s global content and marketing pipelines. This is a newly formed team within Prime Video Science!
US, MA, Boston
AI is the most transformational technology of our time, capable of tackling some of humanity’s most challenging problems. That is why Amazon is investing in generative AI (GenAI) and the responsible development and deployment of large language models (LLMs) across all of our businesses. Come build the future of human-technology interaction with us. We are looking for an Applied Scientist with strong technical skills which includes coding and natural language processing experience in dataset construction, training and evaluating models, and automatic processing of large datasets. You will play a critical role in driving innovation and advancing the state-of-the-art in natural language processing and machine learning. You will work closely with cross-functional teams, including product managers, language engineers, and other scientists. Key job responsibilities Specifically, the Applied Scientist will: • Ensure quality of speech/language/other data throughout all stages of acquisition and processing, including data sourcing/collection, ground truth generation, normalization, transformation, cross-lingual alignment/mapping, etc. • Clean, analyze and select speech/language/other data to achieve goals • Build and test models that elevate the customer experience • Collaborate with colleagues from science, engineering and business backgrounds • Present proposals and results in a clear manner backed by data and coupled with actionable conclusions • Work with engineers to develop efficient data querying infrastructure for both offline and online use cases
US, CA, San Francisco
The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive Member of Technical Staff with a strong deep learning background, to build industry-leading Generative Artificial Intelligence (GenAI) technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As a Member of Technical Staff with the AGI team, you will lead the development of algorithms and modeling techniques, to advance the state of the art with LLMs. You will lead the foundational model development in an applied research role, including model training, dataset design, and pre- and post-training optimization. Your work will directly impact our customers in the form of products and services that make use of GenAI technology. You will leverage Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in LLMs. About the team The AGI team has a mission to push the envelope in GenAI with LLMs and multimodal systems, in order to provide the best-possible experience for our customers.
US, MA, Boston
AI is the most transformational technology of our time, capable of tackling some of humanity’s most challenging problems. That is why Amazon is investing in generative AI (GenAI) and the responsible development and deployment of large language models (LLMs) across all of our businesses. Come build the future of human-technology interaction with us. We are looking for an Applied Scientist with strong technical skills which includes coding and natural language processing experience in dataset construction, training and evaluating models, and automatic processing of large datasets. You will play a critical role in driving innovation and advancing the state-of-the-art in natural language processing and machine learning. You will work closely with cross-functional teams, including product managers, language engineers, and other scientists. Key job responsibilities Specifically, the Applied Scientist will: • Ensure quality of speech/language/other data throughout all stages of acquisition and processing, including data sourcing/collection, ground truth generation, normalization, transformation, cross-lingual alignment/mapping, etc. • Clean, analyze and select speech/language/other data to achieve goals • Build and test models that elevate the customer experience • Collaborate with colleagues from science, engineering and business backgrounds • Present proposals and results in a clear manner backed by data and coupled with actionable conclusions • Work with engineers to develop efficient data querying infrastructure for both offline and online use cases