Long-form-video understanding and synthesis

Four CVPR papers from Prime Video examine a broad set of topics related to efficient model training for understanding and synthesizing long-form cinematic content.

At this year’s Conference on Computer Vision and Pattern Recognition (CVPR), Prime Video presented four papers that indicate the broad range of cutting-edge problems we work on.

In one paper, “Movies2Scenes: Using movie metadata to learn scene representation”, we present a novel contrastive-learning approach that uses only commonly available movie metadata to learn a general-purpose scene representation. On a diverse set of tasks evaluated using multiple benchmark datasets, models that use our representations consistently outperform models using existing state-of-the-art representations.

Notably, our learned representation offers an average improvement of 7.9% on the seven classification tasks and 9.7% on the two regression tasks in the Long-Form Video Understanding (LVU) dataset. This effort is an important step toward the first foundation model for general-purpose movie understanding.

In another paper, “Selective structured state-spaces for long-form video understanding”, we extend the recently proposed S4 model with a lightweight mask generator that adaptively selects informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos. Our approach is consistently more accurate than the previous state-of-the-art model, by as much as 9.6%, while reducing the memory footprint by 23%.

Similarly, our paper “Dynamic inference with grounding based vision and language models” explores the problem of computational redundancy in large vision-and-language models, addressing this challenge by dynamically skipping network layers, dropping input tokens, and fusing multimodal tokens, conditioned on the input image-text pair. Our results show that we can improve the run-time efficiency of the state-of-the-art models by up to 50% on multiple downstream tasks with an accuracy drop of only 0.3%.

Lastly, our paper “LEMaRT: Label-efficient masked region transform for image harmonization” addresses the problem of requiring large amounts of labeled data to train image harmonization models, which modify content from different source images so that they blend together better in composite images. To this end, our method automatically generates training data by simulating defects in appearance that image harmonization models are expected to remove. Our method outperforms previous state-of-the-art approaches by a margin of 0.4 dB (mean-squared-error improvement of ~9%) when it is fine-tuned on only 50% of the training data from one of the standard benchmarks (iHarmony4) and by 1.0 dB (MSE improvement of ~21%) when it is trained on the full training dataset.
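The dB margins and the MSE percentages quoted above are related by simple arithmetic. The sketch below assumes that the dB figures are peak-signal-to-noise-ratio (PSNR) gains and that the peak pixel value is the same for both methods; the function name is ours, for illustration only. Benchmark numbers are averaged per image, so this conversion is only a rough consistency check.

```python
def mse_reduction_from_psnr_gain(gain_db: float) -> float:
    """Fractional MSE reduction implied by a PSNR gain in dB.

    PSNR = 10 * log10(MAX^2 / MSE), so at a fixed peak value MAX,
    a gain of g dB scales the MSE by 10 ** (-g / 10).
    """
    return 1.0 - 10.0 ** (-gain_db / 10.0)


for gain in (0.4, 1.0):
    reduction = 100 * mse_reduction_from_psnr_gain(gain)
    print(f"{gain} dB -> {reduction:.1f}% lower MSE")
# prints:
# 0.4 dB -> 8.8% lower MSE
# 1.0 dB -> 20.6% lower MSE
```

Rounded, these values agree with the ~9% and ~21% figures reported above.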

Toward a foundation model for movie understanding

The term “foundation model” generally refers to (i) a single large model that is (ii) trained on large amounts of mostly unlabeled data and can (iii) drive a number of downstream tasks. While several general-purpose visual and textual foundation models exist (e.g., BERT, GPT-4, CLIP, and DALL-E 2), no foundation model specifically geared toward movie understanding had been proposed before our work.

This is partly because directly applying existing visual or textual foundation models for movie understanding has limited effectiveness, given the large domain gap between cinematic content and the web-crawled images and text used to train those models. Factors such as the inaccessibility of much large-scale cinematic content, the computational resources required to process it, and the lack of benchmark datasets for evaluation on downstream applications add to the challenge of building a foundation model for movie understanding.

To address these challenges, we proposed a novel model trained on over five million scenes automatically identified from thousands of movies and comprising more than 45 million frames. Our model does not require any manual annotations and relies only on commonly available movie-level information (genre, synopsis, etc.). The scene representations from our model can be applied to improve the performance of a diverse set of downstream tasks, which is a key step toward building a foundation model for movie understanding.

We use movie metadata to define a measure of movie similarity and use that similarity measure to identify data pairs for contrastive learning. In contrastive learning, a model is trained on both positive pairs — examples that are similar in the relevant way — and negative pairs. During training, the model learns to produce data representations that pull positive pairs together and push negative pairs apart.
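To make the pull-together, push-apart objective concrete, here is a minimal sketch of an InfoNCE-style contrastive loss, one common formulation of contrastive learning. It is not necessarily the exact loss used in the paper; the function name, the temperature value, and the random stand-in embeddings are all assumptions for illustration.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of embedding pairs.

    Row i of `positives` is the positive for row i of `anchors`; every
    other row in the batch serves as a negative for that anchor.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                  # (batch, batch) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # diagonal entries are the positive pairs


# Stand-in embeddings: random vectors, not real scene representations.
rng = np.random.default_rng(0)
scenes = rng.normal(size=(8, 16))
aligned = info_nce_loss(scenes, scenes + 0.01 * rng.normal(size=(8, 16)))
unrelated = info_nce_loss(scenes, rng.normal(size=(8, 16)))
print(f"aligned pairs: {aligned:.3f}, unrelated pairs: {unrelated:.3f}")
```

The loss is small when each anchor is closest to its own positive and large when the supposed positives are no closer than the negatives, which is the signal that drives the representation during training.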

Often, the positive pairs are created by augmenting existing examples — say, re-cropping them, reversing them, or re-coloring them. By instead using movies that are considered similar to each other (see below), we ensure that our positive scene-pairs are not only visually similar but also semantically coherent, providing us with a much richer set of geometric and thematic data augmentations that enhance the training objective beyond traditional augmentation approaches.
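As a rough illustration of how metadata can define movie similarity, the sketch below scores movies by the Jaccard overlap of their genre sets and keeps pairs above a threshold. The movie names, genres, similarity measure, and threshold are hypothetical; the paper's actual similarity measure may differ. Scenes drawn from a similar pair of movies would then be candidates for positive pairs.

```python
from itertools import combinations

# Hypothetical metadata: titles and genre sets are made up for illustration.
movies = {
    "movie_a": {"action", "thriller"},
    "movie_b": {"action", "thriller", "crime"},
    "movie_c": {"romance", "comedy"},
}


def jaccard(a, b):
    """Overlap between two sets: size of intersection over size of union."""
    return len(a & b) / len(a | b)


def similar_movie_pairs(metadata, threshold=0.5):
    """Return movie pairs whose metadata similarity meets the threshold."""
    return [
        (m1, m2)
        for m1, m2 in combinations(sorted(metadata), 2)
        if jaccard(metadata[m1], metadata[m2]) >= threshold
    ]


print(similar_movie_pairs(movies))
# prints: [('movie_a', 'movie_b')]
```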

Overview of our approach.

As can be seen in the video below, our learned scene representation is able to effectively put thematically similar scenes close to each other.

Qualitative examples of similar-scene pairs found using our approach.

In the examples below, we compare our representation with the commonly used CLIP visual representation for scene retrieval using place-labeled scenes in the Long-Form Video Understanding (LVU) dataset. Given a query scene, our representation can capture appearance as well as semantic concepts to retrieve similar scenes more effectively, while CLIP can capture only local appearance-based patterns. For overall retrieval precision on six categories of places, our representation offers a 22.7% improvement over CLIP.

A comparison of our video representation method and one of its predecessors, CLIP, on the task of place retrieval using the Long-Form Video Understanding (LVU) dataset.

Quantitatively, our learned representation exhibits an average improvement of 7.9% and 9.7% on the seven classification tasks and two regression tasks of the LVU dataset, respectively. Furthermore, using our newly collected MCD dataset in Prime Video, we compare our learned scene representation with state-of-the-art models pretrained on action recognition and image classification datasets. Our scene representation outperforms the alternatives by margins ranging from 3.8% to 50.9% across different models and tasks.

Reducing model complexity for long-form-video understanding

At Prime Video, we’re developing state-of-the-art AI models for cinematic-content understanding to facilitate a variety of downstream use cases. One of the key technical problems to this end is effective modeling of complex spatiotemporal dependencies, particularly in long-form videos such as movies and TV episodes.

Various shots from the movie Stuart Little, showing the complex spatiotemporal dependencies of cinematic content.

Previously proposed convolutional and recurrent neural networks struggle to learn long-term dependencies. In part this is because of exploding or vanishing gradients — where cascading adjustments to model weights grow too small or too large — as information is incorporated over long durations. Vision transformers can use self-attention to address this challenge, attending to particular, prior frames of video when interpreting the current frame. But this is computationally expensive, as it requires pairwise computations between the current frame and its predecessors.

The recently proposed structured-state-space-sequence (S4) model, with its linear complexity, offers a promising direction in this space; however, we empirically demonstrate that treating all image tokens equally, as the S4 model does, can adversely affect a model’s efficiency and accuracy.
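The linear complexity comes from the recurrent form of a state-space model, which performs one constant-cost state update per token. The sketch below shows only the plain discrete linear recurrence that such models build on, with toy matrices chosen for illustration; it omits S4's structured parameterization and its convolutional computation.

```python
import numpy as np


def ssm_scan(A, B, C, inputs):
    """Run x[k] = A x[k-1] + B u[k], y[k] = C x[k] over a 1-D input sequence."""
    state = np.zeros(A.shape[0])
    outputs = []
    for u in inputs:                  # one constant-cost update per token
        state = A @ state + B * u
        outputs.append(C @ state)
    return np.array(outputs)


# Toy parameters (assumptions): a decaying two-dimensional state.
A = 0.5 * np.eye(2)
B = np.array([1.0, 0.0])
C = np.array([1.0, 0.0])
print(ssm_scan(A, B, C, [1.0, 0.0, 0.0]))
# prints: [1.   0.5  0.25]
```

Because the state carries information forward, the cost grows linearly with sequence length, in contrast to the pairwise comparisons of self-attention.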

To address this challenge, we present a novel selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos. Unlike previous methods, which used mask-based token reduction in transformers, our S5 model avoids the dense self-attention calculation by following the guidance of the momentum-updated S4 model. This enables our model to efficiently discard less informative tokens and adapt to various long-form-video-understanding tasks more effectively.

At left (a) is an illustration of our S5 model. We introduce a “mask generator” that implements a selective token-picking strategy, leveraging the feature representations from the momentum S4 model. The momentum S4 model is updated from the S4 model using a moving average. At right (b) is an illustration of the proposed pretraining framework, which uses long-short masked contrastive learning to initialize our S5 model and enhance its robustness.
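The two mechanisms described here, a moving-average update and score-based token selection, can be sketched as follows. This is a simplified illustration under our own assumptions: the moving average is taken over model parameters, the scores stand in for the mask generator's output, and hard top-k selection replaces whatever learned, differentiable selection the actual model uses. All names and values are hypothetical.

```python
import numpy as np


def momentum_update(momentum_params, online_params, decay=0.99):
    """Moving-average update of the momentum model's parameters."""
    return {
        name: decay * momentum_params[name] + (1.0 - decay) * online_params[name]
        for name in momentum_params
    }


def select_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order."""
    num_keep = max(1, int(round(keep_ratio * len(tokens))))
    keep = np.sort(np.argsort(scores)[-num_keep:])
    return tokens[keep], keep


# Six toy tokens with hypothetical informativeness scores.
tokens = np.arange(12, dtype=float).reshape(6, 2)
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7])
kept_tokens, kept_indices = select_tokens(tokens, scores)
print(kept_indices)
# prints: [1 3 5]
```

Dropping half of the tokens before the sequence model halves the work done downstream, which is where savings in memory and computation would come from.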

However, as is the case with most token reduction methods, informative image tokens may be dropped incorrectly. To improve the robustness and the temporal horizon of our model, we propose a novel long-short masked contrastive-learning (LSMCL) approach that enables our model to predict longer temporal contexts using shorter input videos.

We present extensive comparative results using three challenging long-form video-understanding datasets (LVU, COIN, and Breakfast), demonstrating that our approach is consistently more accurate than the previous state-of-the-art S4 model, by as much as 9.6% on one dataset, with a memory footprint that’s 23% smaller.

Dynamic inference of multimodal models using reinforcement learning

The availability of transformer models operating over multiple data modalities as well as large-scale pretraining approaches has led to significant progress on joint image-and-language models. However, these models impose high computational costs and therefore offer low run-time efficiency, making them difficult to apply to Prime Video’s large catalogue.

Although approaches such as pruning, knowledge distillation, and quantization can help address this challenge, they can incur significant drops in accuracy (e.g., ≥ 1% at ≥ 50% model compression rates), as they are primarily designed for model-parameter reduction, not improving run-time efficiency.

To address this challenge, we propose a model that saves computation by dynamically skipping layers of a multimodal network; pruning input tokens from either the language backbone, the image backbone, or both; and fusing tokens from the separate backbones, conditioned on the input image-text pair.

Most multimodal transformer models include multihead self-attention and feed-forward network layers, which can be skipped for some inputs. Additionally, we remove redundant tokens at different levels of the backbones and fuse the image tokens with the language tokens in an adaptive manner. To learn policies for dynamic inference, we train agents using reinforcement learning.
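Input-conditioned layer skipping can be sketched as a loop over residual layers with a policy that decides, per input, which layers to execute. In the sketch below, the toy layers and the hand-written rule are hypothetical stand-ins; in the approach described above, the decisions come from agents trained with reinforcement learning.

```python
import numpy as np


def run_with_skipping(x, layers, skip_policy):
    """Apply residual layers in order, skipping those the policy rejects."""
    executed = []
    for index, layer in enumerate(layers):
        if skip_policy(x, index):
            continue                  # identity path: this layer's computation is saved
        x = x + layer(x)              # residual update, as in transformer blocks
        executed.append(index)
    return x, executed


# Toy layers and a hand-written rule standing in for a learned policy (assumptions).
layers = [lambda x: 0.1 * x, lambda x: 0.2 * x, lambda x: 0.3 * x]


def skip_odd_layers_for_small_inputs(x, index):
    return index % 2 == 1 and np.linalg.norm(x) < 5.0


output, executed = run_with_skipping(np.ones(4), layers, skip_odd_layers_for_small_inputs)
print(executed)
# prints: [0, 2]
```

Because skipped layers fall back to the identity path of the residual connection, the output keeps the same shape regardless of which layers run.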

Our results demonstrate that we can improve the run-time efficiency of the state-of-the-art models MDETR and GLIP by up to 50% on the tasks of referring-expression comprehension, segmentation, and visual question-answering, with a maximum accuracy drop of only 0.3%.

Accuracy-vs.-frames-per-second (a and b) and accuracy-vs.-GFLOPS (c and d) comparisons of the evaluated models. As shown, our proposed method comfortably outperforms multiple alternative approaches on both metrics while maintaining high accuracy.

Improving label efficiency of image harmonization models

Image harmonization is an important component of the broader problem of image composition, where new images are created by extracting foreground regions from one image and transferring them to another image in a photorealistic manner.

The main technical challenge for image harmonization is the appearance mismatch between the foreground extracted from the source image and the background of the destination image. Image harmonization aims to adjust the appearance of the foreground to make it compatible with the background. However, training traditional models for image harmonization requires a large amount of labeled data, which is costly and time-consuming to obtain.

To address this challenge, we introduce a novel approach to pretraining image harmonization models, LEMaRT, which automatically generates training data by simulating the types of defects that image harmonization models are expected to remove. LEMaRT takes an image as input, selects a region in that image, and applies a set of appearance transformations to it. We use these modified images, along with the original images, to pretrain our image harmonization model. Furthermore, we introduce an image harmonization model, SwinIH, by retrofitting the previously proposed Swin Transformer with a combination of local and global self-attention mechanisms.
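The data-generation idea can be sketched in a few lines: pick a region, alter its appearance, and paste the altered region back into the original image. The rectangular region, the single brightness shift, and all names below are assumptions for illustration; the actual method applies a richer set of appearance transformations.

```python
import numpy as np


def make_training_pair(image, rng, max_brightness_shift=0.3):
    """Perturb a random rectangle of `image`; return (composite, mask, target)."""
    height, width, _ = image.shape
    top = rng.integers(0, height // 2)
    left = rng.integers(0, width // 2)
    bottom, right = top + height // 2, left + width // 2

    mask = np.zeros((height, width, 1), dtype=image.dtype)
    mask[top:bottom, left:right] = 1.0

    # Simulated appearance defect: a uniform brightness shift, clipped to [0, 1].
    shift = rng.uniform(-max_brightness_shift, max_brightness_shift)
    transformed = np.clip(image + shift, 0.0, 1.0)

    # Transformed pixels inside the mask, original pixels elsewhere.
    composite = mask * transformed + (1.0 - mask) * image
    return composite, mask, image


rng = np.random.default_rng(0)
image = rng.uniform(size=(8, 8, 3))   # stand-in for a real photo with values in [0, 1]
composite, mask, target = make_training_pair(image, rng)
print(composite.shape, int(mask.sum()))
# prints: (8, 8, 3) 16
```

A model trained to map the composite (and mask) back to the target never needs a human-labeled example, since the original image serves as the ground truth.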

Given an image, our approach applies a set of transformations (e.g., brightness, hue adjustment) to obtain a transformed image that is combined with the original image to form a composite. These composite images are used to pretrain our image harmonization transformer model. As shown in the figure, our model is capable of reconstructing photorealistic outputs.

Pretraining our SwinIH model with our LEMaRT approach results in a new state of the art for image harmonization while being label-efficient, i.e., consuming less annotated data for fine-tuning than existing methods. Notably, on the iHarmony4 dataset, SwinIH outperforms the previous state of the art, SCS-Co, by a margin of 0.4 dB when it is fine-tuned on only 50% of the training data and by 1.0 dB when it is trained on the full training dataset.

Using our LEMaRT pretraining scheme, our image harmonization model (SwinIH) surpasses state-of-the-art (SOTA) counterparts with less than 40% of the training data from iHarmony4 for fine-tuning. Qualitatively, LEMaRT is better than competing methods at color correction, thanks to the distribution of photorealistic images that it learns from a large amount of unlabeled data during self-supervised pretraining.

Qualitative comparisons suggest that LEMaRT is better at color correction than prior methods, thanks to the pretraining process, during which LEMaRT learns the distribution of photorealistic images.

Qualitative comparison between our method, LEMaRT (SwinIH), and three state-of-the-art methods (RainNet, iS2AM, DHT+) on the iHarmony4 dataset.
