Long-form-video understanding and synthesis

Four CVPR papers from Prime Video examine a broad set of topics related to efficient model training for understanding and synthesizing long-form cinematic content.

At this year’s Conference on Computer Vision and Pattern Recognition (CVPR), Prime Video presented four papers that indicate the broad range of cutting-edge problems we work on.

In one paper, “Movies2Scenes: Using movie metadata to learn scene representation”, we present a novel contrastive-learning approach that uses only commonly available movie metadata to learn a general-purpose scene representation. On a diverse set of tasks evaluated using multiple benchmark datasets, models that use our representations consistently outperform models using existing state-of-the-art representations.

Notably, our learned representation offers an average improvement of 7.9% on the seven classification tasks and 9.7% on the two regression tasks in the Long-Form Video Understanding (LVU) dataset. This effort is an important step toward the first foundation model for general-purpose movie understanding.

In another paper, “Selective structured state-spaces for long-form video understanding”, we expand on the recently proposed S4 model with a lightweight mask generator that adaptively selects informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos. Our approach is consistently more accurate than the previous state-of-the-art model, by as much as 9.6%, while reducing the memory footprint by 23%.


In a third paper, “Dynamic inference with grounding based vision and language models”, we explore the problem of computational redundancy in large vision-and-language models, addressing it by dynamically skipping network layers, dropping input tokens, and fusing multimodal tokens, conditioned on the input image-text pair. Our results show that we can improve the run-time efficiency of state-of-the-art models by up to 50% on multiple downstream tasks with an accuracy drop of only 0.3%.

Lastly, our paper “LEMaRT: Label-efficient masked region transform for image harmonization” addresses the need for large amounts of labeled data to train image harmonization models, which adjust content extracted from different source images so that it blends together better in composite images. To this end, our method automatically generates training data by simulating the defects in appearance that image harmonization models are expected to remove. Our method outperforms previous state-of-the-art approaches by a margin of 0.4 dB (a mean-square-error improvement of ~9%) when it is fine-tuned on only 50% of the training data from one of the standard benchmarks (iHarmony4) and by 1.0 dB (an MSE improvement of ~21%) when it is trained on the full training dataset.

Toward a foundation model for movie understanding

The term “foundation model” generally refers to (i) a single large model that is (ii) trained on large amounts of mostly unlabeled data and can (iii) drive a number of downstream tasks. While several general-purpose visual and textual foundation models exist (e.g., BERT, GPT-4, CLIP, DALL-E 2), no foundation model geared specifically to movie understanding had been proposed before our work.

This is partly because directly applying existing visual or textual foundation models to movie understanding has limited effectiveness, given the large domain gap between cinematic content and the web-crawled images and text used to train those models. Factors such as the limited accessibility of large-scale cinematic content, the computational resources required to process it, and the lack of benchmark datasets for evaluating downstream applications add to the challenge of building a foundation model for movie understanding.


To address these challenges, we propose a novel model trained on over five million scenes, automatically identified from thousands of movies and comprising more than 45 million frames. Our model does not require any manual annotations and relies only on commonly available movie-level information (genre, synopsis, etc.). The scene representations from our model can be applied to improve the performance of a diverse set of downstream tasks, which is a key step toward building a foundation model for movie understanding.

We use movie metadata to define a measure of movie similarity and use that similarity measure to identify data pairs for contrastive learning. In contrastive learning, a model is trained on both positive pairs — examples that are similar in the relevant way — and negative pairs. During training, the model learns to produce data representations that pull positive pairs together and push negative pairs apart.
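For readers who want to see the mechanics, here is a minimal sketch of an InfoNCE-style contrastive loss of the kind this setup relies on; the encoder outputs, batch construction, and temperature are illustrative assumptions rather than the exact formulation in the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor_emb, positive_emb, temperature=0.07):
    """InfoNCE-style loss: each anchor scene should match its own positive
    and repel all other scenes in the batch (treated as negatives)."""
    a = F.normalize(anchor_emb, dim=-1)        # (B, D) anchor scene embeddings
    p = F.normalize(positive_emb, dim=-1)      # (B, D) positive scene embeddings
    logits = a @ p.t() / temperature           # (B, B) pairwise similarities
    targets = torch.arange(a.size(0), device=a.device)  # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

# toy usage: 8 scene pairs with 128-dimensional embeddings
loss = contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```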

Often, the positive pairs are created by augmenting existing examples — say, re-cropping them, reversing them, or re-coloring them. By instead using movies that are considered similar to each other (see below), we ensure that our positive scene-pairs are not only visually similar but also semantically coherent, providing us with a much richer set of geometric and thematic data augmentations that enhance the training objective beyond traditional augmentation approaches.
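To make the pairing strategy concrete, here is a toy sketch of mining positive movie pairs from metadata. The similarity measure (genre overlap blended with synopsis word overlap) and the threshold are assumptions for illustration, not the paper's actual similarity definition.

```python
from itertools import combinations

def movie_similarity(meta_a, meta_b):
    """Toy metadata similarity: Jaccard overlap of genre sets blended with
    bag-of-words overlap of the synopses."""
    genre_sim = len(meta_a["genres"] & meta_b["genres"]) / max(len(meta_a["genres"] | meta_b["genres"]), 1)
    words_a = set(meta_a["synopsis"].lower().split())
    words_b = set(meta_b["synopsis"].lower().split())
    synopsis_sim = len(words_a & words_b) / max(len(words_a | words_b), 1)
    return 0.5 * genre_sim + 0.5 * synopsis_sim

def similar_movie_pairs(metadata, threshold=0.4):
    """Movies whose metadata similarity clears the threshold are treated as
    similar; scenes drawn from such a pair become positive pairs for
    contrastive learning."""
    return [(a, b) for a, b in combinations(metadata, 2)
            if movie_similarity(metadata[a], metadata[b]) >= threshold]

# toy usage with hypothetical metadata
metadata = {
    "movie_1": {"genres": {"drama", "romance"}, "synopsis": "two strangers meet in paris"},
    "movie_2": {"genres": {"drama"}, "synopsis": "two old friends meet again in paris"},
    "movie_3": {"genres": {"horror"}, "synopsis": "a haunted house terrorizes a family"},
}
print(similar_movie_pairs(metadata))   # [('movie_1', 'movie_2')]
```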

Overview of our approach.

As can be seen in the video below, our learned scene representation effectively places thematically similar scenes close to one another.

Qualitative examples of similar-scene pairs found using our approach.

In the examples below, we compare our representation with the commonly used CLIP visual representation for scene retrieval using place-labeled scenes in the Long-Form Video Understanding (LVU) dataset. Given a query scene, our representation can capture appearance as well as semantic concepts to retrieve similar scenes more effectively, while CLIP can capture only local appearance-based patterns. For overall retrieval precision on six categories of places, our representation offers a 22.7% improvement over CLIP.
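Scene retrieval of this kind reduces to a nearest-neighbor search in the learned embedding space; a minimal sketch, assuming precomputed scene embeddings, looks like this.

```python
import torch
import torch.nn.functional as F

def retrieve_similar_scenes(query_emb, scene_embs, k=5):
    """Return the indices of the k scenes whose embeddings are closest
    (by cosine similarity) to the query scene's embedding."""
    q = F.normalize(query_emb, dim=-1)    # (D,) query scene embedding
    s = F.normalize(scene_embs, dim=-1)   # (N, D) candidate scene embeddings
    return torch.topk(s @ q, k).indices   # indices of the top-k cosine similarities

# toy usage: 1,000 candidate scenes with 128-dimensional embeddings
top5 = retrieve_similar_scenes(torch.randn(128), torch.randn(1000, 128))
```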

A comparison of our video representation method and one of its predecessors, CLIP, on the task of place retrieval using the Long-Form Video Understanding (LVU) dataset.

Quantitatively, our learned representation exhibits an average improvement of 7.9% and 9.7% on the seven classification tasks and two regression tasks of the LVU dataset, respectively. Furthermore, using our newly collected MCD dataset at Prime Video, we compare our learned scene representation with state-of-the-art models pretrained on action recognition and image classification datasets. Our scene representation outperforms the alternatives by margins ranging from 3.8% to 50.9% across different models and tasks.

Reducing model complexity for long-form-video understanding

At Prime Video, we’re developing state-of-the-art AI models for cinematic-content understanding to facilitate a variety of downstream use cases. One of the key technical challenges is effectively modeling the complex spatiotemporal dependencies in long-form videos such as movies and TV episodes.

Various shots from the movie Stuart Little, showing the complex spatiotemporal dependencies of cinematic content.

Previously proposed convolutional and recurrent neural networks struggle to learn long-term dependencies. In part, this is because of exploding or vanishing gradients: as information is incorporated over long durations, the cascading adjustments to model weights grow too large or too small. Vision transformers can use self-attention to address this challenge, attending to particular prior frames of video when interpreting the current frame. But this is computationally expensive, as it requires pairwise computations between the current frame and its predecessors.
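To put that cost in perspective, here is a back-of-the-envelope comparison between full self-attention and a linear-time sequence model of the kind discussed next; the two-hour duration and one-frame-per-second sampling rate are illustrative assumptions.

```python
# back-of-the-envelope cost comparison for a two-hour movie
# sampled at one frame per second (7,200 frames)
num_frames = 2 * 60 * 60

attention_pairs = num_frames ** 2   # self-attention: every frame interacts with every other frame
recurrence_steps = num_frames       # linear-time sequence model: one update per frame

print(f"pairwise attention interactions: {attention_pairs:,}")   # 51,840,000
print(f"linear recurrence updates:       {recurrence_steps:,}")  # 7,200
```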


The recently proposed structured-state-space-sequence (S4) model, with its linear complexity, offers a promising direction in this space; however, we empirically demonstrate that treating all image tokens equally, as the S4 model does, can adversely affect a model’s efficiency and accuracy.

To address this challenge, we present a novel selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos. Unlike previous methods, which used mask-based token reduction in transformers, our S5 model avoids the dense self-attention calculation by following the guidance of the momentum-updated S4 model. This enables our model to efficiently discard less informative tokens and adapt to various long-form-video-understanding tasks more effectively.
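The sketch below illustrates the idea in highly simplified form: a lightweight linear scorer (the “mask generator”) ranks tokens using features from a momentum-updated copy of the encoder, and only the top-scoring tokens are passed to the main branch. The module shapes, the keep ratio, and the EMA coefficient are assumptions for illustration, not the paper's exact architecture.

```python
import copy
import torch
import torch.nn as nn

class SelectiveTokenPicker(nn.Module):
    """Toy S5-style selection: a linear "mask generator" scores tokens using
    features from a momentum (EMA) copy of the encoder; only the top-k
    tokens are forwarded to the main encoder."""

    def __init__(self, encoder: nn.Module, dim: int, keep_ratio: float = 0.5,
                 ema_decay: float = 0.99):
        super().__init__()
        self.encoder = encoder
        self.momentum_encoder = copy.deepcopy(encoder)   # updated by EMA, not by gradients
        for p in self.momentum_encoder.parameters():
            p.requires_grad_(False)
        self.scorer = nn.Linear(dim, 1)                  # lightweight mask generator
        self.keep_ratio = keep_ratio
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_momentum(self):
        # moving-average update of the momentum encoder from the main encoder
        for p_m, p in zip(self.momentum_encoder.parameters(), self.encoder.parameters()):
            p_m.mul_(self.ema_decay).add_(p, alpha=1 - self.ema_decay)

    def forward(self, tokens):                           # tokens: (B, L, D)
        with torch.no_grad():
            guide = self.momentum_encoder(tokens)        # features that guide selection
        scores = self.scorer(guide).squeeze(-1)          # (B, L) informativeness scores
        k = max(1, int(tokens.size(1) * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices              # indices of the most informative tokens
        kept = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        return self.encoder(kept)                        # model only the selected tokens

# toy usage: 2 videos, 100 tokens each, 64-dim features, identity stand-in for the S4 encoder
out = SelectiveTokenPicker(nn.Identity(), dim=64)(torch.randn(2, 100, 64))
```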

At left is an illustration of our S5 model (a). We introduce a “mask generator” that enacts a selective token-picking strategy, leveraging the feature representations of the momentum S4 model, which is updated from the S4 model in a moving-average manner. At right is an illustration of the proposed pretraining framework using long-short masked contrastive learning (b), which initializes our S5 model to enhance robustness.

However, as with most token reduction methods, informative image tokens may sometimes be dropped incorrectly. To improve the robustness and the temporal horizon of our model, we propose a novel long-short masked contrastive-learning (LSMCL) approach that enables our model to predict longer temporal contexts using shorter input videos.
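A simplified view of that objective: for each video, a full-length clip and a shorter, partially masked clip are encoded separately, and their embeddings are pulled together while being pushed away from clips of other videos in the batch. The clip lengths, masking ratio, and loss form below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def lsmcl_loss(encoder, videos, short_len=16, mask_ratio=0.5, temperature=0.07):
    """Toy long-short masked contrastive objective: the embedding of a short,
    partially masked clip should match the embedding of the full-length clip
    from the same video (and mismatch those of other videos in the batch)."""
    B, L, D = videos.shape                        # (batch, frames, feature dim)
    start = torch.randint(0, L - short_len + 1, (1,)).item()
    short = videos[:, start:start + short_len].clone()
    mask = torch.rand(B, short_len, 1, device=videos.device) < mask_ratio
    short = short.masked_fill(mask, 0.0)          # drop a fraction of the short clip

    z_long = F.normalize(encoder(videos).mean(dim=1), dim=-1)   # (B, D)
    z_short = F.normalize(encoder(short).mean(dim=1), dim=-1)   # (B, D)
    logits = z_short @ z_long.t() / temperature                 # (B, B)
    targets = torch.arange(B, device=videos.device)
    return F.cross_entropy(logits, targets)

# toy usage: 4 videos, 64 frame features of dimension 32, identity stand-in for the encoder
loss = lsmcl_loss(torch.nn.Identity(), torch.randn(4, 64, 32))
```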

We present extensive comparative results using three challenging long-form video-understanding datasets (LVU, COIN, and Breakfast), demonstrating that our approach is consistently more accurate than the previous state-of-the-art S4 model, by as much as 9.6% on one dataset, with a memory footprint that’s 23% smaller.

Dynamic inference of multimodal models using reinforcement learning

Transformer models that operate over multiple data modalities, combined with large-scale pretraining approaches, have led to significant progress on joint image-and-language models. However, these models impose high computational costs and therefore offer low run-time efficiency, making them difficult to apply to Prime Video’s large catalogue.

Although approaches such as pruning, knowledge distillation, and quantization can help address this challenge, they can incur significant drops in accuracy (e.g., ≥ 1% at ≥ 50% model compression rates), as they are primarily designed for model-parameter reduction, not improving run-time efficiency.


To address this challenge, we propose a model that saves computation by dynamically skipping layers of a multimodal network; pruning input tokens from either the language backbone, the image backbone, or both; and fusing tokens from the separate backbones, conditioned on the input image-text pair.

Most multimodal transformer models include multihead self-attention and feed-forward network layers, which can be skipped for some inputs. Additionally, we remove redundant tokens at different levels of the backbones and fuse the image tokens with the language tokens in an adaptive manner. To learn policies for dynamic inference, we train agents using reinforcement learning.
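As a stripped-down illustration of per-input layer skipping, the sketch below uses a small policy head that inspects the pooled input representation and emits a keep/skip decision for each transformer block. The architecture and the naive sampling are assumptions for illustration; in our work, such policies are trained with reinforcement learning, and skipped blocks are not executed at all.

```python
import torch
import torch.nn as nn

class DynamicSkipEncoder(nn.Module):
    """Toy dynamic inference: a small policy looks at the pooled input
    and decides, per block, whether to run it or skip it for this input."""

    def __init__(self, dim=64, num_blocks=6):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(num_blocks))
        self.policy = nn.Linear(dim, num_blocks)    # one keep/skip logit per block

    def forward(self, tokens):                      # tokens: (B, L, D)
        keep_prob = torch.sigmoid(self.policy(tokens.mean(dim=1)))   # (B, num_blocks)
        decisions = torch.bernoulli(keep_prob)      # sampled actions (an RL agent would be rewarded for these)
        x = tokens
        for i, block in enumerate(self.blocks):
            out = block(x)                          # a real implementation would not run skipped blocks at all
            keep = decisions[:, i].view(-1, 1, 1)
            x = keep * out + (1 - keep) * x         # per example: use the block's output only if it was kept
        return x

# toy usage: a batch of 2 fused image-text token sequences of length 20, dim 64
features = DynamicSkipEncoder()(torch.randn(2, 20, 64))
```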

Our results demonstrate that we can improve the run-time efficiency of the state-of-the-art models MDETR and GLIP by up to 50% on the tasks of referring-expression comprehension, segmentation, and visual question-answering, with a maximum accuracy drop of only 0.3%.

Accuracy-vs.-frames-per-second (a and b) and accuracy-vs.-GFLOPS (c and d) comparisons of the evaluated models. As shown, our proposed method comfortably outperforms multiple alternative approaches on both metrics while maintaining high accuracy.

Improving label efficiency of image harmonization models

Image harmonization is an important component of the broader problem of image composition, where new images are created by extracting foreground regions from one image and transferring them to another image in a photorealistic manner.


The main technical challenge for image harmonization is the appearance mismatch between the foreground extracted from the source image and the background of the destination image. Image harmonization aims to adjust the appearance of the foreground to make it compatible with the background. However, training traditional models for image harmonization requires a large amount of labeled data, which is costly and time-consuming to obtain.

To address this challenge, we introduce a novel approach to pretraining image harmonization models, LEMaRT, which automatically generates training data by simulating the types of defects that image harmonization models are expected to remove. LEMaRT takes an image as input, selects a region in that image, and applies a set of appearance transformations to it. We use these modified images, along with the original images, to pretrain our image harmonization model. Furthermore, we introduce an image harmonization model, SwinIH, by retrofitting the previously proposed Swin Transformer with a combination of local and global self-attention mechanisms.
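The data generation step can be sketched roughly as follows: select a region of a photorealistic image, perturb its appearance, and paste it back to form a composite, with the original image serving as the reconstruction target for pretraining. The specific transformations and region-selection rule below are illustrative assumptions.

```python
import torch

def make_harmonization_pair(image, max_brightness_shift=0.3, max_color_shift=0.1):
    """Toy LEMaRT-style pretraining pair: perturb the appearance of a random
    rectangular region so the composite looks unharmonized; the original
    image serves as the reconstruction target."""
    C, H, W = image.shape
    # pick a random foreground rectangle covering a quarter of the image
    h, w = H // 2, W // 2
    top = torch.randint(0, H - h + 1, (1,)).item()
    left = torch.randint(0, W - w + 1, (1,)).item()

    composite = image.clone()
    region = composite[:, top:top + h, left:left + w]           # view into the composite
    region *= 1.0 + (torch.rand(1).item() - 0.5) * 2 * max_brightness_shift   # brightness change
    region += (torch.rand(C, 1, 1) - 0.5) * 2 * max_color_shift               # per-channel color cast
    region.clamp_(0.0, 1.0)

    mask = torch.zeros(1, H, W)
    mask[:, top:top + h, left:left + w] = 1.0       # foreground mask given to the model
    return composite, mask, image                   # model input, mask, reconstruction target

# toy usage on a random 3x256x256 "photo"
composite, mask, target = make_harmonization_pair(torch.rand(3, 256, 256))
```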

Given an image, our approach applies a set of transformations (e.g., brightness, hue adjustment) to obtain a transformed image that is combined with the original image to form a composite. These composite images are used to pretrain our image harmonization transformer model. As shown in the figure, our model is capable of reconstructing photorealistic outputs.

Pretraining our SwinIH model with our LEMaRT approach results in a new state of the art for image harmonization while being label-efficient, i.e., requiring less annotated data for fine-tuning than existing methods. Notably, on the iHarmony4 dataset, SwinIH outperforms the previous state of the art, SCS-Co, by a margin of 0.4 dB when it is fine-tuned on only 50% of the training data and by 1.0 dB when it is trained on the full training dataset.

Using our LEMaRT pretraining scheme, our image harmonization model (SwinIH) surpasses state-of-the-art (SOTA) counterparts with less than 40% of the training data from iHarmony4 for fine-tuning. Qualitatively, LEMaRT is better than competing methods at color correction, thanks to the distribution of photorealistic images that it learns from a large amount of unlabeled data during self-supervised pretraining.

Qualitative comparisons suggest that LEMaRT is better at color correction than prior methods, thanks to the pretraining process, during which LEMaRT learns the distribution of photorealistic images.

Qualitative comparison between our method, LEMaRT (SwinIH), and three state-of-the-art methods (RainNet, iS2AM, DHT+) on the iHarmony4 dataset.
