How Prime Video distills time series anomalies into actionable alarms

Targeted handling of three distinct types of “special events” dramatically reduces false-alarm rate.

Prime Video customers must be able to reliably stream content at all times on any device that supports the Prime Video application, such as mobile phones, smart TVs, or video game consoles.

For the Prime Video team, deploying and maintaining the application on such a broad scale entails custom code configurations and third-party integrations that are unique to particular geographical regions and families of devices. This diversity poses the risk of a fragmented customer experience, wherein device- or region-specific issues affect only a subset of customers.

Manually setting alarms that monitor the quality of the Prime Video application across all combinations of customer activities, device types, and regions is infeasible. However, this problem can be reframed as a large-scale, online, time-series anomaly detection problem, such that an automated monitoring solution alerts on-call engineers to deviations from expected behavior in observed traffic.

The Cartesian product of independent metric dimensions results in a combinatorial explosion of time series describing different aspects of customer activity on Prime Video.
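To make the scale of the problem concrete, here is a minimal sketch of how independent metric dimensions multiply into individual time series. The dimension names and values are hypothetical placeholders chosen for illustration, not Prime Video's actual metric schema.

```python
from itertools import product

# Hypothetical dimension values, used only for illustration.
dimensions = {
    "activity": ["playback_start", "search", "purchase"],
    "device": ["mobile", "smart_tv", "game_console"],
    "region": ["NA", "EU", "FE"],
}


def enumerate_series(dims):
    """Return one key (a dict) per combination of dimension values."""
    names = list(dims)
    return [dict(zip(names, combo)) for combo in product(*dims.values())]


series_keys = enumerate_series(dimensions)
print(len(series_keys))  # 3 * 3 * 3 = 27 time series to monitor
```

Even three values per dimension already yield 27 series. Every additional dimension or value multiplies that count, which is why manually configured alarms do not scale.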

In this post, we shed light on practical challenges that arise when applying anomaly detection to time series describing customer activity and present a selection of mitigating techniques. The proposed solutions distinguish different categories of deviations induced by fluctuating customer viewing behavior and have contributed to a significant reduction in the false alarms that would otherwise distract Prime Video engineers from meeting real customer needs.

Sample time series containing two notable deviations from expected behavior. Only the second deviation corresponds to a customer-impacting malfunction, whereas the first was caused by an external event.

This distinction is especially challenging because innocuous drops in metric traffic can look very similar to those caused by genuine incidents. The sample time series above depicts two independent deviations from expected behavior that would both be regarded as anomalous in the absence of any additional information. However, after inspecting the contexts surrounding these two anomalies, we discovered that only the second was caused by a correctable software malfunction, whereas the first was simply an artifact of lower Prime Video viewership while an external event was taking place.

Innocuous changes to customer viewing behavior on media-streaming platforms such as Prime Video can be driven by several factors. In this post, we focus on what we refer to as special events, which we further categorize as

  1. anticipated special events, e.g., major sporting tournaments;
  2. unanticipated low-impact special events, e.g., sunny weather encouraging more outdoor activities;
  3. unanticipated high-impact special events, e.g., breaking news broadcasts or natural disasters.

Taxonomy of different types of special events affecting Prime Video customer traffic.

1. Anticipated special events

Prime Video viewers sometimes seek content that is available only on other services. For instance, highly anticipated sporting events, such as the NFL Super Bowl or the FIFA World Cup, are known to dominate ratings on traditional broadcast TV, temporarily drawing viewers away from Prime Video.

Conversely, Prime Video exclusives, such as NFL Thursday Night Football games, and tentpole content launches, such as The Lord of the Rings: The Rings of Power, are expected to result in transient surges in metric traffic. In the absence of context, the deviations in either direction may be large enough to be flagged as anomalous, resulting in false alarms about the state of the Prime Video application.

If a complete schedule of events that are expected to affect metric traffic is available, anomaly detection models can be enhanced by covariates or exogenous variables. Taking forecasting-based anomaly detection as an example, the inclusion of covariates should result in more meaningful predictions against which anomaly scores can be computed.

A binary encoding of scheduled events, wherein an activation indicates the occurrence of an external event.

Leveraging covariates for this purpose remains nontrivial. For example, different matches within a tournament attract differing viewership, depending on which teams are playing, the risk of a popular team being knocked out, etc. It is challenging to encode such nuances in a binary covariate that is activated whenever any external event is ongoing, and further offline analysis of historical data is required to identify additional associative or causal variables that influence the deviations induced by different events.
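The binary encoding described above, and its effect on a forecasting-based anomaly score, can be sketched as follows. This is a toy illustration rather than the model used in production: the forecaster is simply a per-state historical mean, and the schedule, traffic values, and function names are all hypothetical.

```python
from datetime import datetime, timedelta


def encode_events(timestamps, events):
    """Return 1 for each timestamp inside any [start, end) event window, else 0."""
    return [
        int(any(start <= t < end for start, end in events))
        for t in timestamps
    ]


def conditional_means(history, covariate):
    """Toy forecaster: mean of past values, computed separately per covariate state."""
    means = {}
    for state in (0, 1):
        values = [y for y, c in zip(history, covariate) if c == state]
        # Fall back to the overall mean if a state was never observed.
        pool = values if values else history
        means[state] = sum(pool) / len(pool)
    return means


# Hypothetical hourly traffic with one scheduled event from 03:00 to 05:00.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(8)]
events = [(start + timedelta(hours=3), start + timedelta(hours=5))]
history = [100, 102, 98, 60, 62, 101, 99, 100]

covariate = encode_events(timestamps, events)
means = conditional_means(history, covariate)

# Score a new observation of 61 that falls inside an event window.
observed, state = 61, 1
score_with_covariate = abs(observed - means[state])
score_without_covariate = abs(observed - sum(history) / len(history))
print(covariate, score_with_covariate, score_without_covariate)
```

With the covariate, the forecast during an event already reflects the expected dip, so the score stays low. Without it, the same observation looks like a large deviation. Note that a single binary flag treats every event identically, which is exactly the limitation discussed above.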

2. Unanticipated low-impact special events

Curating an exhaustive list of relevant events for geographically dispersed customers is a near-impossible task, especially when compounded by the wide variety of devices on which the Prime Video application is available. Events can also be rescheduled at short notice, invalidating any provisions made to accommodate them. In our taxonomy, unanticipated low-impact events are events that are unaccounted for but whose overall impact may still be discernible by other means.

To mitigate the impact of incomplete covariate information, we advocate for an ensemble-based approach combining multiple detectors that explicitly capture different characteristics of time series behavior, such as mean, variance, and trend. When monitoring Prime Video metrics, we found that relying solely on models that gauge the magnitude of a deviation, such as forecasting-based scorers, was insufficient. Introducing additional derivative- and correlation-based detectors, however, greatly enhanced our ability to filter out innocuous anomalies related to special events.

Examples of how two complementary anomaly scorers (forecasting- and derivative-based) can be treated as an ensemble for assessing the severity of an anomaly. Note how in the second example, the derivative-based scorer indicates an anomaly only during the period where the trend is reversed, whereas the increased forecasting-based score persists beyond the initial deviation.
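The following is a minimal sketch of such an ensemble, assuming that an alarm is raised only when both scorers agree. The post does not specify how the scores are actually combined, so the AND rule, the scaling constants, and the threshold are illustrative assumptions.

```python
def forecast_score(observed, expected, scale):
    """Magnitude-based score: absolute deviation from the forecast, in units of scale."""
    return [abs(o - e) / scale for o, e in zip(observed, expected)]


def derivative_score(observed, scale):
    """Rate-of-change score: absolute first difference, in units of scale."""
    diffs = [0.0] + [observed[i] - observed[i - 1] for i in range(1, len(observed))]
    return [abs(d) / scale for d in diffs]


def ensemble_alarm(observed, expected, level_scale, diff_scale, threshold=3.0):
    """Flag a point only where both scorers exceed the threshold (assumed rule)."""
    f_scores = forecast_score(observed, expected, level_scale)
    d_scores = derivative_score(observed, diff_scale)
    return [f > threshold and d > threshold for f, d in zip(f_scores, d_scores)]


expected = [100] * 8
# Hypothetical gradual dip, e.g., viewers drifting away during an external event.
gradual = [100, 95, 88, 80, 72, 80, 90, 100]
# Hypothetical abrupt drop, e.g., a malfunction after a faulty deployment.
abrupt = [100, 100, 100, 40, 40, 40, 100, 100]

gradual_alarms = ensemble_alarm(gradual, expected, level_scale=5, diff_scale=5)
abrupt_alarms = ensemble_alarm(abrupt, expected, level_scale=5, diff_scale=5)
print(any(gradual_alarms), abrupt_alarms.index(True))
```

In this toy data, the gradual dip exceeds the forecasting threshold but never the derivative threshold, so it is filtered out. The abrupt drop trips both scorers at its onset. Requiring agreement trades some sensitivity for fewer false alarms, so the thresholds would need tuning against known incidents.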

3. Unanticipated high-impact special events

Some special events happen not only unexpectedly but with such sudden and drastic impact that they are especially hard to distinguish from a genuine malfunction. Examples include widespread power outages due to natural disasters and breaking-news broadcasts announcing election results or the unexpected passing of a public figure.

Mimicking the judgment of an end user triaging an anomaly post hoc is often the best way to handle such unpredictable and dramatic deviations. The effects of external events can often be distinguished from application malfunctions by their correlation with other metrics in the affected region. More specifically, at the time an anomaly is detected for Prime Video, we are interested in verifying whether similar deviations have also been observed for metrics describing services on distinct technology stacks.
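One way to picture this cross-metric check is the sketch below, which treats a drop as likely external when a sufficient fraction of reference metrics from other technology stacks show a comparable drop at the same time. The service names, thresholds, and the use of a simple relative drop are hypothetical simplifications rather than the production logic.

```python
def relative_drop(baseline, observed):
    """Fractional decrease of the observed value relative to its baseline."""
    return (baseline - observed) / baseline


def is_likely_external(primary, references, min_drop=0.2, min_fraction=0.5):
    """Return True if enough reference metrics corroborate the primary drop.

    primary: (baseline, observed) for the monitored metric.
    references: mapping of metric name to (baseline, observed) for services
        running on distinct technology stacks in the same region.
    """
    if not references or relative_drop(*primary) < min_drop:
        return False  # nothing to corroborate, or no notable drop to explain
    corroborating = [
        name
        for name, (baseline, observed) in references.items()
        if relative_drop(baseline, observed) >= min_drop
    ]
    return len(corroborating) / len(references) >= min_fraction


primary = (1000, 600)  # hypothetical 40% drop in the monitored metric

# Case 1: unrelated services in the same region dropped too (e.g., a power outage).
outage_refs = {"service_a": (500, 320), "service_b": (800, 500)}
# Case 2: unrelated services are steady, so the drop is specific to the application.
steady_refs = {"service_a": (500, 495), "service_b": (800, 810)}

print(is_likely_external(primary, outage_refs))
print(is_likely_external(primary, steady_refs))
```

A corroborated drop can be suppressed or downgraded, while an isolated one is escalated to the on-call engineer. The reference metrics must come from genuinely independent stacks. Otherwise a shared dependency failure could be mistaken for an external event.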

Outlook

Identifying distinct categories of special events and deploying appropriate remedies have been invaluable for improving how we monitor metrics describing customer activity. This has allowed Prime Video engineers to spend less time investigating false alarms and more time delivering new features for customers. One consideration this post hasn’t touched upon is the risk of missing a genuine incident as a result of introducing additional suppression mechanisms. This is an important factor that should be regularly assessed and effectively communicated to end users of the monitoring service.

The operational challenges of delivering reliable anomaly detection in practical settings are often disregarded as domain-specific idiosyncrasies. Consequently, they are largely overlooked in the prolific stream of novel modeling and methodological contributions appearing in the literature on time series anomaly detection. The insights shared in this blog post are not exhaustive either, but we hope they serve as a useful guide for practitioners facing similar issues and motivate broader research on both domain-specific and domain-agnostic mechanisms for translating detected anomalies into actionable alarms.
