In his 2020 shareholder letter, Jeff Bezos, executive chair of Amazon’s board of directors, shared that more than 200 million people around the world have a Prime membership — along with its attendant benefits.
Those include delivery benefits (like free one- and two-day delivery), digital benefits (such as Prime Video and Amazon Music), and shopping benefits (including Prime Day member deals). Prime members can also download thousands of e-books, magazines, and comics for free, get unlimited photo storage, order groceries online, and more.
Amazon is continually expanding and evolving its selection of Prime benefits to enhance the value for members. As Bezos wrote in an earlier shareholder letter: “We want Prime to be such a good value, you’d be irresponsible not to be a member.”
To help deliver more value to Prime members, scientists within Amazon’s Prime organization develop methods to help consumers discover and utilize Prime benefits. Using techniques derived from machine learning, structural econometrics, and other disciplines, they also help Amazon decide how to evolve Prime benefit offerings around the world.
Surface the most relevant Prime benefits to customers
When shoppers visit the Amazon Store, they are presented with a variety of Prime callouts with relevant benefits and related product information. Callouts for non-Prime members might outline the wide variety of benefits available, while Prime members might see more options to utilize their Prime benefits. For example, a Prime member visiting the detail page for the book Jane Eyre might see a callout saying that the title is available for free on Prime Reading.
“We utilize recommender systems to engage shoppers with information about Prime benefits that they would find most interesting,” says Houssam Nassif, a principal applied scientist within Amazon’s consumer organization.
To make predictions about the callout that will most excite customers, the system maps item attributes (like brand, color, price, title, and category) to how often items are selected by customers. The models embedded in the system use Bayesian recommenders to make decisions on the most relevant content to surface. Bayesian inferences are used to make predictions about future events by updating prior hypotheses as more information becomes available.
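The idea of updating a prior belief as evidence arrives can be illustrated with a small sketch. It is a deliberately simplified stand-in, not the production model: it assumes a single callout, a uniform Beta(1, 1) prior, and a made-up sequence of click outcomes, and it ignores the item attributes that the real system maps to click behavior. The class name `BetaPosterior` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) belief about one callout's click probability."""

    alpha: float = 1.0  # prior pseudo-count of clicks (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of non-clicks

    def update(self, clicked: bool) -> None:
        # Conjugate update: each impression adds one count to one side.
        if clicked:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # Posterior mean estimate of the click probability.
        return self.alpha / (self.alpha + self.beta)


belief = BetaPosterior()
for clicked in [True, False, False, True, True]:  # hypothetical feedback
    belief.update(clicked)

print(f"estimated click probability: {belief.mean:.3f}")
```

With three clicks and two non-clicks, the belief moves from the prior mean of 1/2 to a posterior mean of 4/7. More impressions would pull the estimate further toward the observed click rate.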
However, this approach has limits. Relying exclusively on Bayesian methods to measure customer selections can bias results toward more popular items. Shoppers interested in Jane Eyre, for instance, might also want to read new romance novels. The challenge: newer items have untrained model weights, which can cause the system to underestimate their true click probability.
“This experience would be similar to going to a music recommendation engine, and seeing only the chart toppers in your favorite categories,” Nassif explains. “To improve the diversity of recommendations, we have to overcome the classic exploitation-exploration dilemma by including relevant and popular items [exploitation] along with newer or long-tail items that scored higher than their expected value [exploration].”
To do this, the Prime ML team utilizes methods that allow the algorithm to update the “click-probability” score by using delayed feedback from customers.
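One widely used way to balance exploitation and exploration is Thompson sampling, sketched below. This is a generic illustration, not a description of the Prime team's system. The callout names and click rates are invented, feedback is simulated and arrives immediately rather than with a delay, and each callout gets an independent Beta posterior instead of a model over item attributes.

```python
import random


def thompson_pick(posteriors, rng):
    """Draw one plausible click probability per callout; show the largest draw."""
    draws = {name: rng.betavariate(a, b) for name, (a, b) in posteriors.items()}
    return max(draws, key=draws.get)


def simulate(true_ctr, rounds=2000, seed=0):
    """Run a simulated campaign and return impressions and final posteriors."""
    rng = random.Random(seed)
    posteriors = {name: (1.0, 1.0) for name in true_ctr}  # uniform priors
    shown = {name: 0 for name in true_ctr}
    for _ in range(rounds):
        name = thompson_pick(posteriors, rng)
        shown[name] += 1
        clicked = rng.random() < true_ctr[name]  # simulated customer feedback
        a, b = posteriors[name]
        posteriors[name] = (a + clicked, b + (not clicked))
    return shown, posteriors


# Hypothetical click rates; the algorithm never sees these directly.
TRUE_CTR = {"bestseller": 0.20, "new_release": 0.25, "long_tail": 0.05}
shown, posteriors = simulate(TRUE_CTR)
for name, count in shown.items():
    a, b = posteriors[name]
    print(f"{name}: shown {count} times, estimated CTR {a / (a + b):.3f}")
```

Because a new item starts with a wide posterior, it occasionally produces a high draw and gets shown. That gives the system a chance to learn its click probability instead of defaulting to items that are already popular.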
“Adaptive systems allow us to focus the diversity of recommendations even further,” says Nassif.
Prime’s adaptive systems respond continually to evolving preferences across all Amazon customers. For example, classic-literature enthusiasts who read Jane Eyre will not see callouts for romance novels or romantic comedy movies unless they express some interest in other romance novels. Some of those recommender systems are captured in the paper "Bayesian meta-prior learning using Empirical Bayes".
Recommending content that customers love
Determining the most relevant Prime benefits to present to users is the first step. Prime’s scientists have also developed algorithms to determine which formats are most likely to appeal to customers.
“Every callout has multiple dimensions, which in turn presents a large number of decisions,” says Nassif. “Do customers like to see their name? Should the callout feature a single particular product? Or even a grouping of products? To make these decisions, we have to develop an accurate understanding of customer preferences.”
Callouts comprise multiple components: a headline, body copy, and one or more images. They can also include other elements, such as customer reviews. Testing multiple variables is a combinatorial problem that can cover a large decision space, which limits the speed of experiments designed to arrive at the layout customers prefer most.
To avoid the combinatorial explosion that results from considering every possible combination, the models score a small subset of combinations before extrapolating what they learn to the larger universe of layouts that can be presented to customers. Conditioned on prior observations, the models select the layout with the highest probability of delivering the highest customer value.
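A toy version of scoring a subset and extrapolating to the rest might look like the following sketch. Everything in it is an assumption made for illustration: the component options, the effect sizes, the noise-free feedback, and above all the premise that a layout's value is simply the sum of per-component effects. Real layouts may have interactions between components that this additive model would miss.

```python
from itertools import product

# Hypothetical options for three callout components.
HEADLINES = ["name", "benefit", "deal"]
IMAGES = ["single", "group", "none"]
BODIES = ["short", "long", "reviews"]
FACTORS = [HEADLINES, IMAGES, BODIES]

# Made-up per-option effects, used only to simulate customer value.
EFFECTS = [
    {"name": 0.03, "benefit": 0.01, "deal": 0.02},
    {"single": 0.02, "group": 0.04, "none": 0.00},
    {"short": 0.01, "long": 0.00, "reviews": 0.03},
]


def true_value(layout):
    """Simulated (additive, noise-free) value of a layout."""
    return 0.05 + sum(EFFECTS[f][opt] for f, opt in enumerate(layout))


# Latin-square subset: 9 of the 27 layouts, each option appearing three times.
tested = [(HEADLINES[i], IMAGES[j], BODIES[(i + j) % 3])
          for i in range(3) for j in range(3)]
observed = {layout: true_value(layout) for layout in tested}
grand_mean = sum(observed.values()) / len(observed)


def option_mean(factor, option):
    """Average observed value over tested layouts that use this option."""
    vals = [v for layout, v in observed.items() if layout[factor] == option]
    return sum(vals) / len(vals)


def predict(layout):
    """Main-effects estimate: sum of option means minus 2x the grand mean."""
    return sum(option_mean(f, o) for f, o in enumerate(layout)) - 2 * grand_mean


all_layouts = list(product(*FACTORS))
best = max(all_layouts, key=predict)
print(best, "tested directly:", best in observed)
```

In this balanced design the additive estimate recovers the value of all 27 layouts from 9 observations, and the top-ranked layout is one that was never shown. With noisy feedback or interacting components, the extrapolation would only be approximate.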
Evolving the selection of Prime benefits
In addition to informing how customers receive recommendations about Prime as it exists today, scientists also influence how Prime will evolve as a membership. This work involves scientists from multiple disciplines collaborating closely to determine the best selection of benefits: from determining how best to improve shipping speeds for Prime (including items eligible for the fastest speeds) to recommending which new podcasts Amazon Music should release.
Charlie Manzanares is a senior manager on the team that specializes in simulating how customers benefit from expansion of Prime benefits. Manzanares’s team comprises economists, applied scientists, research scientists, and business intelligence engineers who partner closely with product managers and software and data engineers.
“Our team works at the scientific intersection of structural econometrics, machine learning, and causal inference,” says Manzanares. “Building these tools often involves inventing new science, by involving scientists and engineers from a variety of backgrounds. We then utilize these tools to create scientific software at engineering scale. What’s exhilarating about this space is not just solving these scientific and technical challenges, but using these tools to make Prime better for members around the world. Moreover, the company relies on this information to make high-stakes investments. This adds an interesting layer of strategic management consulting to our work.”
Manzanares points to a recent innovation from Prime scientists that made modeling dynamic customer decisions easier.
“Prime members make ‘dynamic’ choices over whether, and when, to become and remain Prime members. Dynamic customer choices often involve tradeoffs between value and flexibility,” he explains. “For example, in the US, most customers choose between joining Prime’s annual or monthly plans, or ending their membership or not joining Prime at all. Over time, this tradeoff results in many possible permutations of choices. For example, a member might choose monthly Prime for two months, then join annual Prime. Or they might choose monthly Prime for two months, remain non-Prime for three, then join monthly Prime for five more months, etc.”
Modeling the impact of these choice permutations in a way that is useful for counterfactual simulation is theoretically and computationally challenging.
The theoretical challenge is an “identification” problem, Manzanares explains. The identification problem makes it hard for scientists to determine which Prime feature caused members to make a particular choice.
“For example, did a member who engaged with Prime shipping and Prime Video choose to renew because they valued Prime shipping highly, but Prime Video less, or Prime Video highly, and Prime shipping less?” asks Manzanares. “This problem is common to both dynamic and ‘static’ choice problems (i.e., choice problems where choice values are not influenced by past choices). The computational problem — which is pervasive in dynamic choice settings — is generated by the sheer number of possible choices, which is labeled the ‘curse of dimensionality’ in dynamic programming literature.”
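The growth in the number of possible choice histories is easy to see with a back-of-the-envelope count. The sketch below assumes a simplified setting in which a customer picks one of three options every month. That is an assumption for illustration only, since an annual plan in practice covers twelve months at once.

```python
from itertools import product

# Simplified monthly decision set (an assumption for illustration).
CHOICES = ("monthly", "annual", "none")


def count_histories(months: int) -> int:
    """Number of distinct choice sequences over the given horizon."""
    return len(CHOICES) ** months


# Small horizons can be enumerated explicitly...
three_month_paths = list(product(CHOICES, repeat=3))
print(len(three_month_paths))   # 27

# ...but the count grows exponentially with the horizon.
print(count_histories(12))      # 531441
print(count_histories(24))      # 282429536481
```

Even in this stripped-down setting, two years of monthly decisions yield hundreds of billions of possible histories, which is why enumerating them directly is impractical.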
To overcome these challenges, the team combined new techniques from inverse reinforcement learning with an old assumption from structural econometrics. Inverse reinforcement learning is a machine learning paradigm popularized in the late 1990s and early 2000s.
As opposed to reinforcement learning, which learns behavioral “policies” through active experimentation, inverse reinforcement learning learns “reward” or “utility” functions from actual customer behavior. It then uses estimated utility functions to make choices in new settings. Structural econometrics is an older paradigm with a rich literature and has been used for these types of exercises since the 1940s.
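The general pattern of learning utilities from observed behavior and then reusing them in a new setting can be shown with a static logit choice model. This is a much simpler relative of the dynamic models discussed here and is not the team's method. The choice counts are invented, and the sketch assumes that the utility of one reference option ("none") is fixed at zero, a common normalization in discrete-choice models.

```python
import math


def utilities_from_choices(counts, anchor):
    """Logit inversion: u_j = log(share_j / share_anchor), so u_anchor = 0."""
    return {c: math.log(n / counts[anchor]) for c, n in counts.items()}


def choice_shares(utilities):
    """Choice probabilities implied by a logit model with these utilities."""
    z = sum(math.exp(u) for u in utilities.values())
    return {c: math.exp(u) / z for c, u in utilities.items()}


# Hypothetical observed choices among three options.
observed = {"none": 500, "monthly": 300, "annual": 200}

# Step 1: recover utilities from behavior, relative to the reference option.
u = utilities_from_choices(observed, anchor="none")
shares = choice_shares(u)  # reproduces the observed shares

# Step 2: counterfactual. Suppose a new benefit raised the annual plan's
# utility by 0.5 (an arbitrary number); simulate the new choice shares.
u_cf = dict(u, annual=u["annual"] + 0.5)
shares_cf = choice_shares(u_cf)

for option in observed:
    print(f"{option}: {shares[option]:.3f} -> {shares_cf[option]:.3f}")
```

Fixing one option's utility is what makes the remaining utilities recoverable from choice shares alone. Without such a reference point, adding a constant to every utility would produce identical behavior.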
“On the one hand, inverse reinforcement learning draws upon modern machine learning techniques. These techniques allow for rich approximations in complex settings,” says Manzanares. “On the other hand, structural econometrics has already solved many complex theoretical issues related to counterfactual simulation. These solutions often predate the development of modern machine learning and computation. This dichotomy creates opportunities for intellectual arbitrage between literatures.”
The team’s approach to the challenge is described in the paper “Deep PQR: Solving Inverse Reinforcement Learning using Anchor Actions,” which was published at the 2020 International Conference on Machine Learning.
“The findings presented in the paper are applicable across multiple fields,” says Manzanares. “That’s not surprising since the paper’s insights were made possible by collaboration across multiple disciplines.”
Prime scientists use inverse reinforcement learning models to develop insights into how Prime should evolve to meet customer needs. For example, how should Prime evolve to best meet the needs of Gen Z, who engage more heavily with digital benefits (video, music, gaming)? How can grocery delivery and pickup maximize customer convenience?
These questions multiply as Prime expands globally. In international marketplaces, especially emerging ones, customer needs vary widely. For example, how might Prime serve customers in a marketplace like India, where rural and urban needs might be very different? Experimentation, Manzanares notes, becomes critical.
“The process of discovering what customers want across the world is a lot of fun,” he says. “Combine that with building cutting-edge science in partnership with extremely talented science, engineering, and business professionals, and this makes Prime a really rewarding place to be a scientist.”