Know when to fold: Futility-aware early termination in online experiments
2025
As the demand for online A/B testing continues to rise at tech companies, the opportunity cost of conducting these experiments becomes increasingly significant. Consequently, there is a growing need for an efficient continuous monitoring system capable of terminating experiments early when necessary. Existing literature and tools focus primarily on early termination of experiments with clearly significant results (demonstrated efficacy). However, among the tens of thousands of online experiments conducted every year at Amazon, for example, only a small proportion meet the launch criteria. To improve innovation efficiency and allow experiments to be terminated for futility, in this paper we present a comprehensive literature review, propose new methods, and conduct a large-scale meta-analysis using historical online experiments at Amazon. This is the first study of its kind in the literature. We also examine the empirical challenges we encountered while deploying these methods at Amazon and explore strategies for handling them. This paper is based on our work to develop the first such service for the largest online experiment platform at Amazon. Launched in 2024, the product is now available to thousands of labs on the platform each year and sends automatic notifications to experimenters with early termination recommendations. It saves time for around 10% of labs, shortens each terminated lab by about two weeks, and reduces negative impact by several dozen basis points for ineffective or negative treatments.
Research areas