Offline multi-objective optimization (OMOO) in search page layout optimization using off-policy evaluation
2024
E-commerce stores typically test changes to ranking algorithms through rigorous A/B testing, which requires a change to satisfy predefined success criteria on multiple metrics. This problem of simultaneously optimizing multiple metrics is known as multi-objective optimization (MOO). A common approach to MOO is to choose a set of weights that scalarizes the multiple metrics into a single ranking objective. In practical settings, however, rather than simply improving all metrics, the experimenter may want to improve a few metrics significantly with negligible trade-off on the others. We refer to such a requirement as a desired policy. The experimenter chooses scalarization weights so that the resulting objective best approximates the desired policy. Repeated A/B testing to arrive at a set of weights that approximates the desired policy well enough is costly and inefficient; the problem therefore lends itself to off-policy evaluation methods. In this paper, we develop a framework for approximate offline multi-objective optimization of (𝜋, 𝑤) explore-exploit policies, where a small fraction of traffic (𝜖 = 1–5%) is reserved for exploration while the majority is served by exploiting the current best arm under the policy. Further, the metrics being optimized in our use case are highly skewed and zero-inflated. We develop a simulator/reward-vector generator, a neural network that learns the distribution of rewards for a given context from exploration data, and show empirically that it yields an unbiased estimator for such policies. Finally, we demonstrate on empirical data that this estimator correctly predicts the ordering of treatments from an A/B test on an e-commerce page-layout ranker across four different metrics.
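To make the workflow concrete, the following is a minimal sketch (not the authors' code) of how exploration logs and a learned reward-vector generator could be used to score candidate scalarization weights offline. The synthetic data, feature layout, network size, and the helper estimate_policy_value are all illustrative assumptions; the paper's actual estimator, metrics, and model details may differ.

```python
# Minimal sketch of direct-method offline multi-objective evaluation.
# All names, shapes, and the synthetic data are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_arms, n_context, n_metrics, n_logs = 5, 8, 4, 20_000

# Synthetic "exploration" logs: contexts, uniformly random arms, reward vectors.
# In practice these would come from the small epsilon slice of traffic.
X = rng.normal(size=(n_logs, n_context))               # query/page context features
arms = rng.integers(0, n_arms, size=n_logs)            # uniformly explored layouts
true_W = rng.normal(size=(n_arms, n_context, n_metrics))
mean_r = np.einsum("nc,acm->nam", X, true_W)[np.arange(n_logs), arms]
# Zero-inflated, skewed rewards (e.g. purchases): mostly zeros, occasional positives.
rewards = np.where(rng.random((n_logs, n_metrics)) < 0.7, 0.0, np.exp(mean_r))

# "Reward-vector generator": a neural net mapping (context, arm) to the
# expected reward vector, trained on exploration data only.
feats = np.hstack([X, np.eye(n_arms)[arms]])
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=200).fit(feats, rewards)

def estimate_policy_value(w, contexts, model):
    """Direct-method estimate of per-metric value for the policy pi_w that
    exploits the arm maximizing the w-scalarized predicted reward."""
    n = len(contexts)
    # Predict reward vectors for every (context, arm) pair.
    all_feats = np.hstack([np.repeat(contexts, n_arms, axis=0),
                           np.tile(np.eye(n_arms), (n, 1))])
    preds = model.predict(all_feats).reshape(n, n_arms, -1)
    best_arm = (preds @ w).argmax(axis=1)               # scalarize, then exploit
    return preds[np.arange(n), best_arm].mean(axis=0)   # per-metric value estimate

# Compare candidate weight vectors offline instead of via repeated A/B tests.
for w in [np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.4, 0.3, 0.2, 0.1])]:
    print(w, estimate_policy_value(w, X[:2000], model))
```

Under these assumptions, each candidate weight vector is evaluated against the same logged contexts, so an experimenter can compare how different scalarizations trade off the four metrics before committing any of them to an online A/B test.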