Multiple hypothesis testing, the problem of testing many hypotheses simultaneously, is a core problem in statistical inference that arises in almost every scientific field. In this setting, controlling the false discovery rate (FDR), the expected proportion of false rejections (type I errors) among all rejections, is an important challenge for making meaningful inferences. In this paper, we consider a setting where an ordered (possibly infinite) sequence of hypotheses arrives in a stream, and for each hypothesis we observe a p-value along with a set of features specific to that hypothesis. The decision whether or not to reject the current hypothesis must be made immediately at each timestep, before the next hypothesis is observed. This model provides a general way of leveraging side (contextual) information in the data to help maximize the number of discoveries while controlling the FDR.
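To make the streaming protocol concrete, the minimal sketch below shows the generic online testing loop described above; the `threshold_rule` interface is a hypothetical placeholder for any rule that sets the current significance level using only past decisions and the current hypothesis's features, and is not taken from the paper itself.

```python
def online_testing(stream, threshold_rule):
    """Generic online multiple-testing loop (illustrative sketch).

    `stream` yields (p_value, features) pairs one hypothesis at a time;
    `threshold_rule` is a hypothetical callable mapping the current time,
    the current features, and all past decisions to a significance level.
    """
    decisions = []
    for t, (p_t, x_t) in enumerate(stream, start=1):
        alpha_t = threshold_rule(t, x_t, decisions)  # depends only on the past and x_t
        decisions.append(p_t <= alpha_t)             # irrevocable reject/accept decision
    return decisions
```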
We propose a new class of powerful online testing procedures in which the rejection thresholds are learned sequentially by incorporating contextual information and previous results. We prove that any rule in this class controls the online FDR under standard assumptions. We then focus on a subclass of these procedures, based on weighting the rejection thresholds, to derive a practical algorithm that learns a parametric weight function in an online fashion to gain more discoveries. We also prove that, under easily verifiable assumptions, our proposed procedures achieve higher statistical power than a popular online testing procedure proposed by Javanmard and Montanari (2018). Finally, we demonstrate the superior performance of our procedure on both synthetic data and real data from several applications, comparing it to state-of-the-art online multiple testing procedures.
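As an illustration of the weighting idea (not the paper's exact algorithm), the sketch below scales a LORD-style threshold, in the spirit of Javanmard and Montanari (2018), by a weight computed from the current hypothesis's features. The function name `weight_fn`, the spending sequence, and the absence of the normalization a real procedure would need to preserve the FDR guarantee are all assumptions made for illustration only.

```python
import numpy as np

def weighted_lord_sketch(p_values, features, weight_fn, alpha=0.05, w0=0.025):
    """Illustrative sketch of a contextually weighted LORD-style online rule.

    This is NOT the paper's exact procedure: the form of `weight_fn` and the
    normalization it would need for the FDR guarantee are assumptions.
    """
    n = len(p_values)
    # A summable "spending" sequence gamma_1, gamma_2, ..., normalized to sum to 1.
    gamma = np.array([1.0 / ((t + 1) * np.log(t + 2) ** 2) for t in range(n)])
    gamma /= gamma.sum()

    rejection_times, decisions = [], []
    for t, (p_t, x_t) in enumerate(zip(p_values, features)):
        # LORD-style threshold built from initial wealth and past rejection times.
        alpha_t = gamma[t] * w0
        if rejection_times:
            alpha_t += (alpha - w0) * gamma[t - rejection_times[0]]
            alpha_t += alpha * sum(gamma[t - tau] for tau in rejection_times[1:])
        # Hypothetical contextual weight based on the hypothesis's features.
        alpha_t *= weight_fn(x_t)
        reject = bool(p_t <= alpha_t)
        decisions.append(reject)
        if reject:
            rejection_times.append(t)
    return decisions
```

For instance, `weight_fn` could be a logistic function of the features whose parameters are updated online; the paper learns such a parametric weight function, but its exact form is not specified in the abstract.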