The real-time estimation of time-varying parameters from high-dimensional, heavy-tailed, and corrupted data streams is a common subroutine in systems ranging from those for network monitoring and anomaly detection to those for traffic scheduling in data centers. For estimation tasks that can be cast as minimizing a strongly convex loss function, we prove that an appropriately tuned version of clipped Stochastic Gradient Descent (SGD) is simultaneously (i) adaptive to drift, (ii) robust to heavy-tailed inliers and arbitrary corruptions, (iii) free of any distributional assumptions, and (iv) implementable in an online streaming fashion. All prior estimation algorithms have only been proven to possess a subset of these practical desiderata. A key observation is that neither the O(1/t) learning rate for clipped SGD, known to be optimal for strongly convex losses on a stationary data stream, nor the O(1) learning rate, known to be optimal for adapting to drift in a noiseless environment, can be used. Instead, a learning rate of T^{-α} for α < 1, where T is the stream length, is needed to balance adaptivity to potential drift against the need to combat noise. We develop a new inductive argument and combine it with a martingale concentration result to derive high-probability bounds under any learning rate on data streams exhibiting arbitrary distribution shift, a proof strategy that may be of independent interest. Further, using the classical doubling trick, we relax the requirement that the stream length T be known in advance. Ours is the first online estimation algorithm that is provably robust to heavy tails, corruptions, and distribution shift simultaneously. We complement our theoretical results empirically on synthetic and real data.
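To make the recipe concrete, the following is a minimal Python sketch of clipped SGD with a constant T^{-α} step size, together with a doubling-trick wrapper that removes the need to know T in advance. The gradient-oracle interface, the default values of alpha and clip_radius, and the epoch schedule are illustrative assumptions for exposition, not the tuned constants or the exact algorithm from the paper.

```python
import numpy as np

def clipped_sgd(grad_oracle, theta0, T, alpha=0.5, clip_radius=1.0):
    """Sketch of clipped SGD with a fixed T^{-alpha} learning rate.

    grad_oracle(t, theta) returns a stochastic gradient of the (possibly
    drifting) loss at time t. alpha and clip_radius are hypothetical
    defaults chosen for illustration.
    """
    theta = np.asarray(theta0, dtype=float)
    eta = T ** (-alpha)  # constant step size: between the O(1/t) and O(1) regimes
    for t in range(T):
        g = grad_oracle(t, theta)
        g_norm = np.linalg.norm(g)
        if g_norm > clip_radius:
            # Clipping bounds the influence of heavy-tailed samples
            # and arbitrary corruptions on any single update.
            g = g * (clip_radius / g_norm)
        theta = theta - eta * g
    return theta

def clipped_sgd_doubling(grad_oracle, theta0, alpha=0.5, clip_radius=1.0,
                         max_steps=10_000):
    """Doubling-trick wrapper (sketch): restart clipped SGD on epochs of
    length 1, 2, 4, ..., tuning the step size to each epoch's length, so
    no a-priori knowledge of the stream length is needed.
    """
    theta = np.asarray(theta0, dtype=float)
    t0, epoch_len = 0, 1
    while t0 < max_steps:
        # Shift the oracle's clock so each epoch sees the global time index.
        shifted = lambda t, th, off=t0: grad_oracle(off + t, th)
        theta = clipped_sgd(shifted, theta, epoch_len, alpha, clip_radius)
        t0 += epoch_len
        epoch_len *= 2
    return theta
```

As a usage note, under the assumed interface one would call clipped_sgd_doubling with an oracle that draws a fresh sample from the stream at each step and returns the corresponding stochastic gradient; warm-starting each epoch from the previous iterate, as above, is one natural design choice when the underlying parameter drifts slowly.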