E-commerce marketplaces protect shopper experience and trust at scale by deploying deep learning models, trained on human-annotated moderation data, to identify and remove advert imagery that does not comply with moderation policies (a.k.a. defective images). However, human moderation labels can be hard to source for smaller advert programs that target specific device types with separate formats, or for recently launched locales with unique moderation policies. Additionally, the sourced labels can be noisy due to annotator biases or policy rules that group multiple types of transgressions into a single category. Training advert image moderation models therefore calls for an approach that can simultaneously improve the sample efficiency of training, weed out noise, and discover latent moderation sub-labels.
Our work demonstrates the merits of automated sub-label discovery using self-labelling. We show that self-labelling approaches can be used to decompose an image moderation task into its hidden sub-tasks (each corresponding to intercepting a single sub-label) in an unsupervised manner, thus helping with cases where the granularity of labels is inadequate. This enables us to bootstrap useful representations quickly, via low-capacity but fast-learning teacher models that each specialize in a single distinct sub-task of the main classification task. These sub-task specialists then distil their logits to a high-capacity but slow-learning generalist student model, allowing it to perform well on complex moderation tasks with fewer labels than vanilla supervised training. We conduct all our experiments on the moderation of sexually explicit advert images (though this method can be utilized for any defect type) and show a sizeable improvement in NPV (+30.2% absolute gain) vis-à-vis regular supervised baselines at a 1% FPR level. A long-term A/B test of our deployed model shows a significant relative reduction (-45.6%) in the prevalence of such advertisements compared to the previously deployed model.
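As a rough illustration of the headline metric, the sketch below computes NPV (negative predictive value, TN / (TN + FN)) at the lowest score threshold whose false positive rate stays within a budget such as 1%. It is not the authors' evaluation code. The function name `npv_at_fpr`, the convention that label 1 means "defective", and the rule for picking the threshold are all assumptions made for this example; the abstract does not state which class is treated as positive.

```python
import numpy as np


def npv_at_fpr(y_true, scores, max_fpr=0.01):
    """Return (NPV, threshold) at the lowest threshold with FPR <= max_fpr.

    Assumptions (not from the paper): label 1 = defective (positive class),
    and a higher score means the image is more likely to be defective.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)

    # Candidate thresholds in ascending order: every distinct score,
    # plus +inf, which flags nothing and therefore always has FPR = 0.
    thresholds = np.append(np.unique(scores), np.inf)

    for t in thresholds:
        pred_pos = scores >= t
        fp = np.sum(pred_pos & ~y_true)   # compliant images wrongly flagged
        tn = np.sum(~pred_pos & ~y_true)  # compliant images correctly passed
        fpr = fp / (fp + tn) if (fp + tn) else 0.0

        # FPR can only fall as the threshold rises, so the first threshold
        # within budget is the one that flags the most images.
        if fpr <= max_fpr:
            fn = np.sum(~pred_pos & y_true)  # defective images that slipped through
            npv = tn / (tn + fn) if (tn + fn) else float("nan")
            return float(npv), float(t)

    raise ValueError("max_fpr must be non-negative")


# Toy data: four compliant images (label 0) and two defective ones (label 1).
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.9]
print(npv_at_fpr(labels, scores, max_fpr=0.25))  # (1.0, 0.5)
print(npv_at_fpr(labels, scores, max_fpr=0.0))   # (0.8, 0.9)
```

In the toy data, tightening the false positive budget from 25% to 0% raises the threshold from 0.5 to 0.9, so one defective image passes unflagged and NPV drops from 1.0 to 0.8. Reporting NPV at a fixed FPR makes models comparable at the same cost in wrongly rejected adverts.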
Sub-task imputation via self-labelling to train image moderation models on sparse noisy data
2022