Product image representation learning on large scale noisy datasets

2023
Learning product similarity with distance metric learning on a real-world catalog must cope with a large number of product categories and with noisy labels. On one hand, a large number of product categories makes online hard mining (OHM) less effective, because hard triplets become sparse and thus difficult to find. On the other hand, the validity of the hard triplets themselves is less certain when the training data is noisily labelled. In this paper, we address the problem of large-scale product representation learning in the presence of noisy training data. To address these challenges, we propose a novel co-teaching based label correction scheme for distance metric learning, motivated by inconsistencies in the variation relationships in the product catalog. To validate our approach, we conducted experiments on 20 different product categories, where we achieve up to a 4% improvement in PR-AUC over the SOTA baseline. We conclude by discussing the lessons learned from these experiments and directions for future research.
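To make the sparsity argument concrete, here is a minimal sketch of generic batch-hard triplet mining in plain Python. It is not the paper's method; the function names, the Euclidean distance, and the margin value of 0.2 are all assumptions chosen for illustration. For each anchor it picks the farthest same-label sample and the closest different-label sample in the batch, and keeps the triplet only if the margin is violated.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplets(embeddings, labels, margin=0.2):
    """Generic batch-hard mining (illustrative; not the paper's scheme).

    Returns a list of (anchor, positive, negative, loss) tuples, one per
    anchor that has at least one positive and one negative in the batch
    and whose triplet loss is strictly positive.
    """
    triplets = []
    indices = range(len(embeddings))
    for a in indices:
        positives = [i for i in indices if i != a and labels[i] == labels[a]]
        negatives = [i for i in indices if labels[i] != labels[a]]
        if not positives or not negatives:
            # No valid triplet can be formed for this anchor in this batch.
            continue
        # Hardest positive: the same-label sample farthest from the anchor.
        pos = max(positives, key=lambda i: euclidean(embeddings[a], embeddings[i]))
        # Hardest negative: the different-label sample closest to the anchor.
        neg = min(negatives, key=lambda i: euclidean(embeddings[a], embeddings[i]))
        loss = max(
            0.0,
            euclidean(embeddings[a], embeddings[pos])
            - euclidean(embeddings[a], embeddings[neg])
            + margin,
        )
        if loss > 0.0:
            triplets.append((a, pos, neg, loss))
    return triplets


if __name__ == "__main__":
    # Two classes, two samples each: every anchor has a positive in the batch.
    dense = batch_hard_triplets(
        [(0.0, 0.0), (0.0, 1.0), (0.5, 0.0), (5.0, 5.0)],
        ["a", "a", "b", "b"],
    )
    # Every sample from a different class: no positives, so nothing to mine.
    sparse = batch_hard_triplets(
        [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
        ["a", "b", "c"],
    )
    print(len(dense), len(sparse))
```

When a batch is drawn from many categories, most anchors have no same-label partner in the batch, so the mining step returns few or no triplets. A mislabelled sample has the opposite problem: it tends to be selected as a hardest positive or hardest negative precisely because its label is wrong, which is why label noise undermines the triplets that are found.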