Unsupervised multi-modal representation learning for high quality retrieval of similar products at e-commerce scale
2023
Identifying similar products in e-commerce is useful for discovering relationships between products, making recommendations, and increasing diversity in search results. Product representation learning is the first step toward defining a generalized product similarity metric for search. The second step is to extend similarity search to a large scale (e.g., e-commerce catalog scale) without sacrificing quality. In this work, we present a solution that interweaves both steps: we learn representations suited to high-quality retrieval using contrastive learning (CL), and we retrieve similar items from a large search space using approximate nearest neighbor search (ANNS), which trades off quality for speed. We propose a CL training strategy for learning uni-modal encoders suited to multi-modal similarity search for e-commerce. We study ANNS retrieval by generating Pareto frontiers (PFs) without requiring labels. Our CL training strategy doubles the retrieval@1 metric across categories (e.g., from 36% to 88% in category C). We also demonstrate that ANNS engine optimization using PFs helps select configurations appropriately (e.g., we achieve a 6.8× search speedup with just a 2% drop from the maximum retrieval accuracy on medium-size datasets).
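As an illustration of how a Pareto frontier can guide the choice of an ANNS configuration, the following minimal sketch keeps only the configurations that no other configuration beats on both speed and accuracy, then picks the fastest one within a tolerated accuracy drop. It is not the implementation used in this work: the function names `pareto_frontier` and `fastest_within`, the configuration names, and all numbers are hypothetical and chosen only for the example.

```python
from typing import List, Tuple

# (name, speedup over exact search, retrieval accuracy in percent)
Config = Tuple[str, float, float]


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Return the configs not dominated on (speedup, accuracy)."""
    # Visit configs from fastest to slowest; ties go to the more accurate one.
    ordered = sorted(configs, key=lambda c: (-c[1], -c[2]))
    frontier: List[Config] = []
    best_accuracy = float("-inf")
    for cfg in ordered:
        # A config survives only if it is more accurate than every
        # config that is at least as fast.
        if cfg[2] > best_accuracy:
            frontier.append(cfg)
            best_accuracy = cfg[2]
    return frontier


def fastest_within(frontier: List[Config], max_drop: float) -> Config:
    """Pick the fastest config within max_drop points of the best accuracy."""
    best = max(c[2] for c in frontier)
    eligible = [c for c in frontier if c[2] >= best - max_drop]
    return max(eligible, key=lambda c: c[1])


# Made-up measurements, for illustration only.
configs = [
    ("exact", 1.0, 90.0),
    ("cfg_a", 3.0, 89.5),
    ("cfg_b", 6.0, 88.0),
    ("cfg_c", 5.0, 80.0),   # dominated by cfg_b (slower and less accurate)
    ("cfg_d", 10.0, 60.0),
]

frontier = pareto_frontier(configs)
choice = fastest_within(frontier, max_drop=2.0)
```

In this toy setting the frontier drops `cfg_c`, and a tolerance of two accuracy points selects `cfg_b`. In a label-free setting, the accuracy column could plausibly be measured against the results of exact search rather than ground-truth labels, though that choice is an assumption of this sketch.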