Learning variant product relationship and variation attributes from e-commerce website structures
2024
We introduce VARM, variant relationship matcher strategy, to identify pairs of variant products in e-commerce catalogs. Traditional definitions of entity resolution are concerned with whether product mentions refer to the same underlying product. However, this fails to capture product relationships that are critical for e-commerce applications, such as having similar, but not identical, products listed on the same webpage or share reviews. Here, we formulate a new type of entity resolution in variant product relationships to capture these similar e-commerce product links. In contrast with the traditional definition, the new definition requires both identifying if two products are variant matches of each other and what are the attributes that vary between them. To satisfy these two requirements, we developed a strategy that leverages the strengths of both encoding and generative AI models. First, we construct a dataset that captures webpage product links, and therefore variant product relationships, to train an encoding LLM to predict variant matches for any given pair of products. Second, we use RAG prompted generative LLMs to extract variation and common attributes amongst groups of variant products. To validate our strategy, we evaluated model performance using real data from one of the world’s leading e-commerce retailers. The results showed that our strategy outperforms alternative solutions and paves the way to exploiting these new type of product relationships.
Research areas