Semantic matching is an important component of a product search pipeline. Its goal is to capture the semantic intent of the search query, as opposed to the syntactic matching performed by a lexical matching system. A semantic matching model captures relationships such as synonyms and learns common behavioral patterns from purchase data, generalizing from them to retrieve relevant results. However, semantic matching models suffer from a lack of informative negative examples for training. Prior work has addressed this issue with methods based on hard-negative mining and contrastive learning. In this work, we propose SMOCC, a novel semantic matching method based on one-class classification.
Given a query and a relevant product, SMOCC generates the representation of an informative negative, which is then used to train the model. Negatives are generated by adversarial search in the neighborhood of the positive examples. We also propose a novel approach for selecting the radius of the neighborhood in which adversarial negative products are generated around each query, based on the model's understanding of that query. Depending on how the radius is selected, we propose two variants: SMOCC-QS, which quantizes queries by their specificity, and SMOCC-EM, which uses the expectation-maximization paradigm to iteratively learn the best radius. Our method outperforms state-of-the-art hard-negative mining approaches, increasing purchase recall by 3 percentage points and the percentage of exact matches retrieved by up to 5 percentage points, while reducing irrelevant results by 1.8 percentage points.
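The abstract does not say how the adversarial search or the radius quantization is implemented. As a minimal sketch under stated assumptions, the following uses a DROCC-style search (a swapped-in technique from one-class classification: projected gradient ascent constrained to a shell r ≤ ‖x − p‖ ≤ γr around the positive product embedding p), a dot-product relevance score s(q, x) = q·x, and a hypothetical per-query specificity score bucketed into a lookup table of radii in the spirit of SMOCC-QS. The names `radius_for_query`, `project_to_shell`, `adversarial_negative`, and the parameters `gamma`, `bin_edges`, and `radii` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def radius_for_query(specificity, bin_edges, radii):
    """Map a query specificity score to a radius by bucketing (quantization).

    Hypothetical: `specificity`, `bin_edges`, and `radii` are illustrative,
    not values from the paper. len(radii) must equal len(bin_edges) + 1.
    """
    bucket = int(np.digitize(specificity, bin_edges))
    return radii[bucket]


def project_to_shell(x, center, r, gamma):
    """Project x onto the shell r <= ||x - center|| <= gamma * r."""
    delta = x - center
    norm = np.linalg.norm(delta)
    if norm == 0.0:
        # Degenerate case: choose an arbitrary direction.
        delta = np.ones_like(center)
        norm = np.linalg.norm(delta)
    target = np.clip(norm, r, gamma * r)
    return center + delta * (target / norm)


def adversarial_negative(q, p, r, gamma=2.0, steps=10, lr=0.1, seed=0):
    """Search near positive embedding p for a point the model still scores
    as relevant to query embedding q (a hard negative).

    Assumption: dot-product score s(q, x) = q . x, so the gradient w.r.t. x is q.
    """
    rng = np.random.default_rng(seed)
    # Random start in the neighborhood of the positive, then projected.
    x = project_to_shell(p + rng.normal(scale=r, size=p.shape), p, r, gamma)
    grad = q  # gradient of q . x with respect to x
    grad_norm = np.linalg.norm(grad)
    if grad_norm == 0.0:
        return x  # score is constant in x; nothing to ascend
    for _ in range(steps):
        # Normalized gradient-ascent step on the relevance score...
        x = x + lr * grad / grad_norm
        # ...then project back so the negative stays between r and gamma*r from p.
        x = project_to_shell(x, p, r, gamma)
    return x


# Usage with toy 3-d embeddings (values are illustrative only).
q = np.array([1.0, 0.0, 0.0])
p = np.array([0.0, 1.0, 0.0])
r = radius_for_query(0.5, bin_edges=[0.33, 0.66], radii=[0.5, 1.0, 2.0])
neg = adversarial_negative(q, p, r)
print(r, np.linalg.norm(neg - p))
```

Under these assumptions, ascending the score pushes the candidate toward points the model would still rank highly for the query, while the lower bound r keeps it from collapsing onto the positive itself; that combination is what makes it a hard negative. SMOCC-EM would instead learn the radius iteratively, which this sketch does not attempt.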
Beyond hard negatives in product search: Semantic matching using one-class classification (SMOCC)
2023