Improve MT for search with selected translation memory using search signals

2022
Multilingual search is indispensable for a seamless e-commerce experience. E-commerce search engines typically support multilingual search by cascading a machine translation (MT) step before searching the index in its primary language. In practice, search query translation usually involves a translation memory matching step before machine translation. A translation memory (TM) can (i) effectively enforce terminology for specific brands or products, (ii) reduce the computational footprint and latency of synchronous translation, and (iii) fix machine translation issues that cannot be resolved easily or quickly without retraining or tuning the machine translation engine in production. In this abstract, we propose (1) a method of improving MT query translation using TM entries when those entries are only sub-strings of a customer search query, and (2) an approach to selecting TM entries using search signals so that they contribute to better search results.

A translation memory is usually activated only when a run-time query exactly matches an entry in the memory. We have also observed, however, that many TM entries partially match a large percentage of run-time queries. Therefore, exploiting the placeholder features of modern industrial machine translation, we propose a sub-string partial matching feature that enables the NMT models at run-time to recognize the longest TM entry that occurs as a sub-string of the query and then replace the MT output for that sub-string with the TM translation. Partial matching lets one TM entry affect a larger number of queries, so it is crucial to select only TM entries that have a positive impact on the customers’ shopping experience. From the customer’s perspective, what matters is the search results; MT query translations are only intermediate artifacts for search. We rely on customer purchasing behavior as a signal for relevance judgments to automatically estimate the search performance (e.g. nDCG) of MT query translations, and a TM entry is selected if the MT query translation with the TM sub-string substitution has better search performance than the default MT query translation.
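The sub-string matching step described above can be illustrated with a minimal sketch. It is not the production implementation: the names `PLACEHOLDER`, `find_longest_tm_match`, `translate_with_tm`, `toy_mt`, and `TOY_LEXICON` are hypothetical, matching is done on whitespace tokens and is case-sensitive, only the single longest match is replaced, and the MT engine is assumed to copy the placeholder token through unchanged.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical placeholder token; assumed to pass through the MT engine unchanged.
PLACEHOLDER = "__TM0__"


def find_longest_tm_match(query: str, tm: Dict[str, str]) -> Optional[Tuple[int, int, str]]:
    """Return (start, end, tm_translation) for the longest TM entry in the query.

    Spans are measured in whitespace tokens; `end` is exclusive.
    """
    tokens = query.split()
    n = len(tokens)
    # Try longer spans first, so the first hit is the longest match.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            candidate = " ".join(tokens[start:start + length])
            if candidate in tm:
                return start, start + length, tm[candidate]
    return None


def translate_with_tm(query: str, tm: Dict[str, str], mt: Callable[[str], str]) -> str:
    """Translate a query, substituting the TM translation for the longest TM sub-string."""
    match = find_longest_tm_match(query, tm)
    if match is None:
        return mt(query)  # no TM entry applies: plain MT
    start, end, tm_translation = match
    tokens = query.split()
    if start == 0 and end == len(tokens):
        return tm_translation  # exact match: MT is skipped entirely
    # Mask the matched span, translate the rest, then restore the TM translation.
    masked = " ".join(tokens[:start] + [PLACEHOLDER] + tokens[end:])
    return mt(masked).replace(PLACEHOLDER, tm_translation)


# Toy stand-in for an MT engine: word-by-word lookup, unknown tokens copied as-is.
TOY_LEXICON = {"capa": "funda", "para": "para"}


def toy_mt(text: str) -> str:
    return " ".join(TOY_LEXICON.get(tok, tok) for tok in text.split())


if __name__ == "__main__":
    tm = {"fire tv stick": "Fire TV Stick", "fire": "fuego"}
    print(translate_with_tm("capa para fire tv stick", tm, toy_mt))
    # prints: funda para Fire TV Stick
```

Searching from the longest span downward means the three-token entry wins over the shorter entry `fire`, which mirrors the "longest TM entry" rule in the text.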

We conduct offline and online experiments with the proposed sub-string matching method and the TM subset chosen by the selection approach, for Portuguese queries on Amazon.es, German queries on Amazon.com (US), and Dutch queries on Amazon.de. All three stores saw increased order product sales and improved user experience.
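The selection approach (keep a TM entry only if the translation with the TM substitution yields better search performance than the default MT translation) might be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: purchase counts per product are used directly as graded relevance, nDCG is cut off at a hypothetical `k`, and averaging over all queries affected by an entry is our assumption. The function names and the `QueryEvidence` layout are invented for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (results for default MT translation, results with TM substitution, purchases per product)
QueryEvidence = Tuple[Sequence[str], Sequence[str], Dict[str, int]]


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(ranked_ids: Sequence[str], purchases: Dict[str, int], k: int = 10) -> float:
    """nDCG@k, using purchase counts as relevance (an assumption of this sketch)."""
    gains = [purchases.get(pid, 0) for pid in ranked_ids[:k]]
    ideal = sorted(purchases.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def select_tm_entry(evidence: List[QueryEvidence], k: int = 10) -> bool:
    """Select the TM entry if mean nDCG with substitution beats the default MT."""
    if not evidence:
        return False  # no affected queries: nothing to justify selection
    default_mean = sum(ndcg(d, p, k) for d, _, p in evidence) / len(evidence)
    tm_mean = sum(ndcg(t, p, k) for _, t, p in evidence) / len(evidence)
    return tm_mean > default_mean


if __name__ == "__main__":
    purchases = {"B": 3, "C": 1}
    default_results = ["A", "B", "C"]   # purchased items ranked lower
    tm_results = ["B", "C", "A"]        # purchased items ranked first
    print(select_tm_entry([(default_results, tm_results, purchases)]))
```

A strict inequality is used so that an entry that leaves search performance unchanged is not selected; whether ties should be kept is a design choice the text does not specify.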
