Multilingual search is indispensable for a seamless e-commerce experience. E-commerce search engines typically support multilingual search by cascading a machine translation (MT) step before searching the index in its primary language. In practice, search query translation usually involves a translation memory matching step before machine translation. A translation memory (TM) can (i) effectively enforce terminology for specific brands or products¹, (ii) reduce the computational footprint and latency of synchronous translation, and (iii) fix machine translation issues that cannot be resolved easily or quickly without retraining or tuning the machine translation engine in production.² In this abstract, we propose (1) a method for improving MT query translation with TM entries when those entries are only sub-strings of a customer search query, and (2) an approach to selecting TM entries using search signals, so that the selected entries contribute to better search results.
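The TM-then-MT cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production system: the lowercase/whitespace normalization, the `tm` dictionary, the `mt` callable, and the function name `translate_query` are all hypothetical stand-ins.

```python
from typing import Callable, Dict


def translate_query(query: str, tm: Dict[str, str],
                    mt: Callable[[str], str]) -> str:
    """TM-then-MT cascade: exact TM match first, MT only on a miss."""
    # Hypothetical normalization: lowercase and collapse whitespace before lookup.
    key = " ".join(query.lower().split())
    if key in tm:
        # TM hit: the stored translation is returned and no MT call is made.
        return tm[key]
    # TM miss: fall back to the MT engine (here any str -> str callable).
    return mt(key)
```

With an illustrative entry such as `"laufschuhe herren" -> "men's running shoes"`, the query "Laufschuhe Herren" is served from the TM, while "Nike Laufschuhe Herren" misses and goes to MT even though it contains the entry.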
A translation memory is usually activated only when a run-time query exactly matches an entry in the memory. However, we have observed that many TM entries partially match a large percentage of queries at run time. Exploiting the placeholder features of modern industrial machine translation, we therefore propose a sub-string partial matching method that enables the NMT model at run time to recognize the longest TM entry contained in the query as a sub-string and then replace the MT output for that sub-string with the TM translation.³
Because partial matching lets one TM entry affect a much larger number of queries, it is crucial to select only TM entries that improve the customer's shopping experience. From the customer's perspective, what matters is the search results; MT query translations are only intermediate artifacts for search. We therefore use customer purchasing behavior as a relevance signal to automatically estimate the search performance (e.g., nDCG) of MT query translations, and we select a TM entry if the MT query translation with the TM sub-string substitution yields better search performance than the default MT query translation.
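The sub-string matching and the search-signal selection described above can be sketched together. This is a minimal sketch under several assumptions not stated in the abstract: whitespace tokenization, a hypothetical placeholder token `__TM0__` that the MT engine is assumed to copy through unchanged, binary relevance where purchased products count as relevant, and selection by comparing mean nDCG over the logged queries containing the entry (the abstract does not specify how results are aggregated). `mt` and `search` are hypothetical callables standing in for the production MT and search engines, and all function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical placeholder token, assumed to pass through the MT engine unchanged.
PLACEHOLDER = "__TM0__"


def longest_tm_span(tokens: List[str], tm: Dict[str, str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest contiguous token span that is a TM source entry."""
    for length in range(len(tokens), 0, -1):  # longest candidate spans first
        for start in range(len(tokens) - length + 1):
            if " ".join(tokens[start:start + length]) in tm:
                return start, start + length
    return None


def translate_with_partial_tm(query: str, tm: Dict[str, str],
                              mt: Callable[[str], str]) -> str:
    """Mask the longest TM sub-string, translate with MT, then restore the TM target."""
    tokens = query.lower().split()
    span = longest_tm_span(tokens, tm)
    if span is None:
        return mt(" ".join(tokens))  # no TM entry inside the query: plain MT
    start, end = span
    target = tm[" ".join(tokens[start:end])]
    masked = " ".join(tokens[:start] + [PLACEHOLDER] + tokens[end:])
    return mt(masked).replace(PLACEHOLDER, target)


def ndcg_at_k(ranked: Sequence[str], purchased: Set[str], k: int = 10) -> float:
    """Binary-gain nDCG@k, treating purchased products as the relevant ones."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, pid in enumerate(ranked[:k]) if pid in purchased)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(purchased), k)))
    return dcg / ideal if ideal > 0 else 0.0


def keep_tm_entry(queries: List[Tuple[str, Set[str]]], entry: Tuple[str, str],
                  mt: Callable[[str], str],
                  search: Callable[[str], List[str]]) -> bool:
    """Keep a candidate TM entry if substituting it beats default MT on mean nDCG.

    Assumes a non-empty list of (query, purchased product IDs) pairs.
    """
    tm = {entry[0]: entry[1]}
    with_tm = [ndcg_at_k(search(translate_with_partial_tm(q, tm, mt)), bought)
               for q, bought in queries]
    default = [ndcg_at_k(search(mt(" ".join(q.lower().split()))), bought)
               for q, bought in queries]
    return sum(with_tm) / len(with_tm) > sum(default) / len(default)
```

In a toy run with an MT stub that renders "laufschuhe" as just "running", the entry `laufschuhe -> running shoes` moves the purchased shoe above a pair of shorts and is kept, whereas `laufschuhe -> running shorts` gives no gain and is rejected. Using a strict inequality means an entry that only ties the default translation is not selected.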
We conduct offline⁴ and online experiments with the proposed sub-string matching method, using the TM subset chosen by the proposed selection approach, for Portuguese queries on Amazon.es, German queries on Amazon.com (US), and Dutch queries on Amazon.de. All three stores have seen increased ordered product sales and an improved user experience.
Improve MT for search with selected translation memory using search signals
2022