COSMO: A large-scale e-commerce common sense knowledge generation and serving system at Amazon
2024
Applications of large-scale knowledge graphs in the e-commerce platforms can improve shopping experience for their customers. While existing e-commerce knowledge graphs (KGs) integrate a large volume of concepts or product attributes, they fail to discover user intentions, leaving the gap with how people think, behave, and interact with surrounding world. In this work, we present COSMO, a scalable system to mine user-centric commonsense knowledge from massive behaviors and construct industry-scale knowledge graphs to empower diverse online services. In particular, we describe a pipeline for collecting high-quality seed knowledge assertions that are distilled from large language models (LLMs) and further re-fined by critic classifiers trained over human-in-the-loop annotated data. Since those generations may not always align with human preferences and contain noises, we then describe how we adopt instruction tuning to finetune an efficient language model (COSMO-LM) for faithful e-commerce commonsense knowledge generation at scale. COSMO-LM effectively expands our knowledge graph to 18 major categories at Amazon, producing millions of high-quality knowledge with only 30k annotated instructions. Finally COSMO has been deployed in Amazon search applications such as search navigation. Both offline and online A/B experiments demonstrate our proposed system achieves significant improvement. Further-more, these experiments highlight the immense potential of commonsense knowledge extracted from instruction-finetuned large language models.
Research areas