Amazon Scholar Yizhou Sun recently won the test of time award from the Very Large Database (VLDB) Endowment for a 2011 paper that introduced a systematic, meta-path-based approach to analyzing arbitrary heterogeneous information networks, which have since become a ubiquitous data model for many real-world applications.
The paper, “PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks,” was the first to introduce the concept of a network schema to define a general heterogeneous information network, and to propose the concept of a meta-path to systematically define the similarity between two entities based on their connectivity. Specifically, the work proposed PathSim, or “meta-path-based similarity,” as one way of using meta-paths to define similarity between entities and so support similarity search.
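To make the idea concrete, here is a minimal sketch of the PathSim measure for a symmetric meta-path: twice the number of path instances between two entities, divided by the sum of the path instances from each entity back to itself. The toy data, the author names, the choice of an author-paper-venue-paper-author meta-path, and the function names `path_count`, `pathsim`, and `top_k` are illustrative assumptions, not taken from the paper's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: number of papers each author has published per venue.
# These counts stand in for the author-paper-venue half of the symmetric
# meta-path author -> paper -> venue -> paper -> author.
AUTHOR_VENUE_COUNTS: Dict[str, Dict[str, int]] = {
    "alice": {"VLDB": 2, "KDD": 1},
    "bob": {"VLDB": 4, "KDD": 2},
    "carol": {"ICML": 3},
}


def path_count(x: str, y: str, counts: Dict[str, Dict[str, int]]) -> int:
    """Count meta-path instances between x and y by pairing up shared venues."""
    return sum(n * counts[y].get(venue, 0) for venue, n in counts[x].items())


def pathsim(x: str, y: str, counts: Dict[str, Dict[str, int]]) -> float:
    """PathSim: 2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    denominator = path_count(x, x, counts) + path_count(y, y, counts)
    if denominator == 0:
        return 0.0  # neither entity has any path instances
    return 2 * path_count(x, y, counts) / denominator


def top_k(query: str, counts: Dict[str, Dict[str, int]], k: int) -> List[Tuple[str, float]]:
    """Return the k entities most similar to the query, highest score first."""
    scores = [(other, pathsim(query, other, counts)) for other in counts if other != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    for name, score in top_k("alice", AUTHOR_VENUE_COUNTS, k=2):
        print(f"{name}: {score:.2f}")
```

In this toy example, "bob" publishes in the same venues as "alice" in the same proportions, so the two score as highly similar even though "bob" has twice as many papers; the normalization by self-paths is what keeps highly prolific entities from dominating the ranking.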
Sun and her coauthors sought to tackle the fundamental question of how to define similarities among data points that don’t fit neatly into the independent and identically distributed (i.i.d.) data setting. Her coauthors are Jiawei Han, the Abel Bliss Professor of Computer Science at the University of Illinois Urbana-Champaign (UIUC); Philip S. Yu, Wexler Chair in Information Technology and computer science professor at the University of Illinois Chicago (UIC); Xifeng Yan, Venkatesh Narayanamurti Chair of the Computer Science Department at the University of California, Santa Barbara; and Tianyi Wu, an engineering manager at Meta. The methodology is used today across a wide range of industries, including health care, academic research, social networks, and e-commerce.
“People have gradually realized when we talk about data, it's beyond the tables that typically come to mind,” Sun said. “Data can be much more complicated. Data points can interact with each other, and those interactions give us lots of power to understand every single data point.”
In an academic setting, patterns exist among seemingly independent data points, such as research papers that cite each other, their authors, their keywords, and the venues where they are published. The ability to understand these kinds of complex connections is also critical for health care applications, which can benefit from leveraging data surrounding patients, such as disease symptoms, drugs, genes, and other factors.
“In the Amazon setting, we have customers, products, advertisements, and lots of other different types of entities,” Sun said. “So that's why we wanted to study these new types of networked data, which we named ‘heterogeneous information networks.’”
The VLDB Conference is one of the most prestigious conferences in the database field. The VLDB Endowment selects the test of time award winner from papers presented 10 to 12 years earlier that have had not only an impact on the academic community but also significant business value. Since its publication, Sun’s paper has gained more than 1,600 citations, sparking significant follow-up academic research and commercial applications.
Sun received bachelor of science degrees in computer science and statistics from Peking University, then went on to receive a master of engineering degree from the Department of Intelligence Science at Peking University in 2007. A year after publishing her 2011 award-winning paper, she earned her PhD in computer science from UIUC. Her thesis, “Mining Heterogeneous Information Networks,” won the Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining (ACM SIGKDD) 2013 Dissertation Award.
Sun worked as an assistant professor in the College of Computer and Information Science at Northeastern University from 2013 to 2016. She joined the faculty of the University of California, Los Angeles, in June 2016 and is currently on sabbatical from her position there as an associate professor of computer science.
Sun is also a two-time recipient of an Amazon Research Award, which offers unrestricted funds and Amazon Web Services promotional credits to support research at academic institutions and nonprofit organizations in areas that align with Amazon’s mission to advance customer-obsessed science. She won her 2018 award from Amazon’s Product Graph Team and her 2020 award from the Deep Graph Learning Team. That research continues to inform her work as an Amazon Scholar, a role she has occupied since June 2021.
“We try to move things from the academic setting to the industrial setting — not only the methodologies, but also how you can deploy your ideas and your algorithms to the dataset at an extremely large scale, like at Amazon,” said Sun.
In her capacity as an Amazon Scholar, Sun works on a team within Amazon Ads where she is constructing a heterogeneous information network based on Amazon Ads data. Sun hopes to use machine learning to create better recommendations for customers, improving the customer experience.
“With these new tools, we can enhance the ads product recommendations to consumers and improve the campaign recommendations to business clients,” Sun said. “We will be able to help our customers gain more from the data in Amazon and provide them with much better services that benefit them over a longer term.”
Sun and her coauthors also published an invited paper to accompany the award, titled “Heterogeneous Information Networks: the Past, the Present, and the Future.”