Christos Faloutsos, the Fredkin Professor of Artificial Intelligence within the Computer Science Department at Carnegie Mellon University (CMU) and an Amazon Scholar, was recently awarded the “Most Influential Paper Award” at the 2020 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD).
PAKDD is among the world’s leading and longest-running international conferences on data mining and knowledge discovery. The award for most influential paper recognizes research that was published at the conference ten years earlier and continues to have significant impact in the field.
The award-winning paper was selected by the 2020 PAKDD award committee, which was led by University of Minnesota professor Jaideep Srivastava. The committee conducted candidate paper selection, citation analysis, and peer review before arriving at its decision.
In 2010, before he joined Amazon, Faloutsos co-authored the paper “OddBall: Spotting Anomalies in Weighted Graphs” with Leman Akoglu and Mary McGlohon, who were then students in CMU’s computer science department.
The paper proposed a novel approach to detecting anomalies in large, weighted graphs. Graphs consist of distinct entities or nodes; the relationships between nodes are represented as edges. Examples of nodes include email servers in a network, users of a social network, or donors for a political campaign.
The paper focused on neighborhoods (spheres or balls – hence the name “OddBall”) around each node to find the nodes that were acting outside the norm. The techniques described in the paper were intended for use in a wide variety of applications, including detecting email spammers, finding irregularities in donations for political campaigns, and detecting fraudulent or malevolent accounts on social networks.
The authors’ proposed approach selects a set of features to define neighborhoods around individual nodes. To accomplish this, the anomaly detection algorithm prioritizes features that are fast to compute, making it especially relevant for large-scale, real-world applications.
The algorithm looks for patterns and assigns each node an “outlierness” score, so that nodes that deviate significantly from the discovered patterns stand out as anomalous. The method is fast and unsupervised, and it does away with the need for any user-defined constants.
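To make the idea concrete, here is a minimal Python sketch of this kind of neighborhood-based scoring. It is an illustration under stated assumptions, not the authors' implementation: the choice of features (each node's neighbor count and the number of edges in its immediate neighborhood), the power-law fit, and the scoring formula follow our reading of the OddBall paper and are not spelled out in this article. Edge weights are ignored for brevity, and all function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def egonet_features(edges):
    """Return {node: (n_neighbors, n_egonet_edges)} for an undirected graph.

    `edges` is an iterable of (u, v, weight) triples. Weights are ignored
    in this simplified sketch (assumption: only count features are used).
    """
    adj = defaultdict(set)
    for u, v, _weight in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    feats = {}
    for node, nbrs in adj.items():
        # Edges between neighbors are seen from both endpoints, so halve.
        between = sum(len(adj[n] & nbrs) for n in nbrs) // 2
        # Egonet edges = edges to neighbors + edges among neighbors.
        feats[node] = (len(nbrs), len(nbrs) + between)
    return feats


def fit_power_law(points):
    """Least-squares fit of log(y) = log(c) + theta * log(x); returns (c, theta)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all nodes have the same degree; nothing to fit")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    theta = sxy / sxx
    return math.exp(my - theta * mx), theta


def outlierness(x, y, c, theta):
    """Score how far (x, y) lies from the fitted curve y = c * x**theta."""
    expected = c * x ** theta
    ratio = max(y, expected) / min(y, expected)
    return ratio * math.log(abs(y - expected) + 1)


def score_nodes(edges):
    """Compute an outlierness score for every node in the edge list."""
    feats = egonet_features(edges)
    c, theta = fit_power_law(list(feats.values()))
    return {node: outlierness(n, e, c, theta) for node, (n, e) in feats.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: two star-shaped neighborhoods plus one 5-clique.
    toy = [("h1", f"a{i}", 1.0) for i in range(3)]
    toy += [("h2", f"b{i}", 1.0) for i in range(5)]
    toy += [(u, v, 1.0) for u, v in combinations(["c0", "c1", "c2", "c3", "c4"], 2)]
    ranked = sorted(score_nodes(toy).items(), key=lambda kv: kv[1], reverse=True)
    for node, score in ranked[:5]:
        print(node, round(score, 3))
```

Note that the sketch has no thresholds or tuning constants: the expected pattern is fitted from the data itself, and nodes are simply ranked by how far they fall from it. Both features are also cheap to compute from the adjacency lists, which is in keeping with the emphasis on fast features described above.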
“The work in OddBall can help spot strange behavior in social networks, telecommunication networks, and other areas,” said Faloutsos. “All our work has been focused on methods that are generally applicable to many settings, and OddBall is along these lines. The second guiding principle is scalability. OddBall and all our methods are specifically designed to scale up to large data sets.”
During his career, Faloutsos has conducted research in the areas of data mining for graphs and streams, indexing and data mining for video, biological and medical databases, and database performance evaluation. Faloutsos’ research has focused on bridging theory and practice, by developing mathematically grounded solutions to real-world research problems.
Faloutsos has also received a “test of time” award from ACM-SIGCOMM, the flagship venue for computer network research, for a paper co-authored with his two brothers in 1999. The paper challenged commonly held assumptions about the structure of the internet by demonstrating that the degree distribution of nodes in a network follows a power law, as opposed to a Gaussian or Poisson distribution. The discovery had multiple implications for computer network security, provisioning and protocol design, and analysis of user behavior in social networks.
Faloutsos joined Amazon in 2017. As an Amazon Scholar in the company’s Consumer organization, he is currently focused on fraud and anomaly detection, in addition to contributing to projects related to knowledge bases, time series forecasting, database view maintenance, and explainability in deep learning.
“One of the main lessons I learned in my career is the value of real data sets. Yes, they have errors – random, or systematic; they have missing values (often masqueraded as ‘-1’ or ‘0’); class-labels are occasionally wrong; and they have several other issues that are rarely covered in textbooks. However, they help us discover patterns and regularities that we could not imagine beforehand.
“The second lesson I learned,” he continued, “is to start from a research problem and look for a solution, instead of the reverse. The vast majority of data-mining problems seem to have easy solutions (‘do clustering’, or ‘do decision trees’). However, the devil is in the details: real problems often violate the textbook assumptions (like the uniformity assumption, Gaussianity, stationarity, etc.), and as researchers, we need to develop new methods beyond the textbook, to solve real-world problems.”
To learn more about the role of academics at Amazon, visit the Amazon Scholars page.