A heterogeneous graph-based framework for scalable fraud detection

Phanindra Reddy Madduru; Naveed Janvekar

2023

The rise of online marketplaces has increased concerns about bad actors who sell counterfeit goods or engage in other fraudulent activities. While organizations work to monitor and address these issues, bad actors persistently find new ways to commit fraud, including creating new accounts with different credentials and hijacking existing accounts. To combat this, our study proposes the use of Heterogeneous Relational Graph Convolutional Networks (HRGCN) to identify risky relationships among entities such as sellers and customers, with the aim of improving the detection and mitigation of fraudulent behavior on e-commerce marketplaces. The HRGCN model is designed to detect sellers with risky associations to known bad sellers by analyzing the edges that connect them, such as encrypted device and identity credentials. With the rapid growth of e-commerce stores, the number of sellers has grown exponentially, and so have the networks they form by sharing relationships such as digital contact information, communication channels and devices. At this scale, a direct implementation of HRGCN struggles to process the data, which makes model scalability essential. To address this issue, we introduce a novel mini-batch variant of HRGCN that works in tandem with a neighborhood sampler optimized to run on GPUs, reducing training time by 70%. The mini-batch variant maintains or improves model performance while resolving the scalability issue, making it an efficient solution for large datasets. In this paper, we compare the performance of three models: a benchmark Random Forest model trained on seller node features alone, HRGCN trained on the full batch, and HRGCN with the mini-batch implementation.
Our experiments show that the HRGCN models outperform the benchmark model, with significant improvements in both F1 score and recall. Specifically, the HRGCN models increase recall by approximately 115% compared to the baseline model. Moreover, the mini-batch HRGCN model improves substantially on the full-batch HRGCN model, achieving a 16% higher F1 score and an 8% higher PR AUC score. These results underline the effectiveness of a mini-batch approach for handling large datasets and detecting related bad sellers.
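
The idea of pairing mini-batches with a neighborhood sampler can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the relation names, the fanout values, and the function names below are all hypothetical, and the GPU optimization mentioned in the abstract is not shown. The sketch only demonstrates how, for each mini-batch of seller nodes, a bounded number of neighbors per relation can be sampled layer by layer, so that each training step touches a small subgraph rather than the full graph.

```python
import random
from collections import defaultdict

# Hypothetical toy graph: edges keyed by (source type, relation, destination type).
# Each value maps a destination node id to the source node ids linked to it.
EDGES = {
    ("device", "used_by", "seller"): {0: [10, 11], 1: [11], 2: [12, 13, 14]},
    ("email", "registered_by", "seller"): {0: [20], 1: [20, 21], 2: [22]},
    ("seller", "uses", "device"): {10: [0], 11: [0, 1], 12: [2], 13: [2], 14: [2]},
    ("seller", "registers", "email"): {20: [0, 1], 21: [1], 22: [2]},
}


def sample_neighbors(edges, seeds, fanout, rng):
    """Sample at most `fanout` in-neighbors per relation for each seed node.

    `seeds` maps a node type to a set of node ids. Returns the sampled edges
    per relation and the frontier (nodes reached in this hop, grouped by type).
    """
    sampled = {}
    frontier = defaultdict(set)
    for (src_type, rel, dst_type), adj in edges.items():
        picked = []
        for dst in sorted(seeds.get(dst_type, ())):
            neighbors = adj.get(dst, [])
            k = min(fanout, len(neighbors))
            for src in rng.sample(neighbors, k):
                picked.append((src, dst))
                frontier[src_type].add(src)
        if picked:
            sampled[(src_type, rel, dst_type)] = picked
    return sampled, dict(frontier)


def sample_blocks(edges, seed_sellers, fanouts, seed=0):
    """Build one sampled subgraph per layer, walking outward from a seller batch."""
    rng = random.Random(seed)
    seeds = {"seller": set(seed_sellers)}
    blocks = []
    for fanout in fanouts:
        sampled, frontier = sample_neighbors(edges, seeds, fanout, rng)
        blocks.append(sampled)
        seeds = frontier  # the next hop expands from the nodes just reached
    return blocks[::-1]  # input-side layer first, seed-side layer last


def mini_batches(node_ids, batch_size):
    """Yield consecutive slices of `node_ids` of length at most `batch_size`."""
    for start in range(0, len(node_ids), batch_size):
        yield node_ids[start:start + batch_size]


if __name__ == "__main__":
    for batch in mini_batches([0, 1, 2], batch_size=2):
        blocks = sample_blocks(EDGES, batch, fanouts=[2, 2])
        print(batch, [sorted(rel for _, rel, _ in block) for block in blocks])
```

In a real system each sampled block would be fed to the corresponding relational graph convolution layer; capping the fanout is what keeps the cost of a training step independent of the total graph size.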
