Boosting the performance of deployable timestamped directed GNNs via time-relaxed sampling
2023
Timestamped graphs find applications in critical business problems such as user classification and fraud detection. This is due to the inherent nature of the data generation process, in which relationships between nodes are observed at defined timestamps. Deployment-focused GNN models should be trained on point-in-time information about node features and neighborhoods, mirroring the data ingestion process. However, this is not reflected in benchmark directed node classification datasets, where performance is typically reported on undirected versions of the graphs that ignore these timestamps. Constraining leading approaches trained on undirected graphs to timestamp-based message passing at test time leads to sharp drops in performance. The drop is driven by the blocking of pathways for neighborhood information that were available during undirected training but not at test time, highlighting the label leakage issue in applied graph use cases. We bridge this mismatch of message-passing semantics in directed graphs by first resetting baselines while highlighting the semantic case in which undirected training and inference would fail. Second, we introduce TRDGNN, which closes the performance gap by leveraging a novel GNN sampling layer that relaxes the time-directed nature of the graph only to the extent that no labels can be leaked during the training phase. Together, the two contributions form a recipe for robust GNN model deployment in industry use cases. Finally, we demonstrate the benefits of the proposed relaxation through qualitative analysis of where it improved performance on the node classification task, on multiple public benchmark and proprietary e-commerce datasets.
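To make the point-in-time constraint concrete, below is a minimal sketch of timestamp-constrained neighbor sampling with a relaxation window. The abstract does not specify TRDGNN's sampling rule, so the edge representation, function names, and the `delta` relaxation parameter are illustrative assumptions, not the paper's API: `delta = 0` corresponds to strict timestamp-based message passing, and a small positive `delta` admits slightly later edges, which a deployed layer would have to bound so no label information leaks during training.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Group timestamped directed edges (src, dst, t) by destination node."""
    adj = defaultdict(list)
    for src, dst, t in edges:
        adj[dst].append((src, t))
    return adj

def sample_neighbors(adj, node, t_node, delta=0.0):
    """Point-in-time neighbor sampling with a time-relaxation window.

    delta = 0 is strict time-directed message passing: only edges observed
    at or before t_node are visible. delta > 0 (hypothetical relaxation
    parameter) additionally admits edges up to t_node + delta.
    """
    return [src for src, t in adj.get(node, []) if t <= t_node + delta]

# Example: node "x" observed at t=5 sees only its past under delta=0,
# and one slightly-later edge under a relaxed window.
edges = [("a", "x", 1), ("b", "x", 5), ("c", "x", 9)]
adj = build_adjacency(edges)
strict = sample_neighbors(adj, "x", 5)            # ["a", "b"]
relaxed = sample_neighbors(adj, "x", 5, delta=4)  # ["a", "b", "c"]
```

Under strict sampling, any neighborhood pathway created after a node's timestamp is blocked, which is exactly the test-time regime the abstract describes; the relaxation recovers some of those pathways while keeping the window small enough to rule out leakage.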