Benchmarking tool for graph-centric predictive modeling on databases

4DBInfer enables model comparison across datasets, predictive tasks, database-to-graph extraction methods, and graph-based predictive architectures.

By Quan Gan

February 14, 2025

Relational databases (RDBs) store vast amounts of structured data across multiple interconnected tables. This rich relational information has immense potential for predictive machine learning. However, the progress of predictive models on RDBs currently lags behind advancements in other domains like computer vision or natural-language processing. One key reason is the lack of established, publicly available RDB benchmarks for model training and evaluation.

Existing predictive models for RDBs often resort to using single-table datasets or graph datasets derived from preprocessed relational data. However, these approaches do not fully capture the native multi-table structure and characteristics of RDBs, potentially limiting model performance.

To address this gap, Amazon’s Shanghai Lablet has developed 4DBInfer, a comprehensive open-source benchmarking tool for graph-centric predictive modeling on RDBs.

4DBInfer enables systematic comparison of diverse baseline models across four key dimensions: (1) RDB datasets, (2) predictive tasks, (3) RDB-to-graph extraction methods, and (4) graph-based predictive architectures. This 4-D design facilitates a thorough exploration of the model design space for RDB predictive analytics.

4DBInfer enables systematic comparison of baseline models across four key dimensions: (1) RDB datasets, (2) predictive tasks, (3) RDB-to-graph extraction methods, and (4) graph-based predictive architectures.
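
To make the 4-D design concrete, here is a minimal Python sketch of how such a comparison grid could be enumerated. It is not 4DBInfer's actual interface: the dataset, task, and architecture names are hypothetical placeholders, and the two extraction-method labels are the ones named later in this article. Tasks are keyed by dataset because each dataset comes with its own predictive tasks.

```python
from itertools import product

# Hypothetical placeholders for illustration; not 4DBInfer identifiers.
TASKS_BY_DATASET = {
    "dataset_a": ["task_1", "task_2"],
    "dataset_b": ["task_3"],
}
# Extraction-method labels as named in this article.
EXTRACTION_METHODS = ["Row2Node", "Row2N/E"]
# Hypothetical architecture names.
ARCHITECTURES = ["arch_x", "arch_y"]


def enumerate_configurations():
    """Yield one (dataset, task, extraction, architecture) tuple per grid cell."""
    for dataset, tasks in TASKS_BY_DATASET.items():
        for task, extraction, arch in product(
            tasks, EXTRACTION_METHODS, ARCHITECTURES
        ):
            yield dataset, task, extraction, arch


configs = list(enumerate_configurations())
for config in configs:
    print(config)
```

Each tuple identifies one experiment to train and evaluate, so comparing results along any single dimension amounts to holding the other three fixed.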

Let's dive deeper into 4DBInfer's core components:

RDB datasets and tasks: We curate a suite of RDB benchmarks spanning real-world application domains, including e-commerce, advertising, and social networks. These datasets exhibit diverse characteristics in terms of scale (up to billions of rows), schema complexity, and temporal evolution. For each dataset, we define practically relevant predictive tasks, such as estimating missing cell values.

RDB-to-graph extraction: 4DBInfer supports multiple strategies for converting RDBs into graph representations while preserving rich tabular information. The Row2Node approach treats each table row as a graph node, with foreign-key relationships forming the edges. The Row2N/E method selectively converts some rows into edges to capture more nuanced relational structures. 4DBInfer also introduces "dummy tables" to enrich the graph connectivity.
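
The difference between the two extraction strategies can be illustrated with a small, self-contained Python sketch. This is not 4DBInfer's implementation: the `Table` class, the function names, and the toy tables are hypothetical. The rule used here to decide which rows become edges (a table with no primary key and exactly two foreign keys) is an assumption made for illustration, since the text above says only that some rows are selectively converted.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Table:
    """Hypothetical minimal table: rows plus key metadata."""
    name: str
    rows: list
    primary_key: Optional[str] = None
    # Maps a foreign-key column to the name of the table it references.
    foreign_keys: dict = field(default_factory=dict)


def row2node(tables):
    """Every row becomes a node; every non-null foreign-key value becomes an edge."""
    nodes, edges = [], []
    for table in tables:
        for index, row in enumerate(table.rows):
            # Identify a node by (table name, primary key), or by row index if no key.
            key = row[table.primary_key] if table.primary_key else index
            node_id = (table.name, key)
            nodes.append(node_id)
            for column, referenced in table.foreign_keys.items():
                if row.get(column) is not None:
                    edges.append((node_id, (referenced, row[column])))
    return nodes, edges


def row2ne(tables):
    """Like row2node, but rows of 'link' tables become edges instead of nodes.

    Assumed rule (illustrative only): a table with no primary key and
    exactly two foreign keys is treated as a link table.
    """
    nodes, edges = [], []
    for table in tables:
        if table.primary_key is None and len(table.foreign_keys) == 2:
            (col_a, ref_a), (col_b, ref_b) = table.foreign_keys.items()
            for row in table.rows:
                if row.get(col_a) is not None and row.get(col_b) is not None:
                    edges.append(((ref_a, row[col_a]), (ref_b, row[col_b])))
        else:
            table_nodes, table_edges = row2node([table])
            nodes.extend(table_nodes)
            edges.extend(table_edges)
    return nodes, edges


# Toy database: two entity tables and one table linking them.
tables = [
    Table("users", [{"user_id": 1}, {"user_id": 2}], primary_key="user_id"),
    Table("items", [{"item_id": 10}], primary_key="item_id"),
    Table(
        "purchases",
        [{"user_id": 1, "item_id": 10}, {"user_id": 2, "item_id": 10}],
        foreign_keys={"user_id": "users", "item_id": "items"},
    ),
]

node_graph = row2node(tables)
ne_graph = row2ne(tables)
print(len(node_graph[0]), len(node_graph[1]))  # node and edge counts, Row2Node
print(len(ne_graph[0]), len(ne_graph[1]))      # node and edge counts, Row2N/E
```

On this toy database, the Row2Node-style function turns each purchase row into its own node connected to a user and an item, while the Row2N/E-style function turns each purchase row into a single direct user-to-item edge, giving a smaller graph with shorter paths between the entities.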
