Predicate caching: Query-driven secondary indexing for cloud data warehouses

Tobias Schmidt; Andreas Kipf; Dominik Horn; Gaurav Saxena; Tim Kraska


2024


Cloud data warehouses are today’s standard for analytical query processing. Multiple cloud vendors offer state-of-the-art systems, such as Amazon Redshift. We have observed that customer workloads exhibit highly repetitive query patterns, i.e., users and systems frequently send the same queries. To speed up these repeated queries, most systems rely on techniques like result caches or materialized views.

However, these caches are often stale due to inserts, deletes, or updates that occur between query repetitions. We propose a novel secondary index, predicate caching, to improve query latency for repeating scans and joins. Predicate caching stores ranges of qualifying tuples of base table scans. Such an index can be built on the fly, is lightweight, and can be kept online without recomputation.
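To make the idea concrete, here is a minimal sketch of such an index, not the Redshift implementation. It assumes an in-memory, append-only table represented as a Python list, and every name in it (`PredicateCache`, `extend_ranges`, `scan`) is hypothetical. The first scan with a given predicate evaluates every row and records the qualifying row positions as half-open ranges; a repeated scan reads the cached ranges directly and evaluates the predicate only on rows appended since the entry was built. Deletes and updates are deliberately left out of the sketch.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Range = Tuple[int, int]  # half-open [start, end) row positions


def extend_ranges(ranges: List[Range], positions: List[int]) -> List[Range]:
    """Append sorted row positions to a range list, merging adjacent ones."""
    out = list(ranges)
    for pos in positions:
        if out and out[-1][1] == pos:
            out[-1] = (out[-1][0], pos + 1)  # grow the last range by one row
        else:
            out.append((pos, pos + 1))       # start a new range
    return out


class PredicateCache:
    """Hypothetical cache: (table, predicate text) -> qualifying row ranges."""

    def __init__(self) -> None:
        # Each entry stores the ranges plus how many rows they already cover.
        self._entries: Dict[Tuple[str, str], Tuple[List[Range], int]] = {}
        self.rows_evaluated = 0  # counts predicate evaluations, for illustration

    def ranges(self, table_name: str, pred_text: str) -> List[Range]:
        return list(self._entries.get((table_name, pred_text), ([], 0))[0])

    def scan(self, table_name: str, table: List[Row], pred_text: str,
             pred: Callable[[Row], bool]) -> List[Row]:
        key = (table_name, pred_text)
        ranges, covered = self._entries.get(key, ([], 0))

        # Rows inside cached ranges qualify; no predicate evaluation needed.
        result: List[Row] = []
        for start, end in ranges:
            result.extend(table[start:end])

        # Assumption: new rows are only appended, so only the tail is scanned.
        new_hits: List[int] = []
        for pos in range(covered, len(table)):
            self.rows_evaluated += 1
            if pred(table[pos]):
                new_hits.append(pos)
                result.append(table[pos])

        # Extend the entry in place of recomputing it from scratch.
        self._entries[key] = (extend_ranges(ranges, new_hits), len(table))
        return result


# Usage: rows 2..4 are in region "EU", all others in "US".
orders = [{"id": i, "region": "EU" if 2 <= i < 5 else "US"} for i in range(6)]
cache = PredicateCache()
is_eu = lambda row: row["region"] == "EU"

first = cache.scan("orders", orders, "region = 'EU'", is_eu)   # builds the entry
orders.append({"id": 6, "region": "EU"})                       # insert between runs
second = cache.scan("orders", orders, "region = 'EU'", is_eu)  # reuses the entry
```

In this sketch the second scan evaluates the predicate on a single row (the appended one) instead of all seven, which is the effect the abstract describes for repeating scans. Keying entries by predicate text is a simplification chosen for brevity.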

We implemented a prototype of this idea in the cloud data warehouse Amazon Redshift. Our evaluation shows that predicate caching improves query runtimes by up to 10x on selected queries with negligible build overhead.
