|
CMU-CS-03-157
Computer Science Department
School of Computer Science, Carnegie Mellon University
CMU-CS-03-157
Improving Hash Join Performance through Prefetching
Shimin Chen, Anastassia Ailamaki,
Phillip B. Gibbons*, Todd C. Mowry
October 2003
CMU-CS-03-157.ps
CMU-CS-03-157.pdf
Keywords: Cache performance, databases, hash join, group
prefetching, software-pipelined prefetching
Hash join algorithms suffer from extensive CPU cache stalls.
This paper shows that the standard hash join algorithm for
disk-oriented databases (i.e. GRACE) spends over 73% of its
user time stalled on CPU cache misses, and explores the use
of prefetching to improve its cache performance. Applying
prefetching to hash joins is complicated by the data dependencies,
multiple code paths, and inherent randomness of hashing. We
present two techniques, group prefetching and software-pipelined
prefetching, that overcome these complications. These schemes
achieve 2.0R2.9X speedups for the join phase and 1.4R2.6X
speedups for the partition phase over GRACE and simple prefetching
approaches. Compared with previous cache-aware approaches (i.e.
cache partitioning), the schemes are at least 50% faster on large
relations and do not require exclusive use of the CPU cache to
be effective.
22 pages
*Intel Research Pittsburgh, 417 South Craig Street, Pittsburgh, PA 15213
|