CMU-CS-20-143 Computer Science Department School of Computer Science, Carnegie Mellon University
Filter representation in Vectorized Query Execution Amadou Latyr Ngom M.S. Thesis July 2020
Advances in memory capacity have allowed Database Management Systems (DBMSs) to store large amounts of data in memory, there by shifting the performance bottleneck of query execution from disk accesses to CPU efficiency (i.e., instruction count and cycles per instruction). One technique used to achieve such efficiency in analytical applications is batch-oriented processing or vectorization: it reduces interpretation overhead, improves cache locality, and allows for efficient loop optimizations (e.g., loop unrolling, SIMD vectorization). For each vector (i.e., a batch of tuples),the DBMS needs to track the set of valid tuples after a predicate application. To that end, systems employ one of two data structures, or filter representations: Selection Vectors (SelVecs) or Bitmaps. In this work, we analyze each approach's strengths and weaknesses to provide recommendations on how to implement vectorized operations. Through a wide range of micro-benchmarks, we determine that the optimal implementation strategy is a function of many factors: the cost of iterating through tuples, the cost of the operation itself, and how amenable it is to SIMD vectorization. Our analysis shows that Bitmaps perform better for operations that can be efficiently vectorized using SIMD instructions but that SelVecsper form better on all other operations due to a cheaper iteration logintc.
54 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department
| |
Return to:
SCS Technical Report Collection This page maintained by [email protected] |