CMU-CS-00-145
Computer Science Department
School of Computer Science, Carnegie Mellon University
Active Disk Architecture for Databases
Erik Riedel*, Christos Faloutsos, David F. Nagle
April 2000
Keywords: Input/output devices, database application, special-purpose
and application-based systems, input-output and data communications
Today's commodity disk drives, the basic unit of storage for computer
systems large and small, are actually small computers, with a processor,
memory and a network connection, in addition to the spinning magnetic
material that stores the data. Large collections of data are becoming
larger, and people are beginning to analyze, rather than simply
store-and-forget, these masses of data. At the same time, advances in I/O
performance have lagged behind the rapid development of commodity processor and
memory technology. This paper describes the use of Active Disks to take
advantage of the processing power on individual disk drives to run a
carefully chosen portion of a relational database system. Moving a portion
of the database processing to execute directly at the disk drives improves
performance by: 1) dramatically reducing data traffic; and 2) exploiting
the parallelism in large storage systems. It provides a new point of
leverage to overcome the I/O bottleneck. This paper discusses how to map
all the basic database operations - select, project, and join - onto an
Active Disk system. The changes required are small and the performance
gains are dramatic. A prototype based on the Postgres database system
demonstrates a 2x performance improvement on a small system using
a portion of the TPC-D decision support benchmark, with the promise
of larger improvements in more realistically sized systems.
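
The data-traffic argument above can be illustrated with a minimal sketch. It is not the Postgres prototype described in the report: the `Disk` class, its `select_project` method, and the row layout are hypothetical names chosen for illustration, and traffic is approximated by counting rows shipped to the host rather than bytes.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """Hypothetical drive holding one partition of a table as a list of tuples."""
    rows: list

    def scan(self):
        # Conventional drive: every stored row is shipped to the host.
        return list(self.rows)

    def select_project(self, predicate, columns):
        # Active Disk (assumed interface): filter and project on the drive,
        # so only qualifying rows and requested columns leave the disk.
        return [tuple(r[c] for c in columns) for r in self.rows if predicate(r)]


def host_side(disks, predicate, columns):
    """Run select/project at the host; return (result, rows shipped)."""
    result, shipped = [], 0
    for d in disks:
        rows = d.scan()
        shipped += len(rows)
        result.extend(tuple(r[c] for c in columns) for r in rows if predicate(r))
    return result, shipped


def disk_side(disks, predicate, columns):
    """Run select/project at each drive; return (result, rows shipped)."""
    result, shipped = [], 0
    for d in disks:
        rows = d.select_project(predicate, columns)
        shipped += len(rows)
        result.extend(rows)
    return result, shipped


# Four drives, 100 rows each; one row in ten satisfies the predicate.
disks = [Disk([(i, i % 10, "payload") for i in range(k * 100, (k + 1) * 100)])
         for k in range(4)]
predicate = lambda r: r[1] == 0
columns = (0,)

host_result, host_shipped = host_side(disks, predicate, columns)
disk_result, disk_shipped = disk_side(disks, predicate, columns)
```

Both plans return the same rows, but in this toy setup the host-side plan ships all 400 rows while the disk-side plan ships only the 40 that qualify. Each drive's filter is independent of the others, so in a real system the drives could run it in parallel; the sequential loop here is only for simplicity.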
22 pages
*Now with Hewlett-Packard Labs, Palo Alto, California