CMU-CS-14-105
Computer Science Department
School of Computer Science, Carnegie Mellon University




A Low-Power Hybrid CPU-GPU Sort

Lawrence Tan

March 2014

M.S. Thesis



Keywords: CUDA, low power, JouleSort

This thesis analyses the energy efficiency of a low-power CPU-GPU hybrid architecture. We evaluate the NVIDIA Ion architecture, which couples a low-power Intel Atom processor with an integrated GPU that has an order of magnitude fewer processors than traditional discrete GPUs. We aim to build a system that balances computation and I/O capacity by attaching flash storage that supports sequential access to data at very high throughput.

To evaluate this architecture, we implemented a JouleSort candidate that sorts more than 18,000 records per joule. We discuss the techniques used to distribute work between the CPU and the GPU so as to fully utilize system resources. We also analyse the individual components of the system and attempt to identify its bottlenecks, to help guide future work on such architectures.
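The records-per-joule figure is a JouleSort-style efficiency metric: the number of records sorted divided by the energy the system consumes during the sort. The sketch below shows only the arithmetic, assuming energy is approximated as average wall power multiplied by elapsed time. The function name `records_per_joule` and all input values are hypothetical illustrations, not measurements from this thesis.

```python
def records_per_joule(records: float, avg_watts: float, seconds: float) -> float:
    """Return records sorted per joule.

    Assumption: energy (J) is approximated as average system power (W)
    multiplied by elapsed sort time (s).
    """
    if avg_watts <= 0 or seconds <= 0:
        raise ValueError("power and time must be positive")
    joules = avg_watts * seconds
    return records / joules


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not results from the thesis):
    # 10^8 records sorted in 200 s at an average draw of 25 W -> 5000 J.
    print(records_per_joule(1e8, 25.0, 200.0))
```

Because the metric divides by total energy, a slower sort can still score higher if it draws proportionally less power, which is why a low-power platform can be competitive.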

We conclude that a balanced architecture with sufficient I/O to saturate the available compute capacity is significantly more energy efficient than traditional machines. We also find that the CPU-GPU hybrid sort is marginally more efficient than a CPU-only sort. However, because of the limited I/O capacity of our evaluation platform, further work is required to determine how large the hybrid sort's advantage over the CPU-only sort actually is.

35 pages


