CMU-CS-13-114
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-13-114

Improving Device Driver Reliability through
Decoupled Dynamic Binary Analyses

Olatunji O. Ruwase

May 2013

Ph.D. Thesis

CMU-CS-13-114.pdf


Keywords: Device Drivers, I/O, Operating Systems, Virtualization, Dynamic Analysis, Reliability, Computer Architecture

Device drivers are Operating Systems (OS) extensions that enable the use of I/O devices in computing systems. However, studies have identified drivers as an Achilles' heel of system reliability, their high fault rate accounting for a significant portion of system failures. Consequently, significant effort has been directed towards improving system robustness by protecting system components (e.g., OS kernel, I/O devices, etc.) from the harmful effects of driver faults. In contrast to prior techniques which focused on preventing unsafe driver interactions (e.g., with the OS kernel), my thesis is that checking a driver’s execution for correctness violations results in the detection and mitigation of more faults.

To validate this thesis, I present Guardrail, a flexible and powerful framework that enables instruction-grained dynamic analysis (e.g., data race detection) of unmodified kernel-mode driver binaries to safeguard I/O operations and devices from driver faults. Guardrail decouples the analysis tool from driver execution to improve performance, and runs it in user-space to simplify the deployment of new tools. Moreover, Guardrail leverages virtualization to be transparent to both the driver and device, and enable support for arbitrary driver/device combinations.

To demonstrate Guardrail's generality, I implemented three novel dynamic checking tools within the framework for detecting memory faults, data races and DMA faults in drivers. These tools found 25 serious bugs, including previously unknown bugs, in Linux storage and network drivers. Some of the bugs existed in several Linux (and driver) releases, suggesting their elusiveness to existing approaches. Guardrail easily detected these bugs using common driver workloads.

Finally, I present an evaluation of using Guardrail to protect network and storage I/O operations from memory faults, data races and DMA faults in drivers. The results show that with hardware-assisted logging for decoupling the heavyweight analyses from driver execution, standard I/O workloads generally experienced negligible slowdown on their end-to-end performance.

In conclusion, Guardrail's high fidelity fault detection and efficient monitoring performance makes it a promising approach for improving the resilience of computing systems to the wide variety of driver faults.

133 pages



Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by [email protected]