CMU-CS-05-185
Computer Science Department
School of Computer Science, Carnegie Mellon University




Performance Modeling of Storage Devices
using Machine Learning

Mengzhi Wang

September 2005

Ph.D. Thesis

CMU-CS-05-185.ps.gz
CMU-CS-05-185.pdf


Keywords: Machine learning, learning-based performance models, storage devices, automation of model construction


Performance models of storage devices make it possible to evaluate storage resource configurations efficiently, allowing systems to automatically search a large number of candidates before settling on an optimal or near-optimal one. This thesis explores the feasibility of using machine learning techniques to build such performance models. The models are constructed through "training", during which the model construction algorithm observes storage devices under a set of training traces and builds the models based on the observations. The main advantage of the approach is that model construction is automated, in addition to the high efficiency in both computation and storage.

In our design, the models represent an I/O workload as vectors, and model its performance on storage devices as functions over the vectors using a regression tool. We have identified that the vector representation of workloads, the regression tool, and the training traces are three important factors in model quality. This thesis provides a thorough evaluation of existing techniques in addressing these issues. In addition, we have proposed the entropy plot to characterize the spatio-temporal behavior of I/O workloads and the PQRS model to generate traces of given characteristics to augment existing work in workload characterization.
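As a rough illustration of the "workload as vectors, performance as a function over the vectors" idea, the following minimal sketch summarizes a window of requests as a small feature vector and predicts latency with a k-nearest-neighbor regressor. Everything here is an assumption made for illustration: the request tuple layout, the four features, the function names `workload_vector` and `predict`, and the choice of nearest-neighbor regression are not taken from the thesis, which evaluates its own vector representations and regression tools.

```python
from statistics import mean

# Hypothetical request record (not the thesis's format):
# (arrival_time_s, start_block, size_blocks, is_read)


def workload_vector(requests):
    """Summarize a window of requests as a fixed-length feature vector.

    Illustrative features: mean inter-arrival time, mean request size,
    read fraction, and mean jump distance between consecutive requests.
    """
    times = [r[0] for r in requests]
    gaps = [b - a for a, b in zip(times, times[1:])]
    # Distance from the end of one request to the start of the next.
    jumps = [abs(nxt[1] - (cur[1] + cur[2]))
             for cur, nxt in zip(requests, requests[1:])]
    return (
        mean(gaps) if gaps else 0.0,
        mean(r[2] for r in requests),
        mean(1.0 if r[3] else 0.0 for r in requests),
        mean(jumps) if jumps else 0.0,
    )


def predict(training, vector, k=2):
    """Stand-in regression tool: average the observed latency of the
    k training vectors closest to `vector` (squared Euclidean distance).

    `training` is a list of (vector, observed_latency) pairs. Features
    are not rescaled here, so large-valued features dominate the distance.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(training, key=lambda pair: dist(pair[0], vector))[:k]
    return mean(latency for _, latency in nearest)
```

In this sketch, "training" amounts to collecting `(workload_vector(window), measured_latency)` pairs while replaying training traces against a device, and prediction is a lookup over those pairs. A practical model would at least normalize the features before computing distances.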

Our experiments on real-world traces have shown that the learning-based models are fast and accurate when the training and testing traces are similar. Offline training using synthetic traces, however, is less effective because the synthetic trace generators fail to capture the strong correlations between requests. Our error analyses have shown that both the vector representation and the synthetic trace generators have room for further improvement.

180 pages

