CMU-CS-17-102
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-17-102

Scaling Distributed Machine Learning with
System and Algorithm Co-design

Mu Li

February 2017

Ph.D. Thesis

CMU-CS-17-102.pdf


Keywords: Large Scale Machine Learning, Distributed System, Parameter Server, Distributed Optimization Method

Due to the rapid growth of data and the ever increasing model complexity, which often manifests itself in the large number of model parameters, today, many important machine learning problems cannot be efficiently solved by a single machine. Distributed optimization and inference is becoming more and more inevitable for solving large scale machine learning problems in both academia and industry. However, obtaining an efficient distributed implementation of an algorithm, is far from trivial. Both intensive computational workloads and the volume of data communication demand careful design of distributed computation systems and distributed machine learning algorithms. In this thesis, we focus on the co-design of distributed computing systems and distributed optimization algorithms that are specialized for large machine learning problems.

In the first part, we propose two distributed computing frameworks: Parameter Server, a distributed machine learning framework that features efficient data communication between the machines; MXNet, a multi-language library that aims to simplify the development of deep neural network algorithms. We have witnessed the wide adoption of the two proposed systems in the past two years. They have enabled and will continue to enable more people to harness the power of distributed computing to design efficient large-scale machine learning applications.

In the second part, we examine a number of distributed optimization problems in machine learning, leveraging the two computing platforms. We present new methods to accelerate the training process, such as data partitioning with better locality properties, communication friendly optimization methods, and more compact statistical models. We implement the new algorithms on the two systems and test on large scale real data sets. We successfully demonstrate that careful co-design of computing systems and learning algorithms can greatly accelerate large scale distributed machine learning.

178 pages

Thesis Committee:
David G. Andersen (Co-chair)
Jeffrey Dean (Google)
Barnabás Póczos
Ruslan Salakhutdinov
Alexander J. Smola (Co-chair)

Frank Pfenning, Head, Computer Science Department
Andrew W. Moore, Dean, School of Computer Science




Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by [email protected]