CMU-CS-20-126
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-20-126

Theoretical Foundations for Practical
Concurrent and Distributed Computation

Naama Ben-David

Ph.D. Thesis

August 2020

CMU-CS-20-126.pdf


Keywords: Multiprocessor hardware, NUMA, contention, shared memory, NVRAM, RDMA

Many large-scale computations are nowadays computed using several processes, whether on a single multi-core machine, or distributed over many machines. This wide-spread use of concurrent and distributed technology is also driving innovations in their underlying hardware. To design fast and correct algorithms in such settings, it is important to develop a theory of concurrent and distributed computing that is faithful to practice. Unfortunately, it is difficult to abstract practical problems into approachable theoretical ones, leading to theoretical models that are too far removed from reality to be easily applied in practice.

This thesis aims to bridge this gap by building a strong theoretical foundation for practical concurrent and distributed computing. The thesis takes a two-pronged approach toward this goal; improving theoretical analysis of well-studied settings, and modeling new technologies theoretically.

In the first vein, we consider the problem of analyzing shared-memory concurrent algorithms such that the analysis reflects their performance in practice. A new shared memory model that accounts for contention costs is presented, as well as a tool for uncovering the way that a machine interleaves concurrent instructions. We also show that the analysis of concurrent algorithms in more restricted settings can be easy and accurate, by designing and analyzing a concurrent algorithm for nested parallelism.

The second approach explored in this thesis to bridge the theory-practice gap considers and models two emerging technologies. Firstly, we study non-volatile random access memories(NVRAM) and develop a general simulation that can adapt many classic concurrent algorithms to a setting in which memory is non-volatile and can recover after a system fault. Finally, we study remote direct memory access (RDMA) as a tool for replication in data centers. We develop a model that captures the power of RDMA and demonstrate that RDMA supports heightened performance and fault tolerance, thereby uncovering its practical relevance by formally reasoning about it theoretically.

221 pages

Thesis Committee:
Guy Blelloch (Chair)
Umut Acar
Phillip Gibbons
Mor Harchol-Balter
Erez Petrank (Technion | Israel Institute of Technology)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by [email protected]