CMU-CS-23-120 Computer Science Department School of Computer Science, Carnegie Mellon University
Parallel Batch-Dynamic Algorithms Daniel Anderson Ph.D. Thesis June 2023
The defining feature of many modern large-scale computer systems is the sheer amount of data that they generate and process. Google’s MapReduce clusters process over twenty petabytes of data per day, and Facebook has reported processing over 4 petabytes of data per day and running over one million map reduce computations over this data. Modern systems for processing this kind of data make extensive use of parallel and distributed computation to achieve the necessary level of throughput. An important property of the datasets involved in such computations, however, is that they are not static. Rather, they are frequently evolving. Large social networks, for example, undergo rapid changes; users are added, and links between uses are created and deleted rapidly, but each such update affects a relatively tiny portion of the data. This thesis explores a relatively under-explored area of algorithm design intended to tackle these kinds of problems: parallel batch-dynamic algorithms. Classic algorithms for handling dynamic data support a single update at a time, but this model is insufficient for handling the scale of modern rapidly changing datasets, and furthermore yields little room to exploit parallelism within the updates. A batch-dynamic algorithm consumes multiple updates at a time, which allows for increased throughput and more opportunities for parallelism. In the first part of the thesis, we will design algorithms for parallel batch- dynamic trees based on sequential Rake-Compress Trees (RC Trees). In the second part of this thesis, we will design algorithms for batch-dynamic graph connectivity and batch-incremental minimum spanning trees. Then, in Part Three, we show that batch-dynamic algorithms can be used as ingredients in designing highly-efficient parallel static algorithms. Using our parallel Rake-Compress Tree data structure as an ingredient, we obtain the first ever work-efficient parallel algorithm for minimum cuts. The final part of this thesis will explore the implementation of systems for processing batch-dynamic data. One factor that hampers the adoption of efficient dynamic algorithms in practice is that they are often extremely complicated and difficult to implement. To address this issue in the sequential setting, self-adjusting computation is a technique for automatically dynamizing static algorithms to create dynamic ones. This lowers the implementation burden on programmers. In this thesis, we show that self-adjusting computation can be generalized to automatically dynamize parallel algorithms, allowing programmers to reap the benefits of parallel batch-dynamic algorithms without the burden of implementing them from scratch.
225 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department
| |
Return to:
SCS Technical Report Collection This page maintained by [email protected] |