CMU-CS-24-123
Computer Science Department
School of Computer Science, Carnegie Mellon University




Communication-Efficient LLM Training for Federated Learning

Arian Raje

M.S. Thesis

May 2024



Keywords: Federated Learning, Sparsity, Efficiency, LLMs

Federated learning (FL) is a recent model training paradigm in which client devices collaboratively train a model without ever aggregating their data. Crucially, this scheme offers potential privacy and security benefits for users because only updates to the model weights are communicated to a central server, whereas traditional machine learning (ML) training directly communicates and aggregates the data itself. However, FL training suffers from statistical heterogeneity, as clients may have differing distributions of local data. Large language models (LLMs) offer a potential solution to this heterogeneity, given that they have consistently been shown to learn from vast amounts of noisy data. While LLMs are a promising development for resolving the persistent issue of non-i.i.d. clients in federated settings, they exacerbate two other bottlenecks in FL: limited local compute and expensive communication. This thesis aims to develop efficient training methods for LLMs in FL. To this end, we employ two techniques. First, we use low-rank adaptation (LoRA) to reduce the computational load of local model training. Second, we communicate sparse updates throughout training to significantly cut communication costs. Taken together, our method reduces communication costs by up to 10x over vanilla LoRA and up to 5x over more complex sparse LoRA baselines while achieving greater utility. We emphasize the importance of carefully applying sparsity and picking effective rank and sparsity configurations for federated LLM training.
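
The two ideas in the abstract, low-rank updates and sparse communication, can be illustrated with a minimal sketch. It is not the thesis's method: the magnitude-based top-k rule, the layer dimensions, the rank, and all function names below are assumptions chosen only to show why each step shrinks what a client has to send.

```python
# Illustrative sketch only. Magnitude-based top-k sparsification is an
# assumption here, not necessarily the sparsification scheme used in the thesis.

def lora_param_count(d_out, d_in, rank):
    """Trainable values in the LoRA factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of a flat update as {index: value}."""
    # Order indices by absolute value, largest first, then keep the first k.
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return {i: update[i] for i in sorted(order[:k])}


def densify(sparse, length):
    """Server side: rebuild a dense update, treating dropped entries as zero."""
    dense = [0.0] * length
    for i, v in sparse.items():
        dense[i] = v
    return dense


# Hypothetical layer size and rank, chosen only for illustration.
d_out, d_in, rank = 1024, 1024, 8
full_values = d_out * d_in                          # dense weight update
lora_values = lora_param_count(d_out, d_in, rank)   # low-rank update

# A toy flattened client update; a client would send only the kept entries.
update = [0.5, -0.01, 0.02, -2.0, 0.0, 0.3, -0.04, 1.5, 0.001, -0.2]
sparse = top_k_sparsify(update, k=3)
restored = densify(sparse, len(update))
```

In this toy setting the LoRA factors hold 16,384 values against 1,048,576 for the dense layer, and the sparse message carries 3 of 10 entries plus their indices. These numbers come from the hypothetical sizes above and say nothing about the reductions reported in the thesis.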

45 pages

Thesis Committee:
Virginia Smith (Chair)
Zhihao Jia
Gauri Joshi

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

