CMU-CS-24-138
Computer Science Department
School of Computer Science, Carnegie Mellon University
Optimizing Machine Learning Inference and
Trevor Leong
M.S. Thesis
August 2024
Fully Homomorphic Encryption (FHE) enables computation over encrypted data, so clients can outsource computation to servers without revealing their plaintext inputs. A notable application of FHE is Privacy-Preserving Machine Learning as a Service (MLaaS), in which clients submit data to a server-hosted machine learning model and receive processed results while keeping their data confidential. However, evaluating machine learning models under FHE remains challenging in practice. FHE permits only a restricted set of operations, which is a significant hurdle to implementation, and each FHE operation carries a significant performance overhead compared to its plaintext counterpart. Computing nonlinear functions like softmax requires complex polynomial approximations, and even FHE-compatible operations like matrix multiplication take considerable time.
This thesis addresses the performance and security constraints associated with using FHE to evaluate machine learning models. First, I propose a novel application of a softmax approximation for evaluation in FHE that leads to a 4x reduction in latency. Then, I describe a procedure for evaluating the embedding layer on the server without the client learning the model's embedding matrix, achieving a 5x speedup over the naive approach. Lastly, I optimize the HELR algorithm for an in-house hardware accelerator by modifying rescale and bootstrap placement, significantly reducing the number of bootstraps.
45 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department