CMU-CS-24-114
Computer Science Department
School of Computer Science, Carnegie Mellon University
Extraction of Training Data from Fine-Tuned Large Language Models
Mihir Dhamankar
M.S. Thesis
April 2024
Large Language Models have been shown to perform well on natural language tasks, even those they were not explicitly trained to perform. Fine-tuning these models on smaller datasets has become a popular technique to achieve high performance on specific tasks. However, fine-tuning can lead to the memorization of training data, which may be a privacy concern. In this work, I investigated the extraction of training data from fine-tuned large language models. I conducted a series of experiments to determine how easily private training data can be extracted from fine-tuned models using different data extraction techniques. I also investigated how the amount of training data used for fine-tuning, the number of epochs, the length and content of each training sample, and the fine-tuning technique and parameters used affect the ease of data extraction. I found that data extraction is straightforward with direct access to the model if training loss is calculated over the entire prompt. Otherwise, some information about training data can still be gained by comparing output probability scores of many requests to the model. I also found that the proportion of data that can be extracted increased with the amount of data used for fine-tuning (for a constant number of epochs). This work has implications for the privacy of individuals whose data is used for fine-tuning, as well as for businesses or groups that use fine-tuned models in public-facing software.
42 pages
Thesis Committee:
Srinivasan Seshan, Head, Computer Science Department