CMU-ISRI-06-105
Institute for Software Research
School of Computer Science, Carnegie Mellon University

Trail Re-Identification and Unlinkability
Bradley Malin
May 2006
Ph.D. Thesis
CMU-ISRI-06-105.ps
In this work, trails are studied in two principal parts. First, we concentrate on the trail re-identification problem and develop several learning algorithms for discovering re-identifications. The algorithms are evaluated on populations derived from real world databases, including hospital visits derived from medical databases and weblogs derived from Internet databases. It is demonstrated that susceptibility to trail re-identification is neither trivial nor the result of bizarre isolated occurrences. Experimental evidence with real world populations confirms that significant quantities of populations are at trail re-identification risk. Second, we propose a protocol by which data holders can collaborate to provably prevent trail re-identification. To do so, we introduce a formal model of privacy called k-unlinkability, and several configurable algorithms to render protected trails. To satisfy real world policy constraints, we present a novel secure multiparty computation protocol that embeds the protection procedure. Using real world datasets, it is demonstrated that significant quantities of data can be disclosed with provable privacy guarantees.

253 pages