Randomization based privacy preserving categorical data analysis
1 online resource (127 pages) : PDF
University of North Carolina at Charlotte
This dissertation investigates the data utility and privacy of randomization-based models in privacy-preserving data mining for categorical data. To analyze data utility in the randomization model, we first investigate the accuracy of association rule mining on market basket data. We then propose a general framework for theoretical analysis of how the randomization process affects the accuracy of various measures adopted in categorical data analysis. We also examine data utility when, to achieve better privacy, the randomization mechanisms are not disclosed to data miners. We investigate how various objective association measures between two variables may be affected by randomization, and then extend the analysis to multiple variables by examining the feasibility of hierarchical loglinear modeling. Our results give data miners a reference for what they can and cannot do with certainty directly on randomized data, without knowledge of the original data distribution or the distortion information. Data privacy and data utility are commonly considered a pair of conflicting requirements in privacy-preserving data mining applications. In this dissertation, we investigate privacy issues in randomization models, focusing on attribute disclosure under linking attacks in data publishing. We propose efficient solutions to determine optimal distortion parameters that maximize utility preservation while still satisfying privacy requirements. We compare our randomization approach with l-diversity and anatomy in terms of utility preservation (under the same privacy requirements) from three aspects: reconstructed distributions, accuracy of answering queries, and preservation of correlations. Our empirical results show that randomization incurs significantly smaller utility loss.
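For readers unfamiliar with the randomization model, the following is a minimal sketch of one common scheme, uniform perturbation of a single categorical attribute, together with reconstruction of the original distribution from the randomized values. It is an illustration only and is not taken from the dissertation: the function names, the retention probability `p`, and the toy data are all assumptions. Each value is kept with probability `p` and otherwise replaced by a category drawn uniformly from all `k` categories, so the observed proportion of category `i` is `p * pi_i + (1 - p) / k`, which can be inverted to estimate `pi_i`.

```python
import random
from collections import Counter


def randomize(values, categories, p, rng):
    """Keep each value with probability p; otherwise replace it with a
    category drawn uniformly at random (hypothetical helper)."""
    return [v if rng.random() < p else rng.choice(categories) for v in values]


def reconstruct(randomized, categories, p):
    """Estimate the original proportions by inverting
    observed_i = p * pi_i + (1 - p) / k (hypothetical helper)."""
    n = len(randomized)
    k = len(categories)
    counts = Counter(randomized)
    return {c: (counts[c] / n - (1 - p) / k) / p for c in categories}


# Toy data (assumed): true proportions are 0.6, 0.3, and 0.1.
rng = random.Random(0)
categories = ["A", "B", "C"]
original = ["A"] * 6000 + ["B"] * 3000 + ["C"] * 1000

released = randomize(original, categories, p=0.7, rng=rng)
estimate = reconstruct(released, categories, p=0.7)
print(estimate)
```

A smaller `p` distorts more values, which strengthens privacy but increases the variance of the reconstructed proportions; this is the utility and privacy trade-off the abstract refers to.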
Thesis (Ph.D.)--University of North Carolina at Charlotte, 2010.
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Copyright is held by the author unless otherwise indicated.