PRESERVING DIFFERENTIAL PRIVACY IN COMPLEX DATA ANALYSIS
1 online resource (181 pages) : PDF
University of North Carolina at Charlotte
Omnipresent databases from various sources, such as social networks, e-commerce websites, and health-related wearable devices, have provided researchers with unprecedented opportunities to analyze complex social phenomena. While society would like to encourage such scientific endeavors, we face the problem of giving researchers a fairly precise picture of the quantities or trends in complex data without disclosing sensitive information about individuals. In this dissertation, we investigate how to apply the differential privacy model in complex data analysis. Differential privacy is a paradigm of post-processing the output of queries or mining tasks on databases such that the inclusion or exclusion of a single individual from a database makes no statistical difference to the results found. It provides formal privacy guarantees that do not depend on an adversary's background knowledge. There has been extensive research on how to enforce differential privacy in analyzing tabular data, and several mechanisms have been developed to achieve differential privacy protection. However, there are significant challenges in achieving differential privacy protection on complex data, including social networks and biological sequence data, mainly due to the high sensitivity of the desired statistics and the complexity of the mining tasks. In this dissertation, we focus on how to enable accurate analysis of complex data while preserving differential privacy. We first propose a general divide-and-conquer framework to deal with complex computation tasks by decomposing a complex target computation into several less complex unit computations connected by basic mathematical operations (e.g., addition, subtraction, multiplication, division), and perturbing the output of each unit with Laplace noise derived from its own sensitivity value and the portion of the privacy threshold distributed to it.
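The divide-and-conquer idea described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the choice of a ratio as the target computation, and the even split of the privacy budget between the two units are all assumptions made for illustration. Each unit is perturbed with Laplace noise of scale sensitivity/epsilon, and by sequential composition the combined result consumes the sum of the per-unit budgets.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_unit(value, sensitivity, epsilon, rng):
    # Laplace mechanism for one unit computation:
    # noise scale = sensitivity / epsilon.
    return value + laplace_noise(sensitivity / epsilon, rng)


def private_ratio(numerator, denominator, sens_num, sens_den, epsilon, rng=None):
    # Hypothetical target computation: numerator / denominator.
    # Assumption: the total budget epsilon is split evenly between the units.
    rng = rng or random.Random()
    eps_each = epsilon / 2.0
    noisy_num = perturb_unit(numerator, sens_num, eps_each, rng)
    noisy_den = perturb_unit(denominator, sens_den, eps_each, rng)
    # Combining the noisy units is post-processing and costs no extra budget.
    return noisy_num / noisy_den


if __name__ == "__main__":
    # Example: fraction of 1000 records that satisfy some predicate (300 do).
    print(private_ratio(300, 1000, sens_num=1.0, sens_den=1.0, epsilon=1.0))
```

An even split is only one option; allocating more of the budget to the unit with the higher sensitivity may reduce the error of the combined result.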
Next, we develop solutions to more complicated applications: differentially private graph generation and differential privacy preserving spectral analysis of network topology. We examine state-of-the-art differential privacy preserving mechanisms, including the exponential mechanism and smooth sensitivity, and develop feasible solutions to these problems. Additionally, we consider the potential information disclosure from differential privacy preserving outputs. We propose two attack models to show how genome-wide association studies (GWAS) results can be used to infer the trait or the identity of individuals even if those results are under differential privacy protection. We also provide a countermeasure against model inversion attacks, in which a regression model released under differential privacy protection can still be exploited by an attacker to derive information about sensitive attributes used in the model. We develop a novel approach for releasing differentially private regression models that leverages the functional mechanism to perturb the coefficients of the polynomial representation of the objective function while balancing the privacy budgets for sensitive and non-sensitive attributes in learning the regression models. Our approach can effectively retain the models' utility while preventing model inversion attacks. Finally, we consider the problem of enforcing differential privacy at the client side against an untrusted server in the data collection scenario. Our proposed technique, which uses randomized response, incurs less utility loss than the traditional output perturbation mechanism, especially when the sensitivity of the desired computation is high, and also gives individuals a simple way to protect their own sensitive information against anyone with ulterior motives.
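The client-side setting mentioned above can be illustrated with classic binary randomized response. This is a minimal sketch of the general technique, not the dissertation's specific scheme: the function names and the single-bit data model are assumptions. Each client keeps its true bit with probability p = e^epsilon / (1 + e^epsilon) and flips it otherwise, which satisfies epsilon-differential privacy locally; the untrusted server sees only the randomized bits and debiases their mean.

```python
import math
import random


def keep_probability(epsilon):
    # Probability of reporting the true bit under epsilon-local privacy.
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(true_bit, epsilon, rng):
    # Runs on the client: report the true bit with probability p, else flip it.
    p = keep_probability(epsilon)
    return true_bit if rng.random() < p else 1 - true_bit


def estimate_proportion(reports, epsilon):
    # Runs on the server. E[report] = pi * (2p - 1) + (1 - p),
    # so solve for pi to obtain an unbiased estimate.
    p = keep_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random()
    true_bits = [1] * 3000 + [0] * 7000  # hypothetical population, 30% ones
    reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
    print(estimate_proportion(reports, 1.0))
```

The estimate can fall slightly outside [0, 1] for small samples or small epsilon; clipping it afterwards is post-processing and does not affect the privacy guarantee.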
Ge, Yong; Ras, Zbigniew; Yan, Shan; Zheng, Yuliang
Thesis (Ph.D.)--University of North Carolina at Charlotte, 2015.
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Copyright is held by the author unless otherwise indicated.