VISUAL ANALYTICS IN HIGH-DIMENSIONAL DATA WITH DICHOTOMOUS OUTCOME
1 online resource (162 pages) : PDF
University of North Carolina at Charlotte
High-dimensional data has become common in application areas such as environmental studies and healthcare. High dimensionality presents opportunities for understanding how certain outcomes happen, by identifying the significant variables that contribute to those outcomes. Many efforts have been made to address this task. However, automated data analysis techniques often suffer from the "curse of dimensionality" and from results that are difficult to interpret. High-dimensional data visualization techniques have been developed to integrate human intelligence into the analysis process and to help communicate information to users. Unfortunately, high-dimensional data often leads to a cluttered visual display that obscures pattern discovery and hinders understanding of the data. A few visual analytics approaches have been developed to bridge automated data analysis and interactive visualization for high-dimensional data. However, few existing works have focused on finding explanatory relationships between variables and outcomes.

In this dissertation, we address this task along two distinct paths from high-dimensional data with dichotomous outcomes to knowledge. First, we use visualizations to facilitate logit model building. We propose two approaches. In the first approach, Parallel Coordinates is used to facilitate dimension reduction based on correlation analysis, which is the first step of logit model building. This approach addresses the difficulty of comparing and exploring correlations when there are hierarchical outcome variables. In the second approach, a visual analytics pipeline is proposed for logit modeling. It builds on the traditional modeling pipeline by providing (1) intuitive visualizations for inspecting statistical indicators and the relationships among the variables and (2) a seamless, effective dimension reduction pipeline for selecting variables to include in high-quality logistic regression models.

Second, we enhance visualizations with automated data analysis.
In particular, association rule mining is employed to enhance Parallel Sets for categorical data exploration. Based on significant association rules, dimensions are reduced and reordered to reduce clutter and facilitate visual exploration in Parallel Sets. The effectiveness and efficiency of our approaches are illustrated by a set of case studies and by experiments with benchmark datasets.
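The rule-driven reduction and reordering of Parallel Sets dimensions can be illustrated with a minimal sketch. This abstract does not state how the dissertation defines "significant" rules or how it turns rules into a dimension order. The sketch therefore rests on several assumptions. It treats a rule as significant when it predicts the positive outcome class and passes hypothetical support, confidence, and lift thresholds. It keeps only the attributes that appear in some such rule, and it orders them by summed lift. For simplicity it enumerates short antecedents exhaustively instead of using Apriori or FP-Growth. The function names (`outcome_rules`, `rank_dimensions`) and the toy attributes are all hypothetical.

```python
from itertools import combinations

# A rule: (antecedent as {attribute: value}, support, confidence, lift).
Rule = tuple[dict[str, str], float, float, float]


def outcome_rules(records: list[dict[str, str]], outcome: str, positive: str,
                  min_support: float = 0.1, min_confidence: float = 0.6,
                  max_len: int = 2) -> list[Rule]:
    """Find rules 'antecedent -> outcome == positive' (hypothetical thresholds).

    Exhaustively enumerates attribute combinations of size <= max_len,
    a brute-force stand-in for Apriori/FP-Growth on small categorical data.
    """
    n = len(records)
    attrs = [a for a in records[0] if a != outcome]
    base_rate = sum(r[outcome] == positive for r in records) / n
    rules: list[Rule] = []
    for k in range(1, max_len + 1):
        for combo in combinations(attrs, k):
            # For each observed value tuple: [positive-outcome count, total count].
            counts: dict[tuple[str, ...], list[int]] = {}
            for r in records:
                c = counts.setdefault(tuple(r[a] for a in combo), [0, 0])
                c[0] += r[outcome] == positive
                c[1] += 1
            for values, (hits, total) in counts.items():
                support = hits / n           # P(antecedent and positive outcome)
                confidence = hits / total    # P(positive outcome | antecedent)
                lift = confidence / base_rate if base_rate else 0.0
                # Assumed notion of "significant": all three thresholds pass.
                if support >= min_support and confidence >= min_confidence and lift > 1:
                    rules.append((dict(zip(combo, values)), support, confidence, lift))
    return rules


def rank_dimensions(rules: list[Rule]) -> list[str]:
    """Keep attributes that occur in some rule, ordered by summed lift (assumed scoring)."""
    score: dict[str, float] = {}
    for antecedent, _support, _confidence, lift in rules:
        for attr in antecedent:
            score[attr] = score.get(attr, 0.0) + lift
    return sorted(score, key=lambda a: (-score[a], a))


# Hypothetical toy data: three categorical attributes and a dichotomous outcome.
columns = ("smoker", "exercise", "region", "outcome")
rows = [
    ("yes", "no", "north", "1"), ("yes", "no", "south", "1"),
    ("yes", "yes", "north", "1"), ("yes", "no", "south", "0"),
    ("no", "yes", "north", "0"), ("no", "yes", "south", "0"),
    ("no", "no", "north", "0"), ("no", "yes", "south", "0"),
]
records = [dict(zip(columns, row)) for row in rows]

rules = outcome_rules(records, outcome="outcome", positive="1")
print(rank_dimensions(rules))  # ['smoker', 'exercise', 'region']
```

With `max_len=1`, the only rule that survives is `{smoker: yes} -> outcome = 1`, and `rank_dimensions` returns just `['smoker']`. This shows how stricter rule criteria remove dimensions from the display as well as reorder them. Summing lift is only one plausible scoring choice. Counting rules or weighting by support would yield different axis orders.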
ASSOCIATION RULE MINING; DICHOTOMOUS; HIGH-DIMENSIONAL; REGRESSION; VISUAL ANALYTICS; VISUALIZATION
Ras, Zbigniew; Wartell, Zachary; Chu, Bill
Thesis (Ph.D.)--University of North Carolina at Charlotte, 2017.
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Copyright is held by the author unless otherwise indicated.