VISUAL ANALYTICS IN HIGH-DIMENSIONAL DATA WITH DICHOTOMOUS OUTCOME

Zhang, Chong

Zhang, C. (2017). VISUAL ANALYTICS IN HIGH-DIMENSIONAL DATA WITH DICHOTOMOUS OUTCOME. UNC Charlotte Electronic Theses and Dissertations.

Abstract

High-dimensional data has become common in application areas such as environmental studies and healthcare. Its high dimensionality presents opportunities for understanding how certain outcomes arise, by identifying the significant variables that contribute to them. Many efforts have been made to address this task. However, automated data analysis techniques often suffer from the "curse of dimensionality" and from results that are difficult to interpret. To integrate human intelligence into the analysis process and to facilitate communication of information to users, high-dimensional data visualization techniques have been developed. Unfortunately, high-dimensional data often leads to a cluttered visual display that obscures pattern discovery and hinders understanding of the data. Although a few visual analytics approaches have been developed to bridge automated data analysis and interactive visualization for high-dimensional data, few existing works have focused on finding explanatory relationships between variables and outcomes.

In this dissertation, we address the task with two distinct paths from high-dimensional data with dichotomous outcomes to knowledge. First, we use visualizations to facilitate logit model building. We propose two approaches. In the first approach, Parallel Coordinates is used to facilitate dimension reduction based on correlation analysis, the first step of logit model building. It addresses the difficulty of comparing and exploring correlations when there are hierarchical outcome variables. In the second approach, a visual analytics pipeline is proposed for logit modeling. It builds on the traditional modeling pipeline by providing (1) intuitive visualizations for inspecting statistical indicators and the relationships among the variables and (2) a seamless, effective dimension reduction pipeline for selecting variables for inclusion in high-quality logistic regression models.

Second, we enhance visualizations with automated data analysis.
In particular, association rule mining is employed to enhance Parallel Sets for categorical data exploration. Based on significant association rules, dimension reduction and reordering are performed to reduce clutter and facilitate visual exploration in Parallel Sets. The effectiveness and efficiency of our approaches are illustrated by a set of case studies and experiments with benchmark datasets.
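
The first path starts from correlation-based dimension reduction before a logit model is fit. The sketch below illustrates that general idea under stated assumptions; it is not the dissertation's implementation, which pairs the analysis with Parallel Coordinates and also handles hierarchical outcome variables. It assumes numeric predictors and a 0/1 outcome, uses the Pearson correlation (which equals the point-biserial correlation when one variable is dichotomous), and applies two hypothetical thresholds, `min_outcome_r` and `max_pair_r`. The function names and the toy data are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this equals the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no measurable linear association
    return sxy / math.sqrt(sxx * syy)


def screen_variables(columns, outcome, min_outcome_r=0.1, max_pair_r=0.8):
    """Hypothetical correlation-based filter applied before fitting a logit model.

    1. Drop variables whose |r| with the dichotomous outcome is below min_outcome_r.
    2. Visit survivors from strongest to weakest outcome correlation and skip any
       variable correlated above max_pair_r with one already kept (redundancy).
    """
    strength = {name: abs(pearson(vals, outcome)) for name, vals in columns.items()}
    candidates = sorted(
        (name for name, s in strength.items() if s >= min_outcome_r),
        key=lambda name: strength[name],
        reverse=True,
    )
    kept = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[k])) <= max_pair_r for k in kept):
            kept.append(name)
    return kept


# Toy data (invented for illustration): a 0/1 outcome and four numeric predictors.
outcome = [0, 0, 0, 1, 1, 1]
columns = {
    "a": [1, 2, 3, 4, 5, 6],    # strongly related to the outcome
    "b": [2, 4, 6, 8, 10, 13],  # nearly a rescaled copy of "a"
    "c": [3, 1, 2, 2, 3, 1],    # unrelated to the outcome
    "d": [0, 1, 0, 1, 0, 1],    # weakly related, not redundant with "a"
}
print(screen_variables(columns, outcome))  # → ['a', 'd']
```

Here `b` is dropped because it is almost perfectly correlated with `a` (r of about 0.997) and carries a slightly weaker link to the outcome, while `c` is dropped because its correlation with the outcome is zero. The surviving variables would then be the candidates for the logistic regression model.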
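
For the second path, the following sketch shows one simple way that significant association rules could drive dimension reduction and reordering for a Parallel Sets display. It is a stand-in under stated assumptions, not the dissertation's algorithm: rules are restricted to a single antecedent item with the dichotomous outcome as consequent, "significant" is approximated by hypothetical support, confidence, and lift (> 1) thresholds, and axes are ordered by the largest lift of any rule they take part in, with the outcome axis placed last. All names and the toy records are illustrative.

```python
from collections import Counter


def single_item_rules(records, outcome, target, min_support=0.1, min_confidence=0.5):
    """Return significant rules {attr=value} -> {outcome=target}.

    Assumption: antecedents hold a single item, so direct counting suffices;
    a full miner such as Apriori would also explore multi-item antecedents.
    Each rule is ((attr, value), support, confidence, lift).
    """
    n = len(records)
    p_target = sum(r[outcome] == target for r in records) / n
    if p_target == 0:
        return []  # no record has the target outcome, so lift is undefined
    item_counts, joint_counts = Counter(), Counter()
    for r in records:
        hit = r[outcome] == target
        for attr, value in r.items():
            if attr == outcome:
                continue
            item_counts[(attr, value)] += 1
            if hit:
                joint_counts[(attr, value)] += 1
    rules = []
    for item, count in item_counts.items():
        support = joint_counts[item] / n          # P(item and target)
        confidence = joint_counts[item] / count   # P(target | item)
        lift = confidence / p_target              # > 1 means positive association
        if support >= min_support and confidence >= min_confidence and lift > 1:
            rules.append((item, support, confidence, lift))
    return rules


def order_dimensions(records, outcome, target, **thresholds):
    """Keep dimensions appearing in a significant rule, ordered by their best lift;
    the outcome axis is appended last (a hypothetical layout choice)."""
    best = {}
    rules = single_item_rules(records, outcome, target, **thresholds)
    for (attr, _value), _sup, _conf, lift in rules:
        best[attr] = max(lift, best.get(attr, 0.0))
    return sorted(best, key=best.get, reverse=True) + [outcome]


# Invented categorical records: (smoker, exercise, region, disease).
rows = [
    ("yes", "low", "N", "yes"), ("yes", "low", "N", "yes"),
    ("yes", "high", "S", "yes"), ("yes", "low", "S", "no"),
    ("no", "low", "S", "yes"), ("no", "high", "N", "no"),
    ("no", "high", "N", "no"), ("no", "high", "N", "no"),
    ("no", "low", "S", "no"), ("no", "high", "S", "no"),
]
records = [dict(zip(("smoker", "exercise", "region", "disease"), r)) for r in rows]
print(order_dimensions(records, "disease", "yes"))  # → ['smoker', 'exercise', 'disease']
```

On this invented data, `region` has no rule that passes the thresholds, so its axis would be dropped, and `smoker` (best lift 1.875) is placed before `exercise` (best lift about 1.5).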

Details

Author: Zhang, Chong
Title: VISUAL ANALYTICS IN HIGH-DIMENSIONAL DATA WITH DICHOTOMOUS OUTCOME
Physical Description: 1 online resource (162 pages) : PDF
Date: 2017
Degree Granting Institution: University of North Carolina at Charlotte
Genre: doctoral dissertations
Subjects--Topics: Computer science
Degree: Ph.D.
Keywords: Association Rule Mining
Dichotomous
High-Dimensional
Regression
Visual Analytics
Visualization
Subject Area: Computer Science
Advisor(s): Yang, Jing
Committee Members: Ras, Zbigniew
Wartell, Zachary
Chu, Bill
Degree Note: Thesis (Ph.D.)--University of North Carolina at Charlotte, 2017.
Rights Statement: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Rights Holder Information: Copyright is held by the author unless otherwise indicated.
Identifier: Zhang_uncc_0694D_11466
Permalink: http://hdl.handle.net/20.500.13093/etd:159

J. Murrey Atkins Library