DE NOVO PREDICTION OF CIS-REGULATORY MODULES IN EUKARYOTIC ORGANISMS
1 online resource (137 pages) : PDF
University of North Carolina at Charlotte
Gene regulatory networks (GRNs) are the basis of virtually all biological processes. To gain a global understanding of the GRNs encoded in a genome, we first need to identify all the cis-regulatory elements (CREs) recognized by transcription factors (TFs). In higher eukaryotes, CREs rarely work alone; instead, they regulate genes by forming combinatorial patterns called cis-regulatory modules (CRMs). Thus, finding CREs as well as CRMs is the key to understanding GRNs in eukaryotes. However, identifying CREs and CRMs is highly challenging because they are short and degenerate and reside in long intergenic or intronic sequences. The recent wide adoption of chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) has produced numerous datasets for locating the CREs of TFs, providing an unprecedented opportunity to decipher CREs and CRMs in a genome. In this dissertation, we have developed a graph-theory-based algorithm, DePCRM, for genome-wide de novo prediction of CRMs and CREs by integrating a large number of ChIP datasets. Using this algorithm, we have predicted 1,108,018 and 5,186,520 CREs, and 115,932 and 807,365 CRMs, in the Drosophila melanogaster and human genomes, respectively, using all the ChIP-seq datasets available to us for the two organisms. We found that our predicted CRMs recovered more than 80% of known CRMs, and that both the putative CREs and CRMs were more conserved than randomly selected sequences in both genomes. Furthermore, trait-linked SNPs and DNase I hypersensitive regions are highly enriched in our predicted CRMs in the human genome. Thus, we have provided the most comprehensive maps of CREs and CRMs in the two genomes to date. Using the much larger number of human ChIP datasets, we also analyzed the saturation trends of predicted CRE motifs and their combinatorial patterns using an increasing number of randomly selected datasets, datasets from different cell types, and datasets for different TFs.
We found that the saturation trends became notable with only a few datasets in each scenario. The results suggest ways to generate ChIP datasets more cost-effectively in the future. Finally, we analyzed the conservation and variation of the cis-regulatory systems between the two species. We found that although a large portion of CRMs are conserved in their motif composition between the two species, their target genes have changed significantly. Thus, the majority of the GRNs have been rewired during the evolution from D. melanogaster to humans.
CHIP-SEQ; CIS-REGULATORY ELEMENT; CIS-REGULATORY MODULE; DROSOPHILA MELANOGASTER; EVOLUTION; HUMAN
Fodor, Anthony; Janies, Daniel; Guo, Jun-tao; Song, Bao-Hua
Thesis (Ph.D.)--University of North Carolina at Charlotte, 2014.
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). For additional information, see http://rightsstatements.org/page/InC/1.0/.
Copyright is held by the author unless otherwise indicated.