CMU-CS-15-100
Computer Science Department
School of Computer Science, Carnegie Mellon University

Structured Sparse Models and Algorithms for Genetic Analysis
Seunghak Lee
May 2015
Ph.D. Thesis
Currently Unavailable Electronically
Identifying genetic variants (e.g., single nucleotide polymorphisms) associated with phenotypic variations (e.g., disease status) is a fundamental problem in genetics. However, most genetic variants associated with complex phenotypes remain elusive. A major challenge is that the number of samples is much smaller than the number of genetic variants, so the statistical power to detect phenotype-associated genetic variants is limited.

In this thesis, to enhance the statistical power, we develop structured sparse models and algorithms to detect genotype-phenotype associations, taking advantage of biological knowledge and of structure in the data or the problem. We first develop structured sparse models and algorithms, including adaptive multi-task lasso and structured input-output lasso, that take advantage of genome annotations or group structures in genomes and phenotypic traits. We then develop a sparse piecewise linear model to detect trait-associated interactions between genetic variants, which accounts for the non-linear structure of the problem.

To enable the analysis of large-scale human data, we scale up algorithms for structured sparse models. Specifically, we develop a screening algorithm for overlapping group lasso (i.e., a general form of structured sparse models) that allows us to safely discard irrelevant genetic variants using simple rules. This makes it feasible to solve large structured sparse model problems, because the screening algorithm can dramatically reduce the number of candidate genetic variants before the original problem is solved.

Finally, using the aforementioned models and algorithms, we present a method that integrates genotypic, gene expression, and phenotypic data to detect phenotype-associated genetic variants while unveiling their association mechanisms. Using this integrative approach, we analyze large-scale Alzheimer's disease data and identify genetic variants and genes associated with Alzheimer's disease status. As examples, we investigate the mechanisms of some associations involved in the beta-amyloid, estrogen, and nicotine pathways.
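To make the idea of "safely discarding variants using simple rules" concrete, the following is a minimal sketch of a screening rule for the plain lasso, not the overlapping group lasso algorithm developed in the thesis. It assumes the objective 0.5*||y - Xb||^2 + lam*||b||_1 and uses the basic SAFE test, which discards feature j when |x_j^T y| < lam - ||x_j|| ||y|| (lam_max - lam) / lam_max. The function name `safe_screen_lasso`, the simulated data, and the choice of `lam` are all illustrative assumptions.

```python
import numpy as np

def safe_screen_lasso(X, y, lam):
    """Basic SAFE screening for min_b 0.5*||y - X b||^2 + lam*||b||_1.

    Returns a boolean mask: True = keep the feature,
    False = its coefficient is guaranteed to be zero at the optimum.
    This is an illustrative rule for the plain lasso only.
    """
    corr = np.abs(X.T @ y)            # |x_j^T y| for every feature j
    lam_max = corr.max()              # smallest lam with an all-zero solution
    if lam >= lam_max:
        # The whole solution is zero, so nothing needs to be kept.
        return np.zeros(X.shape[1], dtype=bool)
    col_norms = np.linalg.norm(X, axis=0)
    threshold = lam - col_norms * np.linalg.norm(y) * (lam_max - lam) / lam_max
    return corr >= threshold

# Simulated "few samples, many variants" setting (hypothetical data).
rng = np.random.default_rng(0)
n, p = 50, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0                        # five truly associated features
y = X @ beta + 0.1 * rng.standard_normal(n)

lam = 0.9 * np.abs(X.T @ y).max()     # a value close to lam_max
keep = safe_screen_lasso(X, y, lam)
print(keep.sum(), "of", p, "features kept after screening")
```

The lasso would then be solved only on the columns `X[:, keep]`. This basic rule is most effective when `lam` is close to `lam_max`; stronger rules are needed to screen well at smaller values of `lam`.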
Frank Pfenning, Head, Computer Science Department