Fast estimation of genetic correlation for Biobank-scale data
dc.contributor.author | Wu, Yue | |
dc.contributor.author | Yaschenko, Anna | |
dc.contributor.author | Heydary, Mohammadreza Hajy | |
dc.contributor.author | Sankararaman, Sriram | |
dc.date.accessioned | 2019-06-24T19:37:58Z | |
dc.date.available | 2019-06-24T19:37:58Z | |
dc.date.issued | 2019-01-20 | |
dc.description.abstract | Genetic correlation, i.e., the proportion of phenotypic correlation across a pair of traits that can be explained by genetic variation, is an important parameter in efforts to understand the relationships among complex traits. The observation of substantial genetic correlation across a pair of traits can provide insights into shared genetic pathways as well as a starting point to investigate causal relationships. Attempts to estimate genetic correlations among complex phenotypes attributable to genome-wide SNP variation data have motivated the analysis of large datasets as well as the development of sophisticated methods. Bi-variate Linear Mixed Models (LMMs) have emerged as a key tool to estimate genetic correlation from datasets where individual genotypes and traits are measured. The bi-variate LMM jointly models the effect sizes of a given SNP on each of the pair of traits being analyzed. The parameters of the bi-variate LMM, i.e., the variance components, are related to the heritability of each trait as well as the correlation across traits attributable to genotyped SNPs. However, inference in bi-variate LMMs, typically achieved by maximizing the likelihood, poses serious computational challenges. We propose RG-Cor, a scalable randomized Method-of-Moments (MoM) estimator of genetic correlations in bi-variate LMMs. RG-Cor leverages the structure of genotype data to obtain runtimes that scale sub-linearly with the number of individuals in the input dataset (assuming the number of SNPs is held constant). We perform extensive simulations to validate the accuracy and scalability of RG-Cor. RG-Cor can compute the genetic correlations on the UK Biobank dataset consisting of 430,000 individuals and 460,000 SNPs in 3 hours on a stand-alone compute machine. | en_US |
dc.description.sponsorship | This research was conducted using the UK Biobank Resource under application 33127. We thank the participants of UK Biobank for making this work possible. SS was supported in part by NIH grant R35GM125055, NSF grant III-1705121, an Alfred P. Sloan Research Fellowship, and a gift from the Okawa Foundation. | en_US |
dc.description.uri | https://www.biorxiv.org/content/10.1101/525055v1 | en_US |
dc.format.extent | 12 pages | en_US |
dc.genre | journal article preprints | en_US |
dc.identifier | doi:10.13016/m2bjqv-4d6i | |
dc.identifier.citation | Yue Wu, et al., Fast estimation of genetic correlation for Biobank-scale data, 2019, DOI: https://doi.org/10.1101/525055 | en_US |
dc.identifier.uri | https://doi.org/10.1101/525055 | |
dc.identifier.uri | http://hdl.handle.net/11603/14299 | |
dc.language.iso | en_US | en_US |
dc.publisher | Cold Spring Harbor Laboratory | en_US |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department Collection | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. | |
dc.subject | Method-of-Moments (MoM) | en_US |
dc.subject | Biobank-scale data | en_US |
dc.subject | Linear Mixed Models (LMMs) | en_US |
dc.title | Fast estimation of genetic correlation for Biobank-scale data | en_US |
dc.type | Text | en_US |