Fast estimation of genetic correlation for Biobank-scale data

dc.contributor.author: Wu, Yue
dc.contributor.author: Yaschenko, Anna
dc.contributor.author: Heydary, Mohammadreza Hajy
dc.contributor.author: Sankararaman, Sriram
dc.date.accessioned: 2019-09-24T14:38:56Z
dc.date.available: 2019-09-24T14:38:56Z
dc.date.issued: 2019-01-20
dc.description.abstract: Genetic correlation, i.e., the proportion of phenotypic correlation across a pair of traits that can be explained by genetic variation, is an important parameter in efforts to understand the relationships among complex traits. Observing substantial genetic correlation across a pair of traits can provide insights into shared genetic pathways as well as a starting point for investigating causal relationships. Attempts to estimate genetic correlations among complex phenotypes attributable to genome-wide SNP variation data have motivated the analysis of large datasets as well as the development of sophisticated methods. Bi-variate Linear Mixed Models (LMMs) have emerged as a key tool to estimate genetic correlation from datasets where individual genotypes and traits are measured. The bi-variate LMM jointly models the effect sizes of a given SNP on each of the pair of traits being analyzed. The parameters of the bi-variate LMM, i.e., the variance components, are related to the heritability of each trait as well as the correlation across traits attributable to genotyped SNPs. However, inference in bi-variate LMMs, typically achieved by maximizing the likelihood, poses serious computational challenges. We propose RG-Cor, a scalable randomized Method-of-Moments (MoM) estimator of genetic correlations in bi-variate LMMs. RG-Cor leverages the structure of genotype data to obtain runtimes that scale sub-linearly with the number of individuals in the input dataset (assuming the number of SNPs is held constant). We perform extensive simulations to validate the accuracy and scalability of RG-Cor. RG-Cor can compute the genetic correlations on the UK Biobank dataset consisting of 430,000 individuals and 460,000 SNPs in 3 hours on a stand-alone compute machine. [en_US]
dc.description.sponsorship: This research was conducted using the UK Biobank Resource under application 33127. We thank the participants of UK Biobank for making this work possible. SS was supported in part by NIH grant R35GM125055, NSF Grant III-1705121, an Alfred P. Sloan Research Fellowship, and a gift from the Okawa Foundation. [en_US]
dc.description.uri: https://www.biorxiv.org/content/10.1101/525055v1 [en_US]
dc.format.extent: 12 pages [en_US]
dc.genre: journal article preprints [en_US]
dc.identifier: doi:10.13016/m2yakr-ftyg
dc.identifier.citation: Yue Wu et al., Fast estimation of genetic correlation for Biobank-scale data, 2019, doi: https://doi.org/10.1101/525055 [en_US]
dc.identifier.uri: https://doi.org/10.1101/525055
dc.identifier.uri: http://hdl.handle.net/11603/14595
dc.language.iso: en_US [en_US]
dc.publisher: Cold Spring Harbor Laboratory [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: genetic correlation [en_US]
dc.subject: biobank-scale data [en_US]
dc.subject: Bi-variate Linear Mixed Models (LMMs) [en_US]
dc.subject: RG-Cor [en_US]
dc.title: Fast estimation of genetic correlation for Biobank-scale data [en_US]
dc.type: Text [en_US]
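
Illustrative note on the abstract: the record does not describe how RG-Cor works internally, so the following is only a generic sketch of a randomized Method-of-Moments estimator for a bi-variate LMM, not the authors' implementation. It assumes standardized genotypes, mean-centered phenotypes, the moment condition E[y_a y_b^T] = g_ab K + e_ab I with K = Z Z^T / M, and a Hutchinson-style random-vector estimate of tr(K^2). All function and variable names are hypothetical.

```python
import math
import random


def standardize_columns(X):
    """Center each SNP column to mean 0 and scale it to variance 1."""
    n, m = len(X), len(X[0])
    Z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        scale = sd if sd > 0 else 1.0  # a monomorphic SNP becomes a zero column
        for i in range(n):
            Z[i][j] = (col[i] - mean) / scale
    return Z


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def apply_grm(Z, v):
    """Return K v for K = Z Z^T / M without forming the N x N matrix K."""
    n, m = len(Z), len(Z[0])
    t = [sum(Z[i][j] * v[i] for i in range(n)) for j in range(m)]  # Z^T v
    return [sum(Z[i][j] * t[j] for j in range(m)) / m for i in range(n)]


def randomized_trace_k2(Z, num_vectors, rng):
    """Estimate tr(K^2) as the average of ||K z||^2 over random normal z."""
    n = len(Z)
    total = 0.0
    for _ in range(num_vectors):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        kz = apply_grm(Z, z)
        total += dot(kz, kz)
    return total / num_vectors


def mom_component(trace_k2, trace_k, n, yKy, yy):
    """Genetic component g from [[tr(K^2), tr(K)], [tr(K), N]] [g, e]^T = [yKy, yy]^T."""
    det = trace_k2 * n - trace_k * trace_k
    return (n * yKy - trace_k * yy) / det


def genetic_correlation(X, y1, y2, num_vectors=20, seed=0):
    """MoM genetic correlation; y1 and y2 are assumed to be mean-centered.

    Raises ValueError if a genetic variance estimate comes out negative.
    """
    rng = random.Random(seed)
    Z = standardize_columns(X)
    n, m = len(Z), len(Z[0])
    trace_k = sum(v * v for row in Z for v in row) / m  # exact tr(K)
    trace_k2 = randomized_trace_k2(Z, num_vectors, rng)
    ky1, ky2 = apply_grm(Z, y1), apply_grm(Z, y2)
    var_g1 = mom_component(trace_k2, trace_k, n, dot(y1, ky1), dot(y1, y1))
    var_g2 = mom_component(trace_k2, trace_k, n, dot(y2, ky2), dot(y2, y2))
    cov_g = mom_component(trace_k2, trace_k, n, dot(y1, ky2), dot(y1, y2))
    return cov_g / math.sqrt(var_g1 * var_g2)
```

In this sketch the genotype matrix is touched only through products of the form Z^T v and Z t, so the N x N relatedness matrix is never built. That is the usual reason randomized MoM estimators are cheaper than likelihood maximization. The sub-linear scaling in the number of individuals claimed in the abstract relies on structure in the genotype data that this sketch does not exploit.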
