Fast estimation of genetic correlation for Biobank-scale data

Wu, Yue; Yaschenko, Anna; Heydary, Mohammadreza Hajy; Sankararaman, Sriram

dc.contributor.author	Wu, Yue
dc.contributor.author	Yaschenko, Anna
dc.contributor.author	Heydary, Mohammadreza Hajy
dc.contributor.author	Sankararaman, Sriram
dc.date.accessioned	2019-09-24T14:38:56Z
dc.date.available	2019-09-24T14:38:56Z
dc.date.issued	2019-01-20
dc.description.abstract	Genetic correlation, i.e., the proportion of phenotypic correlation across a pair of traits that can be explained by genetic variation, is an important parameter in efforts to understand the relationships among complex traits. The observation of substantial genetic correlation across a pair of traits can provide insights into shared genetic pathways as well as a starting point to investigate causal relationships. Attempts to estimate genetic correlations among complex phenotypes attributable to genome-wide SNP variation data have motivated the analysis of large datasets as well as the development of sophisticated methods. Bi-variate Linear Mixed Models (LMMs) have emerged as a key tool to estimate genetic correlation from datasets where individual genotypes and traits are measured. The bi-variate LMM jointly models the effect sizes of a given SNP on each of the pair of traits being analyzed. The parameters of the bi-variate LMM, i.e., the variance components, are related to the heritability of each trait as well as the correlation across traits attributable to genotyped SNPs. However, inference in bi-variate LMMs, typically achieved by maximizing the likelihood, poses serious computational challenges. We propose RG-Cor, a scalable randomized Method-of-Moments (MoM) estimator of genetic correlations in bi-variate LMMs. RG-Cor leverages the structure of genotype data to obtain runtimes that scale sub-linearly with the number of individuals in the input dataset (assuming the number of SNPs is held constant). We perform extensive simulations to validate the accuracy and scalability of RG-Cor. RG-Cor can compute the genetic correlations on the UK Biobank dataset consisting of 430,000 individuals and 460,000 SNPs in 3 hours on a stand-alone compute machine.	en_US
dc.description.sponsorship	This research was conducted using the UK Biobank Resource under application 33127. We thank the participants of UK Biobank for making this work possible. SS is supported in part by NIH grant R35GM125055, NSF grant III-1705121, an Alfred P. Sloan Research Fellowship, and a gift from the Okawa Foundation.	en_US
dc.description.uri	https://www.biorxiv.org/content/10.1101/525055v1	en_US
dc.format.extent	12 pages	en_US
dc.genre	journal article preprints	en_US
dc.identifier	doi:10.13016/m2yakr-ftyg
dc.identifier.citation	Yue Wu, et al., Fast estimation of genetic correlation for Biobank-scale data, 2019, doi: https://doi.org/10.1101/525055	en_US
dc.identifier.uri	https://doi.org/10.1101/525055
dc.identifier.uri	http://hdl.handle.net/11603/14595
dc.language.iso	en_US	en_US
dc.publisher	Cold Spring Harbor Laboratory	en_US
dc.relation.isAvailableAt	The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof	UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof	UMBC Faculty Collection
dc.rights	This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject	genetic correlation	en_US
dc.subject	biobank-scale data	en_US
dc.subject	Bi-variate Linear Mixed Models (LMMs)	en_US
dc.subject	RG-Cor	en_US
dc.title	Fast estimation of genetic correlation for Biobank-scale data	en_US
dc.type	Text	en_US

Collections

UMBC Computer Science and Electrical Engineering Department
UMBC Faculty Collection