Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program

dc.contributor.author Slater, Noa
dc.contributor.author Louzoun, Yoram
dc.contributor.author Gragert, Loren
dc.contributor.author Maiers, Martin
dc.contributor.author Chatterjee, Snigdhansu
dc.contributor.author Albrecht, Mark
dc.date.accessioned 2026-03-05T19:35:48Z
dc.date.issued 2015-04-22
dc.description.abstract Measures of allele and haplotype diversity, which are fundamental properties in population genetics, often follow heavy-tailed distributions. These measures are of particular interest in the field of hematopoietic stem cell transplant (HSCT). Donor/recipient suitability for HSCT is determined by Human Leukocyte Antigen (HLA) similarity. Match predictions rely upon a precise description of HLA diversity, yet classical estimates are inaccurate given the heavy-tailed nature of the distribution. This directly affects HSCT matching and diversity measures in broader fields such as species richness. We have therefore developed a power-law-based estimator to measure allele and haplotype diversity that accommodates heavy tails using the concepts of regular variation and occupancy distributions. Application of our estimator to 6.59 million donors in the Be The Match Registry revealed that haplotypes follow a heavy-tailed distribution across all ethnicities: for example, 44.65% of the European American haplotypes are represented by only 1 individual. Indeed, our discovery rate of all U.S. European American haplotypes is estimated at 23.45% based upon sampling 3.97% of the population, leaving a large number of unobserved haplotypes. Population coverage, however, is much higher at 99.4% given that 90% of European Americans carry one of the 4.5% most frequent haplotypes. Alleles were found to be less diverse, suggesting that the current registry represents most alleles in the population. Thus, for HSCT registries, haplotype discovery will remain high with continued recruitment to a very deep level of sampling, but population coverage will not. Finally, we compared the convergence of our power-law estimator with that of classical diversity estimators such as capture-recapture, Chao, ACE, and jackknife methods. When fit to the haplotype data, our estimator displayed favorable properties in terms of convergence (with respect to sampling depth) and accuracy (with respect to diversity estimates). This suggests that power-law-based estimators offer a valid alternative to classical diversity estimators and may have broad applicability in the field of population genetics.
dc.description.uri https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004204
dc.format.extent 21 pages
dc.genre journal articles
dc.identifier.citation Slater, Noa, Yoram Louzoun, Loren Gragert, Martin Maiers, Ansu Chatterjee, and Mark Albrecht. “Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program.” PLOS Computational Biology 11, no. 4 (April 22, 2015): e1004204. https://doi.org/10.1371/journal.pcbi.1004204.
dc.identifier.uri https://doi.org/10.1371/journal.pcbi.1004204
dc.identifier.uri http://hdl.handle.net/11603/42015
dc.language.iso en
dc.publisher PLOS
dc.relation.isAvailableAt The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof UMBC Mathematics and Statistics Department
dc.rights Attribution 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by/4.0/deed.en
dc.subject Native American people
dc.subject Hematopoietic stem cell transplantation
dc.subject Haplotypes
dc.subject Species diversity
dc.subject Population genetics
dc.subject Europe
dc.subject Hispanic people
dc.subject African American people
dc.title Power Laws for Heavy-Tailed Distributions: Modeling Allele and Haplotype Diversity for the National Marrow Donor Program
dc.type Text
dcterms.creator https://orcid.org/0000-0002-7986-0470

Files

Original bundle

Name:
filePowerLawsforHeavyTailedDistributions.pdf
Size:
963.41 KB
Format:
Adobe Portable Document Format
Name:
S4PowerLawsforHeavyTailedDistributionsZIP.zip
Size:
7.47 MB
Format:
Unknown data format