False Discovery Rate Controlling Procedures with BLOSUM62 substitution matrix and their application to HIV Data

dc.contributor.author: Kim, Kyurhi
dc.contributor.author: Park, Junyong
dc.contributor.author: Park, Dohwan
dc.contributor.author: Giraldo, Mileiy
dc.contributor.author: Aldunate, Muriel
dc.contributor.author: Spouge, John L.
dc.contributor.author: Tachedjian, Gilda
dc.date.accessioned: 2023-12-12T17:06:43Z
dc.date.available: 2023-12-12T17:06:43Z
dc.date.issued: 2023-11-25
dc.description.abstract: Identifying significant sites in sequence data and analogous data is of fundamental importance in many biological fields. Fisher's exact test is a popular technique; however, it is not well suited to sparse count data because it yields conservative decisions. Since count data in HIV studies are typically very sparse, it is crucial to incorporate additional information into statistical models to improve testing power. To develop new approaches that incorporate biological information into the false discovery rate controlling procedure, we propose two models: one based on an empirical Bayes model under independence of amino acids, and another that uses pairwise associations of amino acids through a Markov random field based on the BLOSUM62 substitution matrix. We apply the proposed methods to HIV data and identify significant sites by incorporating the BLOSUM62 matrix, whereas the traditional method based on Fisher's test does not discover any site. These newly developed methods have the potential to handle many biological problems in vaccine and drug trials and in phenotype studies.
dc.description.sponsorship: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C1A01100526).
dc.description.uri: https://arxiv.org/abs/2311.15012
dc.format.extent: 27 pages
dc.genre: journal articles
dc.genre: preprints
dc.identifier.uri: https://doi.org/10.48550/arXiv.2311.15012
dc.identifier.uri: http://hdl.handle.net/11603/31052
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Mathematics Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This work was written as part of one of the author's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. Law.
dc.rights: Public Domain
dc.rights.uri: https://creativecommons.org/publicdomain/mark/1.0/
dc.title: False Discovery Rate Controlling Procedures with BLOSUM62 substitution matrix and their application to HIV Data
dc.type: Text

Files

Original bundle

Name: 2311.15012.pdf
Size: 647.66 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission