Identifying Nonlinear Correlations in High Dimensional Data with Application to Protein Molecular Dynamics Simulations
dc.contributor.author | Bailey, William J. | |
dc.contributor.author | Chambless, Claire A. | |
dc.contributor.author | Cho, Brandynne M. | |
dc.contributor.author | Smith, Jesse D. | |
dc.contributor.author | Raim, Andrew M. | |
dc.contributor.author | Adragni, Kofi P. | |
dc.contributor.author | Thorpe, Ian F. | |
dc.date.accessioned | 2018-10-01T13:53:50Z | |
dc.date.available | 2018-10-01T13:53:50Z | |
dc.date.issued | 2013 | |
dc.description.abstract | Complex biomolecules such as proteins can respond to changes in their environment through a process called allostery, which plays an important role in regulating the function of these biomolecules. Allostery occurs when an event at a specific location in a macromolecule produces an effect at another location some distance away. An important component of allostery is the coupling of protein sites, which is one mechanism by which allosteric effects can be transmitted over long distances. To study this phenomenon, molecular dynamics (MD) simulations are carried out on systems with a large number of atoms, and the trajectories of these atoms are recorded over time. Simple correlation methods have been used in the literature to identify coupled motions between protein sites. We implement a recently developed statistical method for dimension reduction, principal fitted components (PFC), in the statistical programming language R to identify both linear and nonlinear correlations between protein sites while dealing efficiently with the high dimensionality of the data. PFC models reduce the dimensionality of the data while capturing linear and nonlinear dependencies among the predictors (atoms) using a flexible set of basis functions. For faster processing, we implement the PFC algorithm with parallel computing using the Programming with Big Data in R (pbdR) packages. We demonstrate the method's effectiveness on simulated datasets and apply it to time series data from MD simulations to identify coupled motion among the atoms. | en_US
dc.description.sponsorship | These results were obtained as part of the REU Site: Interdisciplinary Program in High Performance Computing (www.umbc.edu/hpcreu) in the Department of Mathematics and Statistics at the University of Maryland, Baltimore County (UMBC) in Summer 2013. This program is funded jointly by the National Science Foundation and the National Security Agency (NSF grant no. DMS–1156976), with additional support from UMBC, the Department of Mathematics and Statistics, the Center for Interdisciplinary Research and Consulting (CIRC), and the UMBC High Performance Computing Facility (HPCF). HPCF (www.umbc.edu/hpcf) is supported by the National Science Foundation through the MRI program (grant nos. CNS–0821258 and CNS–1228778) and the SCREMS program (grant no. DMS–0821311), with additional substantial support from UMBC. Co-author Jesse D. Smith was supported, in part, by the UMBC National Security Agency (NSA) Scholars Program through a contract with the NSA. Graduate RA Andrew M. Raim was supported by UMBC as an HPCF RA. | en_US
dc.description.uri | https://userpages.umbc.edu/~gobbert/papers/REU2013Team2.pdf | en_US |
dc.format.extent | 13 pages | en_US |
dc.genre | technical report | en_US |
dc.identifier | doi:10.13016/M2VD6P82N | |
dc.identifier.uri | http://hdl.handle.net/11603/11412 | |
dc.language.iso | en_US | en_US |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Mathematics Department Collection | |
dc.relation.ispartof | UMBC Chemistry & Biochemistry Department | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.relation.ispartof | UMBC Student Collection | |
dc.relation.ispartofseries | HPCF Technical Report;HPCF-2013-12 | |
dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless the item is under a Creative Commons license, contact the copyright holder or the author for uses protected by Copyright Law. | |
dc.subject | Sufficient Reduction | en_US |
dc.subject | Principal Fitted Components | en_US |
dc.subject | parallel computing | en_US |
dc.subject | Molecular Dynamics | en_US |
dc.subject | UMBC High Performance Computing Facility (HPCF) | en_US |
dc.title | Identifying Nonlinear Correlations in High Dimensional Data with Application to Protein Molecular Dynamics Simulations | en_US |
dc.type | Text | en_US |