Identifying Nonlinear Correlations in High Dimensional Data with Application to Protein Molecular Dynamics Simulations

dc.contributor.author: Bailey, William J.
dc.contributor.author: Chambless, Claire A.
dc.contributor.author: Cho, Brandynne M.
dc.contributor.author: Smith, Jesse D.
dc.contributor.author: Raim, Andrew M.
dc.contributor.author: Adragni, Kofi P.
dc.contributor.author: Thorpe, Ian F.
dc.date.accessioned: 2018-10-01T13:53:50Z
dc.date.available: 2018-10-01T13:53:50Z
dc.date.issued: 2013
dc.description.abstract: Complex biomolecules such as proteins can respond to changes in their environment through a process called allostery, which plays an important role in regulating the function of these biomolecules. Allostery occurs when an event at a specific location in a macromolecule produces an effect at a location in the molecule some distance away. An important component of allostery is the coupling of protein sites. Such coupling is one mechanism by which allosteric effects can be transmitted over long distances. To understand this phenomenon, molecular dynamics simulations are carried out with a large number of atoms, and the trajectories of these atoms are recorded over time. Simple correlation methods have been used in the literature to identify coupled motions between protein sites. We implement a recently developed statistical method for dimension reduction called principal fitted components (PFC) in the statistical programming language R to identify both linear and nonlinear correlations between protein sites while dealing efficiently with the high dimensionality of the data. PFC models reduce the dimensionality of data while capturing linear and nonlinear dependencies among predictors (atoms) using a flexible set of basis functions. For faster processing, we implement the PFC algorithm using parallel computing through the Programming with Big Data in R (pbdR) package for R. We demonstrate the method's effectiveness on simulated datasets, and apply the routine to time series data from molecular dynamics (MD) simulations to identify coupled motion among the atoms. [en_US]
dc.description.sponsorship: These results were obtained as part of the REU Site: Interdisciplinary Program in High Performance Computing (www.umbc.edu/hpcreu) in the Department of Mathematics and Statistics at the University of Maryland, Baltimore County (UMBC) in Summer 2013. This program is funded jointly by the National Science Foundation and the National Security Agency (NSF grant no. DMS–1156976), with additional support from UMBC, the Department of Mathematics and Statistics, the Center for Interdisciplinary Research and Consulting (CIRC), and the UMBC High Performance Computing Facility (HPCF). HPCF (www.umbc.edu/hpcf) is supported by the National Science Foundation through the MRI program (grant nos. CNS–0821258 and CNS–1228778) and the SCREMS program (grant no. DMS–0821311), with additional substantial support from UMBC. Co-author Jesse D. Smith was supported, in part, by the UMBC National Security Agency (NSA) Scholars Program through a contract with the NSA. Graduate RA Andrew M. Raim was supported by UMBC as HPCF RA. [en_US]
dc.description.uri: https://userpages.umbc.edu/~gobbert/papers/REU2013Team2.pdf [en_US]
dc.format.extent: 13 pages [en_US]
dc.genre: technical report [en_US]
dc.identifier: doi:10.13016/M2VD6P82N
dc.identifier.uri: http://hdl.handle.net/11603/11412
dc.language.iso: en_US [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Mathematics Department Collection
dc.relation.ispartof: UMBC Chemistry & Biochemistry Department
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.relation.ispartofseries: HPCF Technical Report; HPCF-2013-12
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: Sufficient Reduction [en_US]
dc.subject: Principal Fitted Components [en_US]
dc.subject: parallel computing [en_US]
dc.subject: Molecular Dynamics [en_US]
dc.subject: UMBC High Performance Computing Facility (HPCF) [en_US]
dc.title: Identifying Nonlinear Correlations in High Dimensional Data with Application to Protein Molecular Dynamics Simulations [en_US]
dc.type: Text [en_US]

Files

Name: REU2013Team2.pdf
Size: 353.76 KB
Format: Adobe Portable Document Format