Optimization of Computations Used in Information Theory Applied to Base Pair Analysis
dc.contributor.advisor | Gobbert, Matthias K. | |
dc.contributor.advisor | Neerchal, Nagaraj K. | |
dc.contributor.author | Coates, Andrew | |
dc.contributor.author | Ilchenko, Alexey | |
dc.contributor.author | O’Neill, Patrick | |
dc.contributor.author | Erill, Ivan | |
dc.date.accessioned | 2018-10-24T19:16:48Z | |
dc.date.available | 2018-10-24T19:16:48Z | |
dc.date.issued | 2011 | |
dc.description.abstract | Information theory is used in biology as an approach to analyzing and discovering enzyme binding sites in DNA. By using entropy, a measure of uncertainty, to analyze samples, information can be gained to make predictions more certain, or less uncertain. Information is gained by measuring how uncertain we expect to be at a site and deduct the uncertainty from observing the site. To get the desired accuracy needed for these predictions, one has to do costly computations. When there are hundreds of samples, values can be approximated sufficiently accurately. When there are very few samples, values can be calculated by exhaustive enumeration. In the intermediate range, however, it’s not clear that the existing approximations are optimal and the original method of exhaustive enumeration is very costly with regards to time and memory. The better these predictions are the better biologists can predict where enzymes will bind when performing expensive wet lab experiments. By predicting where the enzymes bind, they are predicting what function the protein will take. This work develops an algorithm for the intermediate range of sample sizes that is much faster and uses less memory in the process than the original method. The original naive computation is of exponential time complexity with respect to the number of samples. The new algorithm uses number theory to reduce the time complexity to cubic and further simplify the computation by using assumptions from biology. This is a substantial speedup in all ranges of sample sizes, born out by comparing Python, MATLAB, and serial C code. Computational experiments that parallelize the studies using the Intel Concurrent Collections (CnC) programming paradigm moreover show that tools from parallel computing can be useful to make larger parameter studies feasible in reasonable time. | en_US |
dc.description.sponsorship | Interdisciplinary Program in High Performance Computing (www.umbc.edu/hpcreu) in the UMBC Department of Mathematics and Statistics, funded by the National Science Foundation (grant no. DMS–0851749). This program is also supported by UMBC, the Department of Mathematics and Statistics, the Center for Interdisciplinary Research and Consulting (CIRC), and the UMBC High Performance Computing Facility (HPCF). The computational hardware in HPCF (www.umbc.edu/hpcf) is partially funded by the National Science Foundation through the MRI program (grant no. CNS–0821258) and the SCREMS program (grant no. DMS–0821311), with additional substantial support from UMBC. | en_US |
dc.description.uri | https://userpages.umbc.edu/~gobbert/papers/REU2011Team3.pdf | en_US |
dc.format.extent | 9 pages | en_US |
dc.genre | technical report | en_US |
dc.identifier | doi:10.13016/M2PZ51Q7M | |
dc.identifier.uri | http://hdl.handle.net/11603/11675 | |
dc.language.iso | en_US | en_US |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Mathematics Department Collection | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.relation.ispartof | UMBC Student Collection | |
dc.relation.ispartof | UMBC Biological Sciences Department | |
dc.relation.ispartofseries | HPCF Technical Report;HPCF–2011–13 | |
dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. | |
dc.subject | Optimization | en_US |
dc.subject | Information Theory | en_US |
dc.subject | Base Pair Analysis | en_US |
dc.subject | Concurrent Collections (CnC) | en_US |
dc.subject | UMBC High Performance Computing Facility (HPCF) | en_US |
dc.subject | entropy | |
dc.subject | predicting where enzymes will bind | |
dc.title | Optimization of Computations Used in Information Theory Applied to Base Pair Analysis | en_US |
dc.type | Text | en_US |
Files
License bundle
1 - 1 of 1
No Thumbnail Available
- Name:
- license.txt
- Size:
- 1.68 KB
- Format:
- Item-specific license agreed upon to submission
- Description: