A New Burrows Wheeler Transform Markov Distance

dc.contributor.author: Raff, Edward
dc.contributor.author: Nicholas, Charles
dc.contributor.author: McLean, Mark
dc.date.accessioned: 2020-03-11T17:08:03Z
dc.date.available: 2020-03-11T17:08:03Z
dc.date.issued: 2019-12-30
dc.description.abstract: Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems. We describe issues with this approach that were not widely known, and introduce our new Burrows Wheeler Markov Distance (BWMD) as an alternative. The BWMD avoids the shortcomings of earlier efforts, and allows us to tackle problems in variable length DNA sequence clustering. BWMD is also more adaptable to other domains, which we demonstrate on malware classification tasks. Unlike other compression-based distance metrics known to us, BWMD works by embedding sequences into a fixed-length feature vector. This allows us to provide significantly improved clustering performance on larger malware corpora, a weakness of prior methods. [en_US]
dc.description.uri: https://arxiv.org/abs/1912.13046 [en_US]
dc.format.extent: 12 pages [en_US]
dc.genre: journal articles preprints [en_US]
dc.identifier: doi:10.13016/m2b9ys-17oa
dc.identifier.citation: Raff, Edward; Nicholas, Charles; McLean, Mark; A New Burrows Wheeler Transform Markov Distance; Cryptography and Security (2019); https://arxiv.org/abs/1912.13046 [en_US]
dc.identifier.uri: http://hdl.handle.net/11603/17546
dc.language.iso: en_US [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: compression algorithms [en_US]
dc.subject: burrows wheeler [en_US]
dc.subject: distance measure [en_US]
dc.subject: bioinformatics problems [en_US]
dc.subject: Burrows Wheeler Markov Distance [en_US]
dc.title: A New Burrows Wheeler Transform Markov Distance [en_US]
dc.type: Text [en_US]
