Statistical Techniques for Language Recognition: An Introduction and Guide for Cryptanalysts

dc.contributor.author: Ganesan, Ravi
dc.contributor.author: Sherman, Alan T.
dc.date.accessioned: 2019-02-21T16:04:23Z
dc.date.available: 2019-02-21T16:04:23Z
dc.date.issued: 2010-06-04
dc.description.abstract: We explain how to apply statistical techniques to solve several language-recognition problems that arise in cryptanalysis and other domains. Language recognition is important in cryptanalysis because, among other applications, an exhaustive key search of any cryptosystem from ciphertext alone requires a test that recognizes valid plaintext. Written for cryptanalysts, this guide should also be helpful to others as an introduction to statistical inference on Markov chains. Modeling language as a finite stationary Markov process, we adapt a statistical model of pattern recognition to language recognition. Within this framework we consider four well-defined language-recognition problems: 1) recognizing a known language, 2) distinguishing a known language from uniform noise, 3) distinguishing unknown 0th-order noise from unknown 1st-order language, and 4) detecting non-uniform unknown language. For the second problem we give a most powerful test based on the Neyman-Pearson Lemma. For the other problems, which typically have no uniformly most powerful tests, we give likelihood ratio tests. We also discuss the chi-squared test statistic X² and the Index of Coincidence IC. In addition, we point out useful works in the statistics and pattern-matching literature for further reading about these fundamental problems and test statistics.
dc.description.uri: https://www.tandfonline.com/doi/pdf/10.1080/0161-119391867980
dc.format.extent: 36 pages
dc.genre: journal articles postprints
dc.identifier: doi:10.13016/m2qc3t-ly3f
dc.identifier.citation: Ravi Ganesan & Alan T. Sherman (1993) STATISTICAL TECHNIQUES FOR LANGUAGE RECOGNITION: AN INTRODUCTION AND GUIDE FOR CRYPTANALYSTS, CRYPTOLOGIA, 17:4, 321-366, DOI: 10.1080/0161-119391867980
dc.identifier.uri: https://doi.org/10.1080/0161-119391867980
dc.identifier.uri: http://hdl.handle.net/11603/12835
dc.language.iso: en
dc.publisher: Taylor & Francis Online
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Center for Research and Exploration in Space Sciences & Technology II (CRSST II)
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: “This is an Accepted Manuscript of an article published by Taylor & Francis in Cryptologia on 04 Jun 2010, available online: http://www.tandfonline.com/10.1080/0161-119391867980.”
dc.subject: automatic plaintext recognition
dc.subject: categorical data
dc.subject: chi-squared test statistic
dc.subject: computational linguistics
dc.subject: contingency tables
dc.subject: cryptanalysts
dc.subject: cryptography
dc.subject: document processing
dc.subject: hypothesis testing
dc.subject: index of coincidence
dc.subject: language recognition
dc.subject: likelihood ratio tests
dc.subject: markov models of language
dc.subject: maximum likelihood estimators
dc.subject: natural language processing
dc.subject: statistical inference
dc.subject: statistical pattern recognition
dc.subject: statistics of language
dc.subject: weight of evidence
dc.title: Statistical Techniques for Language Recognition: An Introduction and Guide for Cryptanalysts
dc.type: Text
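
The abstract names two test statistics, the chi-squared statistic and the Index of Coincidence. As a rough illustration only, the sketch below computes both from single-letter counts using their standard textbook definitions. It is not taken from the article: the 26-letter uppercase alphabet, the uniform-noise reference distribution, and the function names are all assumptions made for this example.

```python
from collections import Counter
import string

# Assumption: a 26-letter uppercase alphabet; other characters are ignored.
ALPHABET = string.ascii_uppercase


def letter_counts(text):
    """Count occurrences of each alphabet letter in the text."""
    return Counter(ch for ch in text.upper() if ch in ALPHABET)


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are equal."""
    counts = letter_counts(text)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two letters")
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def chi_squared_vs_uniform(text):
    """Pearson chi-squared statistic of the letter counts against uniform noise."""
    counts = letter_counts(text)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one letter")
    expected = n / len(ALPHABET)  # expected count per letter under uniform noise
    return sum((counts.get(ch, 0) - expected) ** 2 / expected for ch in ALPHABET)
```

Both statistics are larger for text whose letter frequencies are far from uniform, so under these assumptions a candidate decryption with a high value looks less like uniform noise. Any decision threshold would have to come from the testing framework described in the article itself.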

Files

Original bundle

Name: ShermanCryptologia93.pdf
Size: 2.03 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission