Statistical Techniques for Language Recognition: An Introduction and Guide for Cryptanalysts
dc.contributor.author | Ganesan, Ravi | |
dc.contributor.author | Sherman, Alan T. | |
dc.date.accessioned | 2019-02-21T16:04:23Z | |
dc.date.available | 2019-02-21T16:04:23Z | |
dc.date.issued | 2010-06-04 | |
dc.description.abstract | We explain how to apply statistical techniques to solve several language-recognition problems that arise in cryptanalysis and other domains. Language recognition is important in cryptanalysis because, among other applications, an exhaustive key search of any cryptosystem from ciphertext alone requires a test that recognizes valid plaintext. Written for cryptanalysts, this guide should also be helpful to others as an introduction to statistical inference on Markov chains. Modeling language as a finite stationary Markov process, we adapt a statistical model of pattern recognition to language recognition. Within this framework we consider four well-defined language-recognition problems: 1) recognizing a known language, 2) distinguishing a known language from uniform noise, 3) distinguishing unknown 0th-order noise from unknown 1st-order language, and 4) detecting non-uniform unknown language. For the second problem we give a most powerful test based on the Neyman-Pearson Lemma. For the other problems, which typically have no uniformly most powerful tests, we give likelihood ratio tests. We also discuss the chi-squared test statistic X² and the Index of Coincidence IC. In addition, we point out useful works in the statistics and pattern-matching literature for further reading about these fundamental problems and test statistics. | en_US |
dc.description.uri | https://www.tandfonline.com/doi/pdf/10.1080/0161-119391867980 | en_US |
dc.format.extent | 36 pages | en_US |
dc.genre | journal articles postprints | en_US |
dc.identifier | doi:10.13016/m2qc3t-ly3f | |
dc.identifier.citation | Ravi Ganesan & Alan T. Sherman (1993) STATISTICAL TECHNIQUES FOR LANGUAGE RECOGNITION: AN INTRODUCTION AND GUIDE FOR CRYPTANALYSTS, CRYPTOLOGIA, 17:4, 321-366, DOI: 10.1080/0161-119391867980 | en_US |
dc.identifier.uri | https://doi.org/10.1080/0161-119391867980 | |
dc.identifier.uri | http://hdl.handle.net/11603/12835 | |
dc.language.iso | en_US | en_US |
dc.publisher | Taylor & Francis Online | |
dc.relation.isAvailableAt | The University of Maryland, Baltimore County (UMBC) | |
dc.relation.ispartof | UMBC Center for Research and Exploration in Space Sciences & Technology II (CRSST II) | |
dc.relation.ispartof | UMBC Faculty Collection | |
dc.relation.ispartof | UMBC Computer Science and Electrical Engineering Department | |
dc.rights | This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. | |
dc.rights | “This is an Accepted Manuscript of an article published by Taylor & Francis in Cryptologia on 04 Jun 2010, available online: http://www.tandfonline.com/10.1080/0161-119391867980.” | |
dc.subject | automatic plaintext recognition | en_US |
dc.subject | categorical data | en_US |
dc.subject | chi-squared test statistic | en_US |
dc.subject | computational linguistics | en_US |
dc.subject | contingency tables | en_US |
dc.subject | cryptanalysts | en_US |
dc.subject | cryptography | en_US |
dc.subject | document processing | en_US |
dc.subject | hypothesis testing | en_US |
dc.subject | index of coincidence | en_US |
dc.subject | language recognition | en_US |
dc.subject | likelihood ratio tests | en_US |
dc.subject | markov models of language | en_US |
dc.subject | maximum likelihood estimators | en_US |
dc.subject | natural language processing | en_US |
dc.subject | statistical inference | en_US |
dc.subject | statistical pattern recognition | en_US |
dc.subject | statistics of language | en_US |
dc.subject | weight of evidence | en_US |
dc.title | Statistical Techniques for Language Recognition: An Introduction and Guide for Cryptanalysts | en_US |
dc.type | Text | en_US |