Multilingual Text Alignment

dc.contributor.advisor: Joshi, Karuna P
dc.contributor.advisor: Joshi, Anupam
dc.contributor.author: Ranade, Priyanka
dc.contributor.department: Information Systems
dc.contributor.program: Information Systems
dc.date.accessioned: 2021-01-29T18:12:39Z
dc.date.available: 2021-01-29T18:12:39Z
dc.date.issued: 2019-01-01
dc.description.abstract: Cybersecurity threats, exploits, and intelligence sources have become largely cross-regional over time. Although the security community perpetually addresses this topic, its scope is continually stretching and introducing new areas of study. In particular, one relevant but heavily under-explored area of research is the use of multilingual open source intelligence in cyber operations. Open Source Intelligence (OSINT) in the form of text is scattered across major criminal networks and is highly multilingual in nature. By aligning multilingual sources, the security community can tap into new pools of intelligence. Language alignment can be achieved through the use of neural machine translation (NMT) systems. This thesis explores supervised and unsupervised methods for aligning multilingual open source intelligence sources without the use of third-party engines. Although third-party engines are growing stronger, they are unsuited for private security environments. First, sensitive intelligence is not a permitted input to third-party engines due to privacy and confidentiality policies. In addition, third-party engines produce generalized translations that tend to lack specialized cybersecurity terminology, which could be integral to attack discovery. We address these issues and describe our system, which enables threat intelligence understanding across unfamiliar languages. We create monolingual and multilingual word embeddings from open source intelligence data in two distinct languages, and derive a bilingual dictionary through both supervised and unsupervised methods. We then create a neural network based system that takes in cybersecurity data in a different language and outputs the respective English translation. We evaluate the system with traditional approaches and through experimental applications.
dc.format: application:pdf
dc.genre: theses
dc.identifier: doi:10.13016/m2weno-4wg5
dc.identifier.other: 12023
dc.identifier.uri: http://hdl.handle.net/11603/20731
dc.language: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department Collection
dc.relation.ispartof: UMBC Theses and Dissertations Collection
dc.relation.ispartof: UMBC Graduate School Collection
dc.relation.ispartof: UMBC Student Collection
dc.source: Original File Name: Ranade_umbc_0434M_12023.pdf
dc.subject: cyber
dc.subject: nlp
dc.subject: osint
dc.subject: security
dc.title: Multilingual Text Alignment
dc.type: Text
dcterms.accessRights: Distribution Rights granted to UMBC by the author.
dcterms.accessRights: Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
dcterms.accessRights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Files

Original bundle
Name: Ranade_umbc_0434M_12023.pdf
Size: 1.75 MB
Format: Adobe Portable Document Format
License bundle
Name: RanadePMultilingual_Open.pdf
Size: 47.46 KB
Format: Adobe Portable Document Format