Using Deep Neural Networks to Translate Multi-lingual Threat Intelligence

Author/Creator ORCID

Date

Department

Program

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please contact the author.

Abstract

The multilingual nature of the Internet increases complications in the cybersecurity community's ongoing efforts to strategically mine threat intelligence from OSINT data on the web. OSINT sources such as social media, blogs, and dark web vulnerability markets exist in diverse languages and hinder security analysts, who are unable to draw conclusions from intelligence in languages they don't understand. Although third party translation engines are growing stronger, they are unsuited for private security environments. First, sensitive intelligence is not a permitted input to third party engines due to privacy and confidentiality policies. In addition, third party engines produce generalized translations that tend to lack exclusive cybersecurity terminology. In this paper, we address these issues and describe our system that enables threat intelligence understanding across unfamiliar languages. We create a neural network based system that takes in cybersecurity data in a different language and outputs the respective English translation. The English translation can then be understood by an analyst, and can also serve as input to an AI based cyber-defense system that can take mitigative action. As a proof of concept, we have created a pipeline which takes Russian threats and generates its corresponding English, RDF, and vectorized representations. Our network optimizes translations on specifically, cybersecurity data.