GENPass: A Multi-Source Deep Learning Model For Password Guessing

Date

2019-09-11

Citation of Original Publication

Z. Xia, P. Yi, Y. Liu, B. Jiang, W. Wang, and T. Zhu, "GENPass: A Multi-Source Deep Learning Model for Password Guessing," IEEE Transactions on Multimedia, doi: 10.1109/TMM.2019.2940877.
Keywords: Password; Neural networks; Deep learning; Gallium nitride; Training; Computational modeling; Markov processes
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8832180&isnumber=4456689

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/

Abstract

The password has become today’s dominant method of authentication. Since brute-force attack tools such as HashCat and John the Ripper have proven impractical, research has shifted to password guessing. State-of-the-art approaches such as the Markov model and probabilistic context-free grammar (PCFG) are based on statistical probability. These approaches require a large amount of computation, which is time-consuming. Neural networks have proven more accurate and practical in password guessing than traditional methods. However, a raw neural network model is not suited to cross-site attacks because each dataset has its own features. Our work aims to generalize leaked passwords and improve performance in cross-site attacks. In this paper, we propose GENPass, a multi-source deep learning model for generating “general” passwords. GENPass learns from several datasets and uses adversarial generation to ensure that the output wordlist maintains high accuracy across different datasets. The password generator of GENPass is PCFG+LSTM (PL). We are the first to combine a neural network with PCFG. Compared with long short-term memory (LSTM), PL increases the matching rate by 16%-30% in cross-site tests when learning from a single dataset. GENPass uses several PL models to learn from the datasets and generate passwords. The results demonstrate that the matching rate of GENPass is 20% higher than that achieved by simply mixing the datasets in the cross-site test. Furthermore, we propose GENPass with probability (GENPass-pro), an updated version of GENPass that further increases the matching rate.
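Two ideas from the abstract can be made concrete: the PCFG view of a password and the matching rate used to compare models. The sketch below is illustrative only and is not GENPass's implementation. It assumes the conventional PCFG segmentation into runs of ASCII letters (L), digits (D), and other symbols (S), so "password123!" becomes the base structure L8D3S1. It also assumes the matching rate is the fraction of unique test-set passwords that appear in the generated wordlist. The function names `pcfg_structure` and `matching_rate` are hypothetical.

```python
import re
import string
from typing import Iterable

# Assumption: classic PCFG segmentation into runs of ASCII letters,
# digits, and everything else (treated as "special" symbols).
_RUN = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+")


def pcfg_structure(password: str) -> str:
    """Return a PCFG-style base structure, e.g. 'abc123!' -> 'L3D3S1'."""
    parts = []
    for match in _RUN.finditer(password):
        segment = match.group()
        if segment[0] in string.ascii_letters:
            tag = "L"  # letter run
        elif segment[0] in string.digits:
            tag = "D"  # digit run
        else:
            tag = "S"  # special-symbol run (includes non-ASCII characters)
        parts.append(f"{tag}{len(segment)}")
    return "".join(parts)


def matching_rate(generated: Iterable[str], test_set: Iterable[str]) -> float:
    """Fraction of unique test passwords found in the generated wordlist.

    Assumption: duplicates in the test set are counted once.
    """
    guesses = set(generated)
    targets = set(test_set)
    if not targets:
        return 0.0
    return len(guesses & targets) / len(targets)


if __name__ == "__main__":
    print(pcfg_structure("password123!"))  # L8D3S1
    wordlist = ["123456", "password", "qwerty"]
    leaked = ["password", "letmein", "123456", "dragon"]
    print(matching_rate(wordlist, leaked))  # 0.5
```

In a cross-site test, the wordlist would be generated from one or more training datasets and `test_set` would come from a different site's leak; whether the original work counts duplicate test entries is not stated here, so the set-based definition above is an assumption.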