Research on unsupervised feature learning for Android malware detection based on Restricted Boltzmann Machines

Date

2021-03-08

Citation of Original Publication

Liu, Zhen, et al. "Research on Unsupervised Feature Learning for Android Malware Detection based on Restricted Boltzmann Machines." Future Generation Computer Systems 120 (July 2021): pp. 91-108. https://doi.org/10.1016/j.future.2021.02.015.

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless it is under a Creative Commons license, contact the copyright holder or the author for uses protected by copyright law.
Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
Access to this item will begin on 03-08-2023

Abstract

Android malware detection has attracted much attention in recent years. Existing methods mainly focus on extracting static or dynamic features from mobile apps and building malware detection models with machine learning algorithms. The number of extracted static or dynamic features may be very large, so the data suffers from high dimensionality. In addition, malware is constantly varied to avoid detection, which makes malware data hard to obtain in the first place. To detect zero-day malware, unsupervised malware detection methods have been applied. In such cases, unsupervised feature reduction is a suitable choice for reducing data dimensionality. In this paper, we propose an unsupervised feature learning algorithm called Subspace based Restricted Boltzmann Machines (SRBM) for reducing data dimensionality in malware detection. SRBM first searches for multiple subspaces in the original data and then builds an RBM on each subspace. The outputs of the hidden layers of all trained RBMs are combined to represent the data in a lower dimension. Experimental results on the OmniDroid, CIC2019 and CIC2020 datasets show that the features learned by SRBM outperform those learned by other feature reduction methods when evaluated with the clustering metrics NMI, ACC and F-score.
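The abstract describes the SRBM pipeline only at a high level (find subspaces, train one RBM per subspace, combine the hidden-layer outputs). The following Python sketch illustrates that shape under several assumptions that are not taken from the paper: features are binary (0/1), as permission or API-call flags often are; each RBM is a Bernoulli RBM trained with one-step contrastive divergence (CD-1); "combined" is read as concatenation of hidden-unit probabilities; and, most importantly, the paper's subspace search is replaced by a placeholder that splits feature indices into random disjoint groups. The names `BinaryRBM`, `random_subspaces` and `srbm_transform` are hypothetical and used only for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class BinaryRBM:
    """Minimal Bernoulli-Bernoulli RBM trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.n_visible, self.n_hidden = n_visible, n_hidden
        # Small random weights, zero biases.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(self.n_hidden)))
                for i in range(self.n_visible)]

    def fit(self, data, epochs=20, lr=0.1):
        for _ in range(epochs):
            for v0 in data:
                ph0 = self.hidden_probs(v0)                        # positive phase
                h0 = [1.0 if self.rng.random() < p else 0.0 for p in ph0]
                v1 = self.visible_probs(h0)                        # mean-field reconstruction
                ph1 = self.hidden_probs(v1)                        # negative phase
                for i in range(self.n_visible):
                    for j in range(self.n_hidden):
                        self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
                    self.b[i] += lr * (v0[i] - v1[i])
                for j in range(self.n_hidden):
                    self.c[j] += lr * (ph0[j] - ph1[j])
        return self


def random_subspaces(n_features, n_subspaces, seed=0):
    """Placeholder for the paper's subspace search: random disjoint feature groups."""
    if not 1 <= n_subspaces <= n_features:
        raise ValueError("need 1 <= n_subspaces <= n_features")
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[k::n_subspaces]) for k in range(n_subspaces)]


def srbm_transform(data, n_subspaces=3, hidden_per_rbm=2, epochs=20, seed=0):
    """Train one RBM per subspace and concatenate their hidden-unit probabilities."""
    subspaces = random_subspaces(len(data[0]), n_subspaces, seed)
    rbms = []
    for k, cols in enumerate(subspaces):
        sub = [[row[i] for i in cols] for row in data]
        rbms.append(BinaryRBM(len(cols), hidden_per_rbm, seed=seed + k).fit(sub, epochs=epochs))
    features = []
    for row in data:
        out = []
        for cols, rbm in zip(subspaces, rbms):
            out.extend(rbm.hidden_probs([row[i] for i in cols]))
        features.append(out)
    return features


# Usage on toy binary data: 8 apps x 12 binary features -> 8 x (3 * 2) learned features.
toy = [[random.Random(r * 13 + c).randint(0, 1) for c in range(12)] for r in range(8)]
reduced = srbm_transform(toy, n_subspaces=3, hidden_per_rbm=2, epochs=5)
print(len(reduced), len(reduced[0]))
```

In this sketch the output dimension is `n_subspaces * hidden_per_rbm`, so the reduced representation is controlled by how many subspaces are formed and how many hidden units each RBM has. The learned features could then be passed to a clustering algorithm and scored with metrics such as NMI, as the abstract describes; that step is not shown here.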