A Hybrid Deep Network Framework for Android Malware Detection

Abstract: Android malware poses a serious threat to users' privacy, money, equipment, and file integrity. A series of data-driven malware detection methods have been proposed. However, these methods face two key challenges: (1) how to learn effective feature representations from raw data; (2) how to reduce the dependence on prior knowledge or human labor in feature learning. Inspired by the success of deep learning in feature representation learning, we propose a malware detection framework that starts by learning rich features with a novel unsupervised feature learning algorithm, Merged Sparse Auto-Encoder (MSAE). To extract more compact and discriminative features from these rich features and further boost malware detection capability, we build a hybrid deep network learning algorithm, Stacked Hybrid Learning MSAE and SDAE (SHLMD), which additionally incorporates a classical deep learning method, Stacked Denoising Auto-encoders (SDAE). We then feed the features learned by MSAE and SHLMD, respectively, to classification algorithms to train malware detection models. Evaluation results on two real-world datasets show that SHLMD achieves 94.46% and 90.57% accuracy, respectively, outperforming the classical unsupervised feature representation learning method Sparse Auto-encoder (SAE).
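The abstract does not give the MSAE or SHLMD equations, so the sketch below shows only the two classical building blocks it names: the sparse auto-encoder (SAE) objective, i.e., reconstruction error plus a KL-divergence sparsity penalty on the average hidden activation, and the masking-noise corruption used to train denoising auto-encoders (SDAE). It is a minimal standard-library sketch under stated assumptions: the function names, sigmoid activations, target sparsity `rho=0.05`, penalty weight `beta=3.0`, and binary input features (e.g., permission or API-call indicators) are illustrative choices, not the paper's.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def kl_sparsity(rho, rho_hat):
    """Sum over hidden units of KL(Bernoulli(rho) || Bernoulli(rho_hat_j))."""
    return sum(
        rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
        for r in rho_hat
    )


def encode(x, W, b):
    # Hidden unit j: sigmoid(sum_i W[j][i] * x[i] + b[j])
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W, b)]


def decode(h, V, c):
    # Reconstruction k: sigmoid(sum_j V[k][j] * h[j] + c[k])
    return [sigmoid(sum(v * hj for v, hj in zip(row, h)) + ck) for row, ck in zip(V, c)]


def sparse_ae_loss(X, W, b, V, c, rho=0.05, beta=3.0):
    """Classical SAE objective: mean squared reconstruction error + beta * KL penalty.
    rho and beta are hypothetical defaults, not values from the paper."""
    hiddens = [encode(x, W, b) for x in X]
    recon = 0.0
    for x, h in zip(X, hiddens):
        x_hat = decode(h, V, c)
        recon += 0.5 * sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    recon /= len(X)
    # Average activation of each hidden unit over the batch
    rho_hat = [sum(h[j] for h in hiddens) / len(X) for j in range(len(b))]
    return recon + beta * kl_sparsity(rho, rho_hat)


def mask_noise(x, p, rng):
    """Denoising-AE corruption: set each input feature to 0 with probability p."""
    return [0.0 if rng.random() < p else xi for xi in x]
```

A denoising layer would compute the same reconstruction error, but encode `mask_noise(x, p, rng)` while still reconstructing the clean `x`; stacking means training one such layer, then feeding its hidden codes to the next.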
• We further enumerate current issues in existing work from various aspects and provide recommendations based on our findings to support further research in this domain.
• Variants of existing malicious/benign applications can be deployed as new Android malware samples; this is an effective way to attack mobile users and evade detection systems, and it drives the rapid growth in the number of malware samples.
• Most existing studies focus on improving malware detection performance with various advanced deep learning approaches and show that their proposed models surpass other algorithms on their own training data.
• One such issue is that it is quite difficult to define a robust list of malicious features from human experience or with feature selection approaches based on existing training data.
• Traditional detection methods based on manual analysis and signature matching have exposed problems such as slow detection speed and low accuracy.
• Traditional RNNs suffer from the vanishing gradient problem, which is especially severe when the time series is long.
• To address the shortcomings of plain gradient descent, the improved method of mini-batch gradient descent is adopted; it reduces fluctuation in the parameter updates and yields more stable convergence and better final results.
• As more and more Android malware evades static detection through techniques such as repackaging and code obfuscation, dynamic analysis methods based on behavioral characteristics can mitigate this problem effectively.
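Mini-batch gradient descent, mentioned above, averages the gradient over a small random subset of samples per update: noisier than full-batch descent but far cheaper per step, and less erratic than single-sample updates. The sketch below fits a linear model with a half mean-squared-error loss in pure Python; the model, the function name `minibatch_gd`, and the defaults (`batch_size=16`, `lr=0.05`, `epochs=200`) are illustrative assumptions, not taken from any cited paper.

```python
import random


def minibatch_gd(X, y, batch_size=16, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w.x + b by mini-batch gradient descent on 0.5 * mean squared error.
    Hyperparameter defaults are hypothetical."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)  # reshuffle each epoch so the batches differ
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = [0.0] * n_features
            grad_b = 0.0
            for i in batch:
                err = sum(wj * xj for wj, xj in zip(w, X[i])) + b - y[i]
                for j in range(n_features):
                    grad_w[j] += err * X[i][j]
                grad_b += err
            m = len(batch)  # the last batch may be smaller than batch_size
            w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b / m
    return w, b
```

Setting `batch_size=1` gives stochastic gradient descent and `batch_size=len(X)` gives full-batch descent, so the one parameter spans the trade-off between update noise and cost per step.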
• In their approach, the proposed siamese network consists of two identical, weight-sharing MLP networks but with opposite loss functions.
• Compared with n-gram methods, which require exhaustive enumeration, the proposed CNN-based model without hand-engineered features proved much more computationally efficient, requiring less time and fewer computational resources.
• The proposed LSTM-based malware detection architecture consisted of four LSTM layers and one classification layer, where the first three LSTM layers were pre-trained in an unsupervised manner instead of being randomly initialized.
• The proposed tool can automatically apply semantic-level perturbations to the source files (e.g., classes.dex) and then rebuild the modified APK.
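Weight sharing in a siamese network simply means both branches evaluate the same function with the same parameters. The cited work's exact loss is not given here, so as an assumption the sketch uses the contrastive loss of Hadsell et al. (2006), whose two terms act in opposite directions: one pulls similar pairs together, the other pushes dissimilar pairs apart up to a margin. The tiny tanh MLP, the function names, and `margin=1.0` are hypothetical illustration choices, not the authors' design.

```python
import math


def mlp_embed(x, params):
    """Hypothetical one-hidden-layer MLP: tanh hidden layer, linear output."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    return [sum(w * hj for w, hj in zip(row, h)) + bk for row, bk in zip(W2, b2)]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(d, same, margin=1.0):
    """Contrastive loss (assumed choice): penalize distance for same-class pairs,
    penalize closeness within `margin` for different-class pairs."""
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def siamese_loss(x1, x2, same, params, margin=1.0):
    # Weight sharing: both branches call mlp_embed with the same params object.
    e1 = mlp_embed(x1, params)
    e2 = mlp_embed(x2, params)
    return contrastive_loss(euclidean(e1, e2), same, margin)
```

Because the parameters are shared, any gradient step updates both branches at once, and a pair of identical inputs always maps to distance 0.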
• Deep learning has demonstrated excellent performance in image recognition, so malware can be converted into images on which deep learning algorithms are then trained and used for detection.
• The advantage of using the DBN is that it learns the static features of Android applications faster and achieves better performance.
• A large batch_size can reduce training time and improve stability, but as batch_size grows beyond a certain point, model performance tends to decrease.
• Because mobile devices have limited computing resources and deep learning is compute-intensive, the Android malware detection model proposed in this paper is suited to running on high-performance computers rather than on the devices themselves.
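A common way to turn malware into an image is to read the binary (for an APK, e.g. its classes.dex) as a byte stream, treat each byte as an 8-bit grayscale pixel, and wrap the stream into rows of fixed width. The sketch below assumes a fixed width of 256 and zero-padding of the last row; both are illustrative choices, and the function names are hypothetical. It relies only on the fact that an APK is a ZIP archive, read here with the standard `zipfile` module.

```python
import zipfile
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay out bytes row by row as 0-255 pixel values.
    The last row is zero-padded so every row has exactly `width` pixels."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # bytes -> list of ints
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def apk_dex_image(apk_path, width: int = 256) -> List[List[int]]:
    """Hypothetical helper: read classes.dex from an APK and convert it to an image."""
    with zipfile.ZipFile(apk_path) as zf:
        dex = zf.read("classes.dex")
    return bytes_to_grayscale(dex, width)
```

The resulting row list has a file-dependent height, so in practice it is usually resized to a fixed shape before being fed to a CNN.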
