Portal-Executable-Images-Malware-Analysis-using-Deep-Convolutional-Networks

Portable Executables: The Portable Executable (PE) format is a file format for executables, object code, DLLs, and other binaries used in 32-bit and 64-bit versions of the Windows operating system.
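To make the format concrete: a PE file starts with an MS-DOS header whose first two bytes are `MZ`, and the 32-bit little-endian field at offset 0x3C (`e_lfanew`) holds the file offset of the `PE\0\0` signature. The minimal sketch below checks only those two markers. The function name `looks_like_pe` is hypothetical, and this is not a full PE parser.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if `data` has a DOS header that points to a PE signature."""
    # Every PE file begins with the MS-DOS header, whose first bytes are b"MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields a short slice, so the comparison is False.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```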

Problem Introduction

Visualizing PE files as images
Using deep learning to classify PE files containing malware
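One common way to visualize a PE file as an image is to read its raw bytes as 8-bit grayscale pixels and wrap them into rows of a fixed width. The sketch below assumes that scheme and a default width of 256. The function name `bytes_to_grayscale` is hypothetical; the repository itself uses Malook for this step, and its exact method is not described here.

```python
import numpy as np


def bytes_to_grayscale(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte to one 8-bit pixel and wrap the stream into rows of `width`."""
    pixels = np.frombuffer(data, dtype=np.uint8)
    height = int(np.ceil(len(pixels) / width))
    # Zero-pad the last row so the byte stream fills a full rectangle.
    padded = np.zeros(height * width, dtype=np.uint8)
    padded[: len(pixels)] = pixels
    return padded.reshape(height, width)
```

The resulting image has a variable height, which is why a resizing step to a fixed size follows in preprocessing.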

Dataset

Virus MNIST: Portable Executable Files as Images consists of 10 classes of executable code and approximately 50,000 examples for malware detection. The 10 classes comprise 9 malicious families of computer viruses and one benign set. The dataset is available on Kaggle (https://www.kaggle.com/datamunge/virusmnist).

Data Preprocessing

Converting the PE files to images with Malook, using nearest-neighbour interpolation
Rescaling/standardization: 32 x 32 grayscale images
Dimensionality reduction (PCA, t-SNE, and Truncated SVD)
Tomek links, Nearest Neighbour and Cluster Centroids methods for cleaning the data points
Feature significance using feature-importance graphs and density-plot visualizations
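The resizing, standardization and Tomek-link cleaning steps above can be sketched with NumPy. This is an illustration, not the repository's code: the function names are hypothetical, per-pixel standardization (rather than per-image) is an assumption, and the Tomek-link check is written by hand instead of calling a library such as imbalanced-learn.

```python
import numpy as np


def resize_nearest(img: np.ndarray, size: int = 32) -> np.ndarray:
    """Nearest-neighbour resize of a non-empty 2-D grayscale image to size x size."""
    h, w = img.shape
    # Each output pixel copies the input pixel its scaled index falls on.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]


def standardize(X: np.ndarray) -> np.ndarray:
    """Scale each pixel column of flattened images to zero mean, unit variance."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant pixels map to 0 instead of dividing by zero
    return (X - mean) / std


def tomek_link_mask(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """True for every sample that belongs to a Tomek link."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances (O(n^2) memory: fine for a sketch).
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    nn = d.argmin(axis=1)
    idx = np.arange(len(X))
    # (i, nn[i]) is a Tomek link when the two points are mutual nearest
    # neighbours and have different labels.
    return (nn[nn] == idx) & (y != y[nn])
```

In practice the standardization statistics would be computed on the training split only. Which member of each flagged link to drop is a separate design choice.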

Model building

We implemented a couple of traditional machine learning models (XGBoost, LightGBM) and several deep learning models in PyTorch (DNN, CNN, ResNet, MobileNet, LSTM, SqueezeNet), then compared the performance of each model.

Evaluation and Metric

To evaluate all our models, we used a holdout strategy, splitting the dataset into training and test sets with scikit-learn's train_test_split. We passed the stratify parameter so that both splits preserve the class proportions observed in the full dataset. We calculated both the accuracy and the weighted F1 score.
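The evaluation setup can be sketched with scikit-learn's `train_test_split` (with `stratify`), `accuracy_score`, and `f1_score(average="weighted")`. The 20% test size and the fixed random seed are assumptions, since the README does not state them, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def holdout_split(X, y, test_size=0.2, seed=0):
    """Stratified holdout split: class proportions match in train and test."""
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


def evaluate(y_true, y_pred):
    """Accuracy plus per-class F1 averaged with weights equal to class support."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "weighted_f1": f1_score(y_true, y_pred, average="weighted"),
    }
```

With `average="weighted"`, each class's F1 is weighted by its number of true samples, so frequent classes count more than they would in a macro average.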

Findings on the results

XGBoost and LightGBM classifiers with default parameters were treated as baselines; trained directly on the pixels (1024 per 32 x 32 image), they achieved scores of approximately 0.89
Next, a feedforward network (DNN) with 3 hidden layers, dropout and regularization was tried, which further improved performance
After that, instead of feeding flattened pixels, the images were fed to deep convolutional networks followed by a fully connected network. Different CNN architectures were then tried: ResNet, which includes residual blocks; SqueezeNet, which includes fire modules; and MobileNet, which uses inverted residuals
Finally, a bidirectional LSTM was tried, on the intuition that PE images represent pieces of code whose parts might be interlinked
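The row-as-sequence intuition behind the last experiment can be sketched in PyTorch. This is a hypothetical minimal model, not the trained model from this repository: the class name `RowBiLSTM`, the hidden size of 128, and reading one image row per time step are all assumptions; only the 10 output classes come from the dataset description.

```python
import torch
from torch import nn


class RowBiLSTM(nn.Module):
    """Classify a 32x32 grayscale image by reading its rows as a sequence."""

    def __init__(self, cols=32, hidden=128, num_classes=10):
        super().__init__()
        # Each time step consumes one image row of `cols` pixels.
        self.lstm = nn.LSTM(input_size=cols, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        # Final forward and backward states are concatenated: 2 * hidden.
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        # x: (batch, 1, rows, cols) -> (batch, rows, cols)
        seq = x.squeeze(1)
        _, (h_n, _) = self.lstm(seq)
        # h_n: (num_layers * 2, batch, hidden); the last two entries are the
        # top layer's final forward and backward states.
        feats = torch.cat([h_n[-2], h_n[-1]], dim=1)
        return self.fc(feats)
```

Reading the rows in both directions lets the classifier use context from earlier and later parts of the byte stream at once.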

Folders and files

Output
Trained Models
images
(final)Malware_images_[EDA+Data_processing]_.ipynb
Malware_Images_training.ipynb
Malware_models_evaluation.ipynb
README.md
Squeezenet_final.ipynb
sift-feature-extraction.ipynb

bhavesh0124/Portal-Executable-Images-Malware-Analysis-using-Deep-Convolutional-Networks

Folders and files

Latest commit

History

Repository files navigation

Portal-Executable-Images-Malware-Analysis-using-Deep-Convolutional-Networks

Problem Introduction

Dataset

Data Preprocessing

Model building

Evaluation and Metric

Findings on the results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages