Transformer-based Anomaly Detection on Streams of System Calls (Master's Thesis)

tinsaye/LID-DS-TF

LID-DS Transformer

Source code for the implementation and evaluation of Transformer-based anomaly detection on streams of system calls (Master's thesis).

The thesis aimed to develop a host-based intrusion detection system (HIDS) using a transformer-based model as an anomaly detector. The proposed system builds a model of normal behaviour by processing n-grams of system calls during training. The trained model is then used to detect anomalies when an attack occurs by measuring the deviation from the benign profile. The architecture of the transformer-based model consists of stacked decoders to allow a language modelling approach while processing the n-grams of system calls. The development and evaluation of the HIDS employed the Leipzig Intrusion Detection – Data Set (LID-DS) and framework.
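
To make the n-gram idea concrete, here is a minimal sketch. It is not the thesis implementation. It slides a window over a stream of system call names and scores how surprising the last call of each window is given the calls before it. As a deliberately simple stand-in for the transformer language model, it uses smoothed next-call counts. The names `ngrams` and `CountModel`, the smoothing constant, the toy traces, and the choice of the highest validation score as threshold are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(calls, n):
    """Yield every window of n consecutive system calls as a tuple."""
    for i in range(len(calls) - n + 1):
        yield tuple(calls[i:i + n])


class CountModel:
    """Count-based stand-in for the language model (NOT the thesis transformer).

    It learns how often each call follows a given context of n-1 calls.
    """

    def __init__(self, n, alpha=0.1):
        self.n = n
        self.alpha = alpha  # hypothetical additive-smoothing constant
        self.counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, calls):
        """Build the profile of normal behaviour from a benign trace."""
        self.vocab.update(calls)
        for gram in ngrams(calls, self.n):
            self.counts[gram[:-1]][gram[-1]] += 1

    def score(self, gram):
        """Anomaly score: negative log-probability of the last call given its context."""
        context, nxt = gram[:-1], gram[-1]
        seen = self.counts.get(context, Counter())
        # One extra vocabulary slot is reserved for calls never seen in training.
        total = sum(seen.values()) + self.alpha * (len(self.vocab) + 1)
        return -math.log((seen[nxt] + self.alpha) / total)


# Toy traces; real input would be system call streams from LID-DS recordings.
normal = ["open", "read", "read", "close"] * 50
validation = ["open", "read", "read", "close"] * 5

model = CountModel(n=3)
model.fit(normal)

# Assumed thresholding rule: the highest score observed on held-out normal data.
threshold = max(model.score(g) for g in ngrams(validation, 3))

benign = model.score(("open", "read", "read"))
odd = model.score(("open", "read", "execve"))
print(f"benign={benign:.4f} odd={odd:.4f} alarm={odd > threshold}")
```

The transformer replaces the count table with a learned next-call distribution. The rest of the flow stays the same in spirit: train on benign traces, score new windows, and compare the score against a threshold.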

Project Structure

src
├── cluster                     # scripts for running the IDS pipeline on an HPC cluster with Slurm
├── decision_engines            # transformer and a modified AE decision engine, also contains the transformer model
├── features                    # several building blocks that could be used as input features for the model.
├── evaluation                  # script and notebooks used for creating the evaluation plots and tables
│   ├── fluctuation_analysis    # utility classes and script for creating the n-gram set experiments (sections 5.3 and 6.3 of my thesis)
│   │   └── cluster             # scripts to run these analyses on a cluster
│   ├── js_functions            # MongoDB custom js functions used to retrieve/aggregate saved results (lid_ds specifics)
│   ├── preliminary             # scripts used to create the plots for the preliminary experiments (6.2)
│   └── primary                 # plots for the final full evaluation (6.4)
├── misc                        # scripts that can be used to run the AE and MLP based IDSs
├── Models                      # empty directory used to store checkpoints of trained models
└── utils                       # helper functions to store and load trained models for specific epochs

Note

There are several features in the src/features directory that did not make it into my thesis due to limited project scope and changes in research direction.

Requirements

  • Python 3.9
  • LID-DS
  • pytorch (1.9.0)
  • Dataset: see LID-DS readme
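
Before installing, it can help to check the interpreter and PyTorch versions against the list above. The following is a small sketch using only the standard library. The helper names are hypothetical, and `torch` is the distribution name under which PyTorch is published on PyPI.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Compare the running interpreter and torch against the versions listed above."""
    torch_version = installed_version("torch")
    return {
        "python_is_3_9": sys.version_info[:2] == (3, 9),
        "torch_installed": torch_version is not None,
        # Local build tags such as "+cu111" are ignored for the comparison.
        "torch_is_1_9_0": torch_version is not None
        and torch_version.split("+")[0] == "1.9.0",
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'mismatch'}")
```

A mismatch does not necessarily mean the code will fail. The versions above are simply the ones named in this README.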

Installation

  1. Set up your python environment (venv, ...)

  2. Install main dependency LID-DS from source

cd /path/for/installing/lidds
git clone git@github.com:LID-DS/LID-DS.git
cd LID-DS
git checkout 0f7760a4785f8758359227a1309be46b5d14955a   # to ensure compatibility, last commit at the time of development
pip install -r requirements.txt                         # this is important as the setup script does not install all dependencies
pip install -e .                                        # install LID-DS from source

  3. Install other dependencies

cd /path/to/this/project/LID-DS-TF
pip install -r requirements.txt

Usage

To run the IDS pipeline on the LID-DS dataset for a single example scenario with the default configuration (2-3 min on my laptop), use the following commands:

cd src
mkdir -p dataset/LID-DS-2019/
wget "https://cloud.scadsai.uni-leipzig.de/index.php/s/HLXiWssriRMt9pp/download?files=CVE-2017-7529.tar.gz" -O CVE-2017-7529.tar.gz \
  && tar -zxf CVE-2017-7529.tar.gz -C dataset/LID-DS-2019/ \
  && rm CVE-2017-7529.tar.gz
LID_DS_BASE="dataset" python ids_transformer_main.py

The default configuration can be found in the main function of ids_transformer_main.py.
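
If the run fails to find the data, a quick check of the directory layout can narrow the problem down. The sketch below only mirrors the shell commands above: it assumes the archive unpacks into a directory named `CVE-2017-7529` under `dataset/LID-DS-2019/`. How `ids_transformer_main.py` itself resolves `LID_DS_BASE` is not shown here, and the helper names are hypothetical.

```python
import os
from pathlib import Path


def scenario_path(scenario, dataset_version="LID-DS-2019", env=None):
    """Build <LID_DS_BASE>/<dataset_version>/<scenario> from the environment."""
    env = os.environ if env is None else env
    base = env.get("LID_DS_BASE")
    if not base:
        raise RuntimeError("LID_DS_BASE is not set")
    return Path(base) / dataset_version / scenario


def scenario_available(scenario, **kwargs):
    """Return True if the assumed scenario directory exists on disk."""
    return scenario_path(scenario, **kwargs).is_dir()


if __name__ == "__main__":
    try:
        path = scenario_path("CVE-2017-7529")
        print(path, "found" if path.is_dir() else "missing")
    except RuntimeError as err:
        print(err)
```

Run it from `src` with the same `LID_DS_BASE="dataset"` prefix as the pipeline command so that the relative path resolves the same way.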
