BenchmarkIDS

Java CLI application to benchmark IDS, using WEKA's Java API and CSE-CIC-IDS2018 dataset.

This is the supporting application of my MSc dissertation on "Benchmarking behavior-based IDS using bio-inspired algorithms".

It's a very specific application, mainly aimed at automating the specific tests used on my dissertation.

Still, it´s a nice example on how to use WEKA's Java API.

The BenchmarkIDS application was created as a result of meeting some requirements related to this work, namely the flexibility provided by the API and trying to overcome some limitations of the graphic environment. The application, developed in Java, has no graphical environment and the results are only shown on the screen (they are not saved in a file).

Structure

The application consists of 4 modules: the initialization module (setup), the training module, the testing module and the scenarios module.

The initialization module is responsible for creating the data files that will be used by the training and testing modules.

Starting from the parameters passed on the command line, open the corresponding files from the dataset, performs the pre-processing tasks and generate the files for training the model (randomly selects 200000 lines from the file and divides them into two files, one with 140,000 lines (70%) for training and the other with the remaining 60,000 (30%), used to test the model) and for zero-day attack simulation testing.

The training module reads the previously generated data file and trains the model, using predefined algorithms. The parameterization of the algorithms is also pre-defined, with 3 models of each algorithm being generated.

The test module loads the previously generated models, reads the test file and the Zero-Day attack simulation file and tests and evaluates the models, generating a set of metrics indicative of the performance of each model.

The scenarios module, which implements the 4 test scenarios of my work, loads the files previously generated from the CSE-CIC-IDS2018 dataset by the Orange data-mining application, selects from one of them 140000 lines for the training phase and 60,000 lines of the second one for the test phase and applies this data to the 3 algorithms used. For reasons of statistical significance, this procedure is repeated ten times, with the data selected each time being completely independent of the others.

Compiling

I compiled the code using Netbeans version 11.2

It's absolutely necessary, for the compilation to succeed:

Include the Weka API, by means of including the file Weka.jar from your Weka installation directory
Include the Wekaclassalgos library. This library, originally developed by Jason Brownlee, is available in github at https://github.com/fracpete/wekaclassalgos

For those using Maven, there's also a Maven dependency that deals with including Wekaclassalgos library (see https://github.com/fracpete/wekaclassalgos for more information).

Information on how to use the WEKA API in Java applications can be found at Use weka in your java code

Usage

To use this application it is necessary to open the Windows command line (or, if using Linux, a shell or terminal) and change to the folder (directory) where the application distribution is located.

Initialization module

In order to execute the initialization module and thus generate the necessary data files, a command with the following structure must be executed:

java -jar BenchmarkIDS.jar setup <fich1> <fich2>

The parameters mean the following:

- file number that will serve as the basis for training the model
- file number that will serve as the basis for the zero-day attack simulation

So, for example, the command

java -jar BenchmarkIDS.jar setup 02 03

will execute the initialization module, accessing the Data_02.csv file to generate the model training data and the Data_03.csv file to create the file that will simulate the zero-day attack.

Training module

In the training module, the command structure is as follows:

java -jar BenchmarkIDS.jar training <algoritmo>

The algoritmo parameter can only assume 3 distinct values:

clonalg - to select the CLONALG algorithm
mlp - to select the Multi-Layer Perceptron algorithm
lvq - to select the Learning Vector Quantization algorithm

If, for example, we intended to train CLONALG models, the command to be executed would be, then:

java -jar BenchmarkIDS.jar training clonalg

Test Module

The test module includes two different options, one to test a certain algorithm and the other to generate and test the ensembles of the 3 algorithms, grouped 2 to 2.

The command structure for testing a given algorithm, including the zero-day attack simulation test, is as follows:

java -jar BenchmarkIDS.jar test <algoritmo>

The valid and supported values for the parameter algoritmo are the same as for the parameter with the same name in the training module.

To generate the ensembles and perform the respective tests, simply execute the command:

java -jar BenchmarkIDS.jar ensemble

Scenarios Module

To use the scenarios module, simply issue the command:

java -jar BenchmarkIDS.jar training <cenario>

The valid values for the parameter cenario are :

cenario1 - will perform the defined tests for scenario 1
cenario2 - will perform the defined tests for scenario 2
cenario3 - will perform the defined tests for scenario 3
cenario4 - will perform the defined tests for scenario 4

Notes

As mentioned before, this application was developed to meet specific requirements of my work. As such, some things were just simplified, like:

The location of the data files is hardcoded
The algorithms used are just the ones I used on my work
The patametrization of each algorithm is also hardcoded
The name of the data files resembles what I used on my work and is also hardcoded

About the authors

Paulo Ferreira ([email protected]) - I'm a MSc student - Cybersecurity and Computer Forensics - at Polytechnic Institute of Leiria (IPL), School of Technology and Management (ESTG)

Mário Antunes ([email protected]) - professor and Coordinator of the Master in Cybersecurity and Computer Forensics at Polytechnic Institute of Leiria (IPL), School of Technology and Management (ESTG), Department of Computer Engineering (DEI).

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
nbproject		nbproject
src/pjgf		src/pjgf
README.md		README.md
build.xml		build.xml
manifest.mf		manifest.mf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BenchmarkIDS

Structure

Compiling

Usage

Initialization module

Training module

Test Module

Scenarios Module

Notes

About the authors

About

Releases

Packages

Languages

paulo-ferreira-mcif/benchmarkids

Folders and files

Latest commit

History

Repository files navigation

BenchmarkIDS

Structure

Compiling

Usage

Initialization module

Training module

Test Module

Scenarios Module

Notes

About the authors

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages