
  • An ensemble framework built on existing log-based anomaly detection work.

  • The basic idea is to apply feature engineering to raw logs and, after a series of processing steps, report the logs that are potentially malicious.


  • dvc experiments
  • dvc dags


  • convert the logs into structured pandas DataFrames
  • extract the log keys from raw logs
  • analyse the log key execution path
  • analyse the parameters in each log key
  • analyse, with PCA, the time series data generated from the window size and time interval
  • online learning from feedback
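
As a rough illustration of the window-based PCA step, the sketch below counts log keys per fixed time window and scores each window by its PCA reconstruction error. It is not the code used in this project or in loglizer; the column names `timestamp` and `log_key`, the one-minute window, and both function names are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def window_count_matrix(df, window="1min"):
    """Count how often each log key occurs in each fixed time window.

    Assumes `df` has a datetime column `timestamp` and a column `log_key`
    (hypothetical names). Windows without any log line are not listed.
    """
    windows = df["timestamp"].dt.floor(window)
    return pd.crosstab(windows, df["log_key"])


def pca_residual_scores(matrix, n_components=1):
    """Squared reconstruction error of each row outside the top components."""
    x = np.asarray(matrix, dtype=float)
    x = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    top = vt[:n_components].T
    residual = x - x @ top @ top.T
    return (residual ** 2).sum(axis=1)
```

Windows with a high score deviate from the dominant pattern of key counts and are candidates for review. The threshold is left to the user here.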

For the dataset, I have provided some examples; you can put your own data into the same folder.


# to match the library versions, build and run the project in a virtual environment
virtualenv env
source env/bin/activate
pip3 install -r requirement.txt

Instructions (In Deeplog_demo folder):

1. Source data:

When the source data is in csv format, we need to translate it into txt files and split them into batches.


You will be prompted to enter the source location and the output location.
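
The csv-to-txt conversion can be sketched as follows. This is a minimal illustration, not the script shipped in the repository: the function names, the `batch_NNNN.txt` naming scheme, the default batch size, and the assumption that the log message sits in a single csv column are all hypothetical.

```python
import csv
from pathlib import Path


def csv_to_txt_batches(csv_path, out_dir, batch_size=10000, message_column=0):
    """Write one csv column into numbered txt files of `batch_size` lines each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    batch = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip empty csv lines
            batch.append(row[message_column])
            if len(batch) == batch_size:
                written.append(_write_batch(batch, out_dir, len(written)))
                batch = []
    if batch:
        # The last batch may hold fewer than `batch_size` lines.
        written.append(_write_batch(batch, out_dir, len(written)))
    return written


def _write_batch(lines, out_dir, index):
    """Write one batch to disk and return the path of the new file."""
    path = out_dir / f"batch_{index:04d}.txt"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path
```

Calling `csv_to_txt_batches("logs.csv", "out/", batch_size=5000)` would return the list of txt files it created, in order.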

2. Data analysis:

We use the logparser tool to transform the source txt log files into structured csv files under a folder named after the start and end time. (Find Lenma_demo under logparser/logparser/demo.)

(Use with Python 2; a Python 3 version is not provided here.) You need to set the locations first:

input_dir = '../../Dataset/Linux/Clear/'   # set the location to yours
output_dir = '../../Dataset/Linux/Clear_Separate_Structured_Logs/'    # set the location to yours

Then you can execute the demo file with python 2.x:


In this stage, we calculate the EventTemplate for every log.
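
To give an idea of what an event template is, the sketch below masks the variable parts of a log message with `<*>` and assigns one id per distinct template. It uses simple regular expressions and is not the algorithm used by logparser; the patterns, the function names, and the `E1`, `E2`, ... id scheme are assumptions for illustration only.

```python
import re

# Tokens that usually vary between log lines (an assumed, non-exhaustive list).
_VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b0x[0-9a-fA-F]+\b"),           # hexadecimal values
    re.compile(r"\b\d+\b"),                      # plain integers
]


def to_template(message):
    """Replace variable tokens with <*> so similar messages share one template."""
    template = message
    for pattern in _VARIABLE_PATTERNS:
        template = pattern.sub("<*>", template)
    return template


def assign_event_ids(messages):
    """Map each distinct template to an id in order of first appearance."""
    ids = {}
    for message in messages:
        template = to_template(message)
        if template not in ids:
            ids[template] = f"E{len(ids) + 1}"
    return ids
```

Two messages that differ only in an address or a port number end up with the same template and therefore the same log key.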

3. Variable Selection:

This step generates the csv file that is used for anomaly detection later.


(This step has already been integrated into the models in the demo.)

4. Model detection:

Basically, we have two modules for DeepLog:

  • Before running the modules, we first check whether there are obviously malicious logs and report them.

  • After that, we run execution path anomaly detection.

  • Finally, we run parameter value anomaly detection.

  • As a plus, there is an ML model using PCA in loglizer.
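
The idea behind execution path anomaly detection can be sketched as a top-k next-key check: a log key is flagged when it is not among the most likely keys to follow the recent history. DeepLog predicts the candidates with an LSTM; the sketch below swaps in a plain frequency table so that it stays short and dependency-free. The class name, `history`, and `top_k` are hypothetical and do not come from this repository.

```python
from collections import Counter, defaultdict


class NextKeyModel:
    """Frequency-table stand-in for the LSTM next-key predictor."""

    def __init__(self, history=2, top_k=2):
        self.history = history  # number of preceding keys used as context
        self.top_k = top_k      # number of candidate next keys treated as normal
        self.table = defaultdict(Counter)

    def fit(self, sequence):
        """Count which key follows each context in a normal key sequence."""
        for i in range(self.history, len(sequence)):
            context = tuple(sequence[i - self.history:i])
            self.table[context][sequence[i]] += 1
        return self

    def anomalies(self, sequence):
        """Return positions whose key is not among the top-k expected keys."""
        flagged = []
        for i in range(self.history, len(sequence)):
            context = tuple(sequence[i - self.history:i])
            counts = self.table.get(context, Counter())
            candidates = [key for key, _ in counts.most_common(self.top_k)]
            if sequence[i] not in candidates:
                flagged.append(i)
        return flagged
```

A context that never appeared in training has no candidates, so the key that follows it is flagged as well.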

# go to the folder of model


  • The model works off-line; online real-time detection is not available.
  • loglizer and logparser are open source tools; all rights remain with their authors.
  • I extended both tools in this project, so note the differences from the original versions.


1. Execution Anomaly Detection in Distributed Systems through Unstructured Log Analysis

2. DeepLog: Anomaly Detection and Diagnosis from System Logs

3. Incremental Construction of LSTM Recurrent Neural Network