LogCluster

LogCluster is a Perl-based tool for log file clustering and mining line patterns from log files. The development of LogCluster was inspired by SLCT, but LogCluster includes a number of novel features and data processing options.
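The core idea behind LogCluster-style pattern mining can be illustrated with a short Python sketch. It is a simplified illustration, not a port of the Perl source. It assumes whitespace tokenization and an absolute support threshold, and it omits most of LogCluster's options. The function name `mine_patterns` is hypothetical. The sketch works in three passes. First, words that occur in at least `support` lines are treated as frequent, regardless of their position in the line (SLCT, by contrast, considers word positions). Second, lines are grouped by their sequence of frequent words. Third, groups with enough support become patterns, and the infrequent words between the frequent ones are summarized as wildcards of the form `*{min,max}`.

```python
from collections import Counter, defaultdict


def mine_patterns(lines, support):
    """Simplified LogCluster-style pattern mining (illustrative sketch only)."""
    tokenized = [line.split() for line in lines]

    # Pass 1: a word is frequent if it occurs in at least `support` lines
    # (counted once per line, independent of its position).
    word_counts = Counter()
    for words in tokenized:
        word_counts.update(set(words))
    frequent = {w for w, c in word_counts.items() if c >= support}

    # Pass 2: group lines by their sequence of frequent words and record how
    # many infrequent words fall into each gap around those words.
    candidates = defaultdict(list)
    for words in tokenized:
        key = tuple(w for w in words if w in frequent)
        if not key:
            continue  # no frequent words: this line joins no candidate
        gaps = [0] * (len(key) + 1)
        slot = 0
        for w in words:
            if w in frequent:
                slot += 1
            else:
                gaps[slot] += 1
        candidates[key].append(gaps)

    # Pass 3: keep candidates matched by at least `support` lines and render
    # every non-empty gap as a *{min,max} wildcard.
    patterns = {}
    for key, gap_list in candidates.items():
        if len(gap_list) < support:
            continue
        parts = []
        for i in range(len(key) + 1):
            lo = min(g[i] for g in gap_list)
            hi = max(g[i] for g in gap_list)
            if hi > 0:
                parts.append(f"*{{{lo},{hi}}}")
            if i < len(key):
                parts.append(key[i])
        patterns[" ".join(parts)] = len(gap_list)
    return patterns
```

For example, with `support=2` the lines `Interface eth0 down`, `Interface eth1 down` and `Interface eth2 down` collapse into the single pattern `Interface *{1,1} down` with support 3, while a line such as `User alice logged in`, whose words are all infrequent, matches no pattern.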

To provide a common interface for log parsing, we wrote a Python wrapper around the original LogCluster source code in Perl, which also simplifies our benchmarking experiments. The implementation has been tested on both Linux and Windows; on Windows, Strawberry Perl was installed to run the Perl program.
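A Python wrapper of this kind typically writes the log messages to a file, invokes the Perl script as a subprocess, and parses the clusters from its output. The following is a minimal sketch under stated assumptions, not the wrapper shipped in this repository. The helper names are hypothetical. The script name `logcluster.pl` and the `--input`/`--rsupport` flags are assumed from LogCluster's documentation and should be checked against the version you use. The output format (a pattern line followed by a `Support: N` line) is also an assumption. On Windows, `perl` is expected to be on the `PATH`, for example via Strawberry Perl.

```python
import os
import re
import subprocess
import tempfile


def build_logcluster_command(input_path, rsupport, script="logcluster.pl", perl="perl"):
    """Build the command line for one LogCluster run (hypothetical helper)."""
    # ASSUMPTION: flag names follow LogCluster's --input/--rsupport options;
    # verify them against the logcluster.pl version you ship.
    return [perl, script, f"--input={input_path}", f"--rsupport={rsupport}"]


def run_logcluster(messages, rsupport, script="logcluster.pl"):
    """Write messages to a temporary file, run LogCluster, return its stdout."""
    # The file is closed before Perl reads it, which also works on Windows.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False, encoding="utf-8") as f:
        f.write("\n".join(messages) + "\n")
        path = f.name
    try:
        cmd = build_logcluster_command(path, rsupport, script=script)
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.remove(path)


# ASSUMPTION: each cluster is printed as a pattern line followed by a
# "Support: N" line, with blank lines between clusters.
SUPPORT_LINE = re.compile(r"^\s*Support\s*:\s*\d+\s*$")


def parse_clusters(output):
    """Return the pattern line that precedes each 'Support: N' line."""
    lines = output.splitlines()
    return [lines[i - 1].strip() for i, line in enumerate(lines)
            if i > 0 and SUPPORT_LINE.match(line)]
```

Keeping the command construction and the output parsing in separate functions makes both testable without a Perl installation; only `run_logcluster` actually needs `perl` at run time.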

For more information about LogCluster, see the following paper:

Running

The code has been tested in the following environment:

  • python 3.7.6
  • regex 2022.3.2
  • pandas 1.0.1
  • numpy 1.18.1
  • scipy 1.4.1
  • perl 5.26.1

Run the following command to start the demo:

python demo.py

Run the following command to execute the benchmark:

python benchmark.py

Benchmark

Running the benchmark script on the Loghub_2k datasets should produce the following results.

Dataset        F1_measure   Accuracy
HDFS           0.951863     0.546
Hadoop         0.885621     0.563
Spark          0.974048     0.7985
Zookeeper      0.924229     0.7315
BGL            0.996965     0.835
HPC            0.985579     0.7875
Thunderbird    0.997233     0.5985
Windows        0.907275     0.713
Linux          0.921884     0.6285
Android        0.983998     0.7975
HealthApp      0.758866     0.5305
Apache         0.942418     0.7085
Proxifier      0.761671     0.478
OpenSSH        0.931467     0.4255
OpenStack      0.987363     0.6955
Mac            0.932406     0.6035
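The two columns measure different things, which is why F1_measure is consistently higher than Accuracy. Our reading, following the usual log-parsing benchmark definitions, is as follows. F1_measure is a pairwise score: precision and recall are computed over pairs of log messages that are placed in the same group. Accuracy is grouping accuracy: a message counts as correct only if its parsed group contains exactly the same messages as its ground-truth group. The sketch below computes both from two equal-length lists of event IDs. The function names are hypothetical, and the code is not taken from the benchmark's evaluator.

```python
from collections import Counter, defaultdict


def pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def evaluate(truth, parsed):
    """Pairwise F1 and grouping accuracy for ground-truth vs. parsed event IDs."""
    if len(truth) != len(parsed):
        raise ValueError("truth and parsed must have the same length")

    # Pairs grouped together by the ground truth, by the parser, and by both.
    real_pairs = sum(pairs(c) for c in Counter(truth).values())
    parsed_pairs = sum(pairs(c) for c in Counter(parsed).values())
    joint_pairs = sum(pairs(c) for c in Counter(zip(parsed, truth)).values())
    precision = joint_pairs / parsed_pairs if parsed_pairs else 0.0
    recall = joint_pairs / real_pairs if real_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Grouping accuracy: a parsed group scores only if it maps to exactly one
    # ground-truth group and contains all of that group's messages.
    truth_sizes = Counter(truth)
    members = defaultdict(list)
    for p, t in zip(parsed, truth):
        members[p].append(t)
    correct = sum(len(labels) for labels in members.values()
                  if len(set(labels)) == 1 and truth_sizes[labels[0]] == len(labels))
    return f1, correct / len(truth)
```

For example, with ground truth `["A", "A", "B", "C"]` and parsed IDs `[1, 1, 2, 2]`, the parser wrongly merges B and C. Pairwise F1 is about 0.67 (precision 0.5, recall 1.0). Accuracy is 0.5, because only the two A messages sit in an exactly matching group. One wrong merge is enough to zero out every message in the affected group, which is one plausible reason the Accuracy column trails the F1_measure column above.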

Citation

🔭 If you use our logparser tools or benchmarking results in your publication, please kindly cite the following papers.