An agent to detect outliers from any kind of data flows, such as log applications, data streaming, etc.

lorenzomartino86/anomaly-detector

Anomaly Classifier

This is an anomaly classifier agent designed according to clean architecture principles, so that domain classes are decoupled from any external adapter. The project is currently in its early stages and still needs real integration tests with data flows coming from different sources (applications, storage systems, etc.).

Algorithms

This project organizes the high-level operational steps of each algorithm inside classes called pipelines:

  • Cluster pipeline: These algorithms group raw input into clusters using a given technique and then retrieve the outliers by comparing the train and test clusters.
    • Cosine Similarity: A measure of similarity between two non-zero vectors of an inner product space, defined as the cosine of the angle between them. For more details, see the sklearn documentation: cosine_similarity
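To make the cosine similarity definition concrete, here is a minimal sketch using only the standard library. It is not the project's implementation: the whitespace tokenization, the word-count vectors and the zero-vector convention are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two strings."""
    # Assumption: lowercase, whitespace-separated tokens form the vector space.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product over the words both strings share.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        # The measure is undefined for zero vectors; 0.0 is a convention chosen here.
        return 0.0
    return dot / (norm_a * norm_b)
```

Identical strings score close to 1, while strings with no words in common score 0.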

Project Structure

The agent is divided into the following main packages:

  • Domain: It includes the domain layers, such as algorithmic pipelines, delegation objects and use cases.
  • Adapter: It contains all the classes responsible for retrieving, storing, persisting and publishing analyzed data.
  • Decorator: A package with all the utility functions used as decorators by domain and adapter classes.
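The split between domain and adapter packages can be pictured with a small sketch. All names here (Repository, ListRepository, unseen_records) are hypothetical and are not taken from the project; the point is only that the domain function depends on an interface, while the adapter supplies the concrete data access.

```python
from typing import Iterable, List, Protocol


class Repository(Protocol):
    """Port defined on the domain side (hypothetical interface)."""

    def read(self) -> Iterable[str]: ...


class ListRepository:
    """Adapter: serves records from a plain list held in memory."""

    def __init__(self, data: List[str]) -> None:
        self._data = list(data)

    def read(self) -> Iterable[str]:
        return iter(self._data)


def unseen_records(train: Repository, test: Repository) -> List[str]:
    """Domain use case: test records that never appear in the training data."""
    known = set(train.read())
    return [record for record in test.read() if record not in known]
```

Because unseen_records only relies on the read method, a file-backed or queue-backed adapter could be swapped in without touching the domain code.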

Instructions

The simplest classification usage relies on an in-memory data source:

  • Generate training and test data as collections:
    train_data = ["Hello world", "Uncle Bob"]
    
    test_data = ["It's an outlier", "Hello world"]
  • Build the cluster classifier with the mandatory fields; the pipeline is set to ClusterPipeline by default:
    classifier = ClusterClassifierFactory(train_repository=InMemoryRepository(data=train_data),
                               test_repository=InMemoryRepository(data=test_data),
                               notifier=InMemoryBroker())
    classifier.add_pipeline(pipeline=CosineSimilarityPipeline(ratio=.70))
    classifier = classifier.compile()
  • Outlier detection:
    outliers = classifier.detect_outliers()
  • Getting the corpus of detected outliers:
    >>> for outlier in outliers:
    ...     print(outlier.records)
    [{'corpus': "It's an outlier"}]
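The README does not say how the ratio parameter is applied. One plausible reading, sketched below with the standard library only, is that a test record is flagged when its best cosine similarity against every training record falls below the ratio. The function names and the word-count vectors are assumptions for illustration, not the project's code.

```python
import math
from collections import Counter
from typing import List


def _similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors (assumed representation)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def detect_outliers(train: List[str], test: List[str], ratio: float) -> List[str]:
    """Flag test records whose best match in the training data is below ratio."""
    outliers = []
    for record in test:
        # default=0.0 covers an empty training set: everything is then an outlier.
        best = max((_similarity(record, seen) for seen in train), default=0.0)
        if best < ratio:
            outliers.append(record)
    return outliers


train_data = ["Hello world", "Uncle Bob"]
test_data = ["It's an outlier", "Hello world"]
print(detect_outliers(train_data, test_data, ratio=0.70))
# prints ["It's an outlier"]
```

With the sample data from the instructions, "Hello world" matches a training record exactly and is kept, while "It's an outlier" shares no words with the training data and is flagged.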
