This is an anomaly classifier agent designed according to clean architecture principles, so that domain classes are decoupled from any external adapter. The project is currently in its early stages and still needs real integration tests with data flows coming from different sources (applications, storage systems, etc.).
The project organizes the high-level operational steps of its algorithms inside classes called pipelines:
- Cluster pipeline: These algorithms group raw input into clusters with a given technique and then retrieve the outliers by comparing the train and test clusters.
- Cosine Similarity: A measure of similarity between two non-zero vectors of an inner product space, defined as the cosine of the angle between them. For more details, see the sklearn documentation: cosine_similarity
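To make the cosine similarity idea concrete, here is a minimal standalone sketch that does not use the project's classes. It assumes a simple bag-of-words vectorization and assumes that a test record counts as an outlier when its best similarity against the training records falls below a `ratio` threshold; the helper names (`bag_of_words`, `find_outliers`) are hypothetical and the real pipeline may vectorize and compare differently.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Turn a string into a term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # The cosine is undefined for zero vectors; treat it as "no similarity".
        return 0.0
    return dot / (norm_a * norm_b)


def find_outliers(train, test, ratio=0.70):
    """Return test records whose best match in train scores below `ratio` (assumed rule)."""
    train_vectors = [bag_of_words(record) for record in train]
    outliers = []
    for record in test:
        vector = bag_of_words(record)
        best = max((cosine_similarity(vector, t) for t in train_vectors), default=0.0)
        if best < ratio:
            outliers.append(record)
    return outliers


outliers = find_outliers(["Hello world", "Uncle Bob"], ["It's an outlier", "Hello world"])
print(outliers)  # ["It's an outlier"]
```

"Hello world" matches a training record exactly, so its similarity is close to 1 and it is kept; "It's an outlier" shares no terms with the training data, so its similarity is 0 and it is flagged.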
The agent is divided into the following main packages:
- Domain: It includes the domain layers, such as algorithmic pipelines, delegation objects and use cases.
- Adapter: It contains all the classes responsible for retrieving, storing, persisting and publishing analyzed data.
- Decorator: A package with all the utility functions used as decorators by domain and adapter classes.
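The separation between domain and adapter packages can be illustrated with a small sketch. This is not the project's actual code: `Repository`, `ListRepository` and `CountRecords` are hypothetical names, chosen only to show how a use case can depend on an abstract port while the concrete adapter is injected from outside.

```python
from typing import Iterable, List, Protocol


class Repository(Protocol):
    """Port (hypothetical): the only thing the domain knows about a data source."""

    def read(self) -> Iterable[str]:
        ...


class ListRepository:
    """Adapter (hypothetical): serves records from a plain Python list."""

    def __init__(self, data: List[str]) -> None:
        self._data = list(data)

    def read(self) -> Iterable[str]:
        return iter(self._data)


class CountRecords:
    """Use case (hypothetical): depends on the port, never on a concrete adapter."""

    def __init__(self, repository: Repository) -> None:
        self._repository = repository

    def run(self) -> int:
        return sum(1 for _ in self._repository.read())


# The adapter is chosen at the edge of the application and injected into the domain.
count = CountRecords(ListRepository(["Hello world", "Uncle Bob"])).run()
print(count)  # 2
```

With this shape, replacing the in-memory adapter with one backed by a file or a message broker would not require touching the use case, which is the decoupling that clean architecture aims for.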
The simplest classification usage is the one that uses an in-memory data source:
- Generate training and test data as collections:
train_data = list()
train_data.append("Hello world")
train_data.append("Uncle Bob")
test_data = list()
test_data.append("It's an outlier")
test_data.append("Hello world")
- Compile the cluster classifier with the mandatory fields (the pipeline is set to ClusterPipeline by default):
classifier = ClusterClassifierFactory(train_repository=InMemoryRepository(data=train_data),
                                      test_repository=InMemoryRepository(data=test_data),
                                      notifier=InMemoryBroker())
classifier.add_pipeline(pipeline=CosineSimilarityPipeline(ratio=.70))
classifier = classifier.compile()
- Outlier detection:
outliers = classifier.detect_outliers()
- Getting the corpus of detected outliers:
>>> for outlier in outliers:
...     print(outlier.records)
[{'corpus': "It's an outlier"}]