Implementation of Information Retrieval and Text Mining algorithms including:
- Indexers:
  - Inverted
  - KGram
- Boolean retrieval
- WildCard retrieval
- Distance calculation
- Ranking based retrieval (cosine-similarity and tf-idf)
- Perceptron classification
- Multiple confusion matrix stats
- KMeans Clustering, with RSS based optimization
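
To make a few of the listed techniques concrete, here is a minimal sketch of an inverted index, Boolean AND retrieval, and tf-idf ranking with cosine similarity. It is not taken from this repository: the function names (`build_inverted_index`, `boolean_and`, `rank`), the whitespace tokenizer, and the `tf * log(N / df)` weighting are all assumptions chosen for illustration, and the actual implementation may differ.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, terms):
    """Return the ids of documents that contain every query term."""
    postings = [set(index.get(term.lower(), ())) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


def idf_table(docs):
    """Inverse document frequency, log(N / df), for each corpus term."""
    n = len(docs)
    df = Counter(term for text in docs for term in set(tokenize(text)))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(text, idf):
    """Sparse tf-idf vector (term -> weight); unknown terms are dropped."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return ids of documents with a non-zero score, best match first."""
    idf = idf_table(docs)
    q = vectorize(query, idf)
    scored = [(cosine(q, vectorize(text, idf)), i) for i, text in enumerate(docs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0]


docs = ["the quick brown fox", "the lazy dog", "quick brown dog jumps"]
index = build_inverted_index(docs)
print(boolean_and(index, ["quick", "brown"]))  # [0, 2]
print(rank("quick fox", docs))                 # [0, 2]
```

Storing postings as sorted lists keeps the index compact, while the intersection is done on sets for brevity; a merge over the sorted lists would be the more traditional choice.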
The tests are run using xmlrunner (following the unittest style).
The documentation style is NumPy/SciPy Docstrings.
Extensive debugging calls are commented.
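
As an illustration of the testing and documentation conventions mentioned above, the following sketch pairs a NumPy-style docstring with a unittest-style test case. The `precision` function, the `PrecisionTest` class, and the `run_tests` helper are hypothetical names invented for this example and are not part of the repository.

```python
import unittest


def precision(true_positives, false_positives):
    """Compute precision from confusion matrix counts.

    Parameters
    ----------
    true_positives : int
        Number of positive predictions that were correct.
    false_positives : int
        Number of positive predictions that were incorrect.

    Returns
    -------
    float
        Fraction of positive predictions that were correct, or 0.0
        when no positive predictions were made.
    """
    predicted = true_positives + false_positives
    return true_positives / predicted if predicted else 0.0


class PrecisionTest(unittest.TestCase):
    def test_typical_counts(self):
        self.assertAlmostEqual(precision(8, 2), 0.8)

    def test_no_positive_predictions(self):
        self.assertEqual(precision(0, 0), 0.0)


def run_tests(runner=None):
    """Run the test case with the given runner (plain text runner by default)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrecisionTest)
    runner = runner or unittest.TextTestRunner(verbosity=0)
    return runner.run(suite)
```

Assuming the xmlrunner package is installed, passing `xmlrunner.XMLTestRunner(output="test-reports")` as `runner` should write XML reports instead of plain text; the output directory name here is only an example.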