"Depending on the author, a million books can be distilled into a single sentence, and a single sentence can conceal a million books, depending on the Author." - Me
SUM is a knowledge distillation platform that uses AI, NLP, and ML to extract, analyze, and present insights from large datasets in a structured, concise, and engaging form. The goal is to condense knowledge of any kind into a dense, human-readable summary, so that readers can "download" entire tomes quickly without the "fluff".
SUM (Summarizer) provides the following key features:
- Multi-level summarization (tags, sentences, paragraphs)
- Interactive analysis with user feedback
- Temporal analysis for tracking concept and sentiment changes
- Topic modeling for cross-document analysis
- Knowledge Graph construction and visualization
- Multi-lingual support with language detection and translation
- Adaptive parameter adjustment based on user feedback
- Comprehensive text analysis (entity recognition, keyword extraction, sentiment analysis)
- Word cloud generation
- Data export functionality
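To make the idea of multi-level summarization concrete, here is a minimal sketch that produces tags, a single sentence, and a short paragraph from one text. It is not SUM's implementation: the function names, the tiny stopword list, and the frequency-based scoring are all assumptions chosen so the example runs with the standard library alone.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption); a real pipeline would
# more likely use the NLTK stopwords corpus.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "into", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize_levels(text, num_tags=3, num_sentences=2):
    """Return tag-, sentence-, and paragraph-level summaries of text."""
    freqs = Counter(tokenize(text))
    sentences = split_sentences(text)

    def score(sentence):
        # Average corpus frequency of the sentence's non-stopword tokens.
        tokens = tokenize(sentence)
        return sum(freqs[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(sentences, key=score, reverse=True)
    top = set(ranked[:num_sentences])
    return {
        "tags": [word for word, _ in freqs.most_common(num_tags)],
        "sentence": ranked[0] if ranked else "",
        # Keep the selected sentences in their original order.
        "paragraph": " ".join(s for s in sentences if s in top),
    }
```

Each level reuses the same word frequencies, so the tags, the top sentence, and the paragraph stay consistent with one another; only the amount of text returned changes.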
To install the required libraries, run:
pip install nltk spacy scikit-learn networkx matplotlib pandas wordcloud textblob gensim langdetect googletrans==3.1.0a0
python -m spacy download en_core_web_lg
python -m nltk.downloader punkt stopwords wordnet
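After installing, it can help to confirm that every dependency is importable before running SUM. The sketch below is an optional check, not part of the project; the helper name missing_modules is hypothetical, and the module list simply mirrors the packages in the install command (note that scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

# Import names for the packages listed in the install command.
# scikit-learn is installed under that name but imported as "sklearn".
REQUIRED_MODULES = [
    "nltk", "spacy", "sklearn", "networkx", "matplotlib", "pandas",
    "wordcloud", "textblob", "gensim", "langdetect", "googletrans",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in modules that cannot be found on the import path."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```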
from advanced_summarizer import AdvancedSUM
summarizer = AdvancedSUM()
summarizer.simulate_interactive_analysis()
texts = summarizer.load_data('data.json')
results = summarizer.batch_process(texts)
summarizer.temporal_analysis(results)
summarizer.export_results(results, 'analysis_results.json')
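The usage above reads its input from data.json, but the schema that load_data expects is not described here. The sketch below writes a small file under one possible assumption: a JSON array of objects with "date" and "text" fields. Both field names and both helper functions are hypothetical, so check them against the actual loader before relying on this layout.

```python
import json
from pathlib import Path


def write_sample_data(path):
    """Write a small hypothetical dataset and return the records written."""
    # Assumed schema: one object per document, with a date and the raw text.
    records = [
        {"date": "2024-01-01", "text": "First document to summarize."},
        {"date": "2024-02-01", "text": "Second document to summarize."},
    ]
    Path(path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


def read_texts(path):
    """Read the file back and return only the text fields."""
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    return [record["text"] for record in records]
```

Calling write_sample_data('data.json') produces a file that can be used to try the workflow, provided the assumed schema matches what load_data reads.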
The AdvancedSUM class provides methods that do the following:
- Load data from a JSON file (load_data).
- Process and analyze a single text with multi-level summarization.
- Process and analyze a batch of texts with multi-level summarization (batch_process).
- Perform temporal analysis on processed texts (temporal_analysis).
- Generate a word cloud from a text.
- Perform topic modeling on a collection of texts.
- Translate a text to a target language.
- Build a knowledge graph from identified topics.
- Visualize the knowledge graph using NetworkX and Matplotlib.
- Export analysis results to a JSON file (export_results).
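As an illustration of how a knowledge graph might be derived from identified topics, the sketch below counts how often two terms appear in the same topic and returns weighted edges. This is one simple technique (term co-occurrence), not necessarily the one SUM uses; the function name and the input shape (a list of term lists) are assumptions.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_edges(topics):
    """Count how often each pair of terms appears together in a topic.

    topics is a list of term lists, one list per topic.
    Returns (term_a, term_b, weight) tuples sorted by term names.
    """
    counts = Counter()
    for terms in topics:
        # Sort and deduplicate so (a, b) and (b, a) map to the same edge.
        for term_a, term_b in combinations(sorted(set(terms)), 2):
            counts[(term_a, term_b)] += 1
    return [(a, b, weight) for (a, b), weight in sorted(counts.items())]
```

The resulting triples have the (u, v, w) shape accepted by NetworkX's Graph.add_weighted_edges_from, so they can be loaded into a graph and drawn with Matplotlib.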
We welcome contributions from the community. If you have ideas for improvements or new features, please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature-branch).
- Make your changes.
- Commit your changes (git commit -m 'Add some feature').
- Push to the branch (git push origin feature-branch).
- Open a pull request.
For any questions, concerns, or suggestions, please reach out via:
- X: https://x.com/Otota0
- Issues: SUM Issues
I look forward to your feedback and contributions!
This project is licensed under the MIT License. See the LICENSE file for details.
Thank you for using SUM! I hope it helps you distill knowledge effortlessly.
Made with ❤️ by ototao