
matkoson/ai-summarization

 
 


AI Summarization

A repo showcasing an AI summarization tool.

Summary

This repo showcases a simple yet effective tool for document summarization. It works with plain-text and PDF documents in any language supported by the underlying LLM (Mistral by default).

Setup

Installing Dependencies

Install the following dependencies (on macOS):

Note that you can experiment with alternative models: just update the MODEL_FILE and MODEL_CONTEXT_WINDOW variables in web-ui.py and/or Notebook.ipynb.

Running

Web UI

To start the Web UI, run python3 ./web-ui.py in the repo folder. This should open the Web UI in the browser.

Jupyter Notebook

The tool can also be used from JupyterLab or Jupyter Notebook: open Notebook.ipynb in JupyterLab.

Details

Workflow

Depending on the document size, the tool works in one of the following modes:

  1. In the simple case, when the whole document fits into the model's context window, the summary is produced by adding a summarization prompt to the document.
  2. Large documents are processed using the "map-reduce" pattern:
     • The document is first split into smaller chunks using `RecursiveCharacterTextSplitter`, which tries to respect paragraph and sentence boundaries.
     • Each chunk is summarized separately (map step).
     • The chunk summaries are summarized again to give the final summary (reduce step).
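
The two modes above can be sketched in plain Python. This is a minimal illustration, not the code from web-ui.py or Notebook.ipynb: `split_into_chunks` is a simplified, hypothetical stand-in for `RecursiveCharacterTextSplitter`, `summarize` is any callable that sends a prompt plus text to the model, and sizes are measured in characters rather than tokens to keep the example short.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Greedy splitter: prefers paragraph breaks, falls back to sentences.

    Simplified stand-in for RecursiveCharacterTextSplitter (assumption).
    A single sentence longer than chunk_size is kept as one oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Only break a paragraph into sentences when it is too long on its own.
        if len(paragraph) <= chunk_size:
            pieces = [paragraph]
        else:
            pieces = paragraph.split(". ")
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= chunk_size:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


def summarize_document(
    text: str,
    summarize: Callable[[str], str],
    context_chars: int,
    chunk_size: int,
) -> str:
    """Summarize directly if the text fits, otherwise use map-reduce."""
    if len(text) <= context_chars:
        # Simple case: the whole document fits into one model call.
        return summarize(text)
    # Map step: summarize every chunk on its own.
    partial_summaries = [summarize(c) for c in split_into_chunks(text, chunk_size)]
    # Reduce step: summarize the concatenated chunk summaries.
    return summarize("\n\n".join(partial_summaries))
```

In this sketch the reduce step is a single call, so it assumes the combined chunk summaries fit into the context window; a very long document might need the reduce step applied more than once.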

Local processing

All processing is done locally on the user's machine.

  • The quantized Mistral model (mistral-7b-openorca.Q5_K_M.gguf) is around 5.1 GB.

Performance

Small to medium documents (a couple of pages) should fit into a single context window, which results in a processing time of around 40 s on an Apple MacBook Pro with an M1 chip.

Troubleshooting

No known issues.

