AllAboutPDF is a web-based application for working with PDF files. With this app, you can perform a variety of PDF-related tasks, such as reading metadata and extracting images, text, and annotations. One feature that sets AllAboutPDF apart from other online PDF apps is our ChatPDF feature. It lets users query their PDF files in natural language using OpenAI and LangChain, so they can find the information they need quickly and complete tasks more efficiently.
The live version of the app is hosted on Streamlit Sharing and can be accessed at the following URLs:
- Main application: https://amitgupta4407-all-about-pdf-app-dmn92l.streamlit.app/
- Test feature: https://allaboutpdf-multiple-filequery-feature.streamlit.app/
- Extract text from a PDF file
- Extract images from a PDF file
- Extract metadata from a PDF file
- Encrypt a PDF file with a password
- Chat with a PDF file using OpenAI and LangChain
- Chat with multiple text-based files (PDF, TXT, DOC, Excel, CSV, SQL): https://allaboutpdf-multiple-filequery-feature.streamlit.app/
AllAboutPDF is built using the Python programming language and the Streamlit framework. The app uses the PyPDF2 library to perform PDF-related tasks, such as parsing PDFs and extracting relevant information from them. The app also uses the OpenAI and LangChain APIs to power the ChatPDF feature.
When a user uploads a PDF file to the app, the app performs the requested task (e.g. encrypting the PDF with a password) and, where the task produces a new PDF file, offers it to the user for download.
To set up the project, clone this repository and install the requirements:
pip install -r requirements.txt
- To use the main application, run the app.py file with the Streamlit CLI (after installing Streamlit):
streamlit run app.py
- To use the test feature application, run the FileQueryHub.py file with the Streamlit CLI (after installing Streamlit):
streamlit run FileQueryHub.py
The motivation behind AllAboutPDF was to create a simple, user-friendly tool for working with PDF files. While there are many PDF-related tools available online, many of them are complex and difficult to use. AllAboutPDF aims to provide an easy-to-use alternative that anyone can use, regardless of technical expertise, and to make data extraction a piece of cake.
PDF is a ubiquitous file format used for sharing documents across platforms and devices. However, working with PDF files can often be a tedious and time-consuming process. AllAboutPDF aims to solve this problem by providing a simple, user-friendly tool for working with PDF files.
AllAboutPDF is built using the following technologies:
- Python
- Streamlit
- PyPDF2
- OpenAI
- LangChain
- Selecting the most suitable libraries for the project: we settled on Python, Streamlit, PyPDF2, and LangChain.
- Developing a unique feature that distinguishes AllAboutPDF from other online PDF apps: our ChatPDF feature lets users interact with their PDF files using OpenAI and LangChain's natural language processing technology.
- Optimizing the cost of preparing the knowledge base for ChatPDF by choosing a suitable chunk size and a suitable ratio between chunk size and overlap size.
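The trade-off between chunk size and overlap can be illustrated with a small, dependency-free sketch. It splits text by character count, which is an assumption: the splitter and units (characters or tokens) that the app uses are not stated here, and `split_into_chunks` is a hypothetical name.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


def total_characters(chunks: List[str]) -> int:
    """Characters sent for embedding; overlapping text is counted more than once."""
    return sum(len(chunk) for chunk in chunks)
```

A larger overlap preserves more context across chunk boundaries but repeats more text, so more characters are embedded for the same document. With a 10-character text and a chunk size of 4, an overlap of 1 yields 12 embedded characters, while an overlap of 0 yields 10.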
We have several future plans for AllAboutPDF, including:
- Merge multiple PDF files into a single file
- Split a PDF file into multiple files
- Compress a PDF file to reduce its size
- Convert a PDF file to a different file format (e.g. JPEG, PNG, DOCX)
- Add more PDF-related features, such as OCR (Optical Character Recognition) and watermarking
- Add support for more file formats (e.g. Word documents, Excel spreadsheets)
If you have any feedback or suggestions for how we can improve AllAboutPDF, please don't hesitate to get in touch!
Some screenshots for the PDF at https://s3.amazonaws.com/static.nomic.ai/gpt4all/2023_GPT4All_Technical_Report.pdf
- Streamlit: https://streamlit.io/
- LangChain docs: https://python.langchain.com/en/latest/index.html
- How ChatPDF works: https://bennycheung.github.io/ask-a-book-questions-with-langchain-openai