PDF Search

Web interface for searching PDF files by their content

Features

  • Search for specific keywords within a collection of PDF files.
  • View matched lines from the PDF files for each search result.
  • Sort search results based on the relevance of matches.
  • Display search results with a calculated relevance ratio.
  • Web interface powered by Flask and a SQLite database.
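
The README does not say how the relevance ratio is calculated. As a rough illustration only, the sketch below assumes the ratio is the share of non-empty lines in a document that contain the keyword, and sorts results by that value. The function names `search_lines` and `rank` are hypothetical and are not taken from the project.

```python
def search_lines(text, keyword):
    """Return (matched_lines, ratio) for a case-insensitive keyword search.

    Assumption: ratio = matching lines / non-empty lines. The project may
    compute its relevance ratio differently.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    needle = keyword.lower()
    matched = [line for line in lines if needle in line.lower()]
    ratio = len(matched) / len(lines) if lines else 0.0
    return matched, ratio


def rank(documents, keyword):
    """Rank a {filename: text} mapping by descending ratio.

    Documents without any matching line are left out of the results.
    """
    results = []
    for name, text in documents.items():
        matched, ratio = search_lines(text, keyword)
        if matched:
            results.append((name, ratio, matched))
    return sorted(results, key=lambda result: result[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "a.pdf": "Flask tutorial\nUnrelated line\n",
        "b.pdf": "flask intro\nmore about Flask\n",
    }
    for name, ratio, matched in rank(docs, "flask"):
        print(name, round(ratio, 2), matched)
```

A line-based ratio keeps the ranking easy to explain next to the matched lines, but it favours short documents, so a real implementation might weight it by the number of matches as well.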

Requirements

  • Python 3.x
  • Flask
  • PyPDF2

Getting Started

  1. Clone this repository:

    git clone https://github.com/FelixKohlhas/pdf_search.git
    cd pdf_search
  2. Install the required Python packages:

    pip install -r requirements.txt
  3. Create the database:

    python generate_db.py <path to pdfs>
  4. Run the web interface:

    python app.py -f <path to pdfs>
  5. Open your web browser and navigate to http://localhost:5001 to access the PDF search.

Usage

generate_db.py

usage: generate_db.py [-h] [-d DATABASE] pdf_folder

Extract text from PDF files and store it in a SQLite database.

positional arguments:
  pdf_folder            Path to the folder containing PDF files

options:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        Path of the database
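
To give an idea of what "extract text and store it in a SQLite database" can look like, here is a minimal sketch using PyPDF2's `PdfReader` and the standard `sqlite3` module. It is not the project's actual code: the table name `pages`, its columns, the default database name `pdfs.db`, and the helper names `extract_pages` and `index_folder` are all assumptions made for illustration.

```python
import sqlite3
from pathlib import Path


def extract_pages(pdf_path):
    """Return the text of each page of a PDF as a list of strings."""
    # Imported lazily so the rest of the sketch works without PyPDF2.
    from PyPDF2 import PdfReader  # PdfReader is available in PyPDF2 >= 2.0

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def index_folder(pdf_folder, database="pdfs.db", extract=extract_pages):
    """Store one row per (file, page) in a hypothetical `pages` table."""
    conn = sqlite3.connect(database)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages "
            "(filename TEXT, page INTEGER, content TEXT)"
        )
        for pdf_path in sorted(Path(pdf_folder).glob("*.pdf")):
            rows = [
                (pdf_path.name, number, text)
                for number, text in enumerate(extract(pdf_path), start=1)
            ]
            conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", rows)
    return conn
```

Storing one row per page keeps the extracted text close to its location in the file. The `extract` parameter exists only so the indexing step can be tried without real PDFs.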

app.py

usage: app.py [-h] [-d DATABASE] [-u URL_PREFIX] [-f FILES] [--port PORT]

Flask web interface to search PDF files by their content.

options:
  -h, --help            show this help message and exit
  -d DATABASE, --database DATABASE
                        Path of the database
  -u URL_PREFIX, --url-prefix URL_PREFIX
                        URL to prefix to relative paths
  -f FILES, --files FILES
                        Directory of PDF files (optional; allows access to the files through webinterface)
  --port PORT           Port to run the Flask app (default: 5001)
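
The options above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the help text, not the project's source: the default database path, the empty default URL prefix, and the `int` type for the port are assumptions. Only the port default of 5001 is stated in the usage output.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="app.py",
        description="Flask web interface to search PDF files by their content.",
    )
    # Assumption: the real default database path is not documented.
    parser.add_argument("-d", "--database", default="pdfs.db",
                        help="Path of the database")
    # Assumption: an empty prefix when the option is omitted.
    parser.add_argument("-u", "--url-prefix", default="",
                        help="URL to prefix to relative paths")
    parser.add_argument("-f", "--files", default=None,
                        help="Directory of PDF files (optional)")
    parser.add_argument("--port", type=int, default=5001,
                        help="Port to run the Flask app (default: 5001)")
    return parser


if __name__ == "__main__":
    # Equivalent to: python app.py -f ./pdfs --port 8080
    args = build_parser().parse_args(["-f", "./pdfs", "--port", "8080"])
    print(args.files, args.port, args.database)
```

Omitting `-f` leaves `args.files` as `None`, which matches the help text's description of the option as optional.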

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or create a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
