pdfsearch

This is an experimental PDF search engine. The project is written in Go and uses the C APIs of poppler-glib and cairo to extract text from PDF files.

Motivation

I am a medical doctor and I have a lot of PDF files with notes, books, and articles. I wanted a simple search engine for finding keywords in these files. I also wanted to learn more about cgo, the Go toolchain's mechanism for calling C code.

Installation

You can download the latest release for Linux from the GitHub pdfsearch Releases page.

Build from source

Download the required libraries.

You need to have the poppler-glib and cairo libraries installed on your system.

On Ubuntu/Debian, you can install them with the following command:

sudo apt-get install libpoppler-glib-dev libcairo2-dev pkg-config

If that does not work, you can try the following command to install all the required libraries:

sudo apt-get install build-essential cmake pkg-config \
 libpoppler-glib-dev glib2.0 glib2.0-dev libfontconfig1-dev \
 libfreetype6-dev libjpeg-dev libpng-dev libtiff-dev \
 libopenjp2-7-dev libcurl4-gnutls-dev libgtest-dev libboost-dev

On Arch Linux, you can install them with the following command:

sudo pacman -S poppler-glib cairo pkg-config

If that does not work, you can try the following command:

sudo pacman -S base-devel cmake pkg-config poppler-glib glib2 fontconfig freetype2 libjpeg-turbo libpng libtiff libcurl-gnutls gtest boost

Then you can install the pdfsearch tool with the following commands:

git clone https://github.com/abiiranathan/pdfsearch.git
cd pdfsearch
go install # or go build

We have not tested the Windows build. If you have any issues, please let us know.

USAGE

Step 1: Index the PDF files

./pdfsearch build_index -d /path/to/directory/of/pdf/files

By default, the index is written to ~/index.bin. You can specify a different index file with the -i or --index flag, but we advise you to use the default.

This command creates a binary index file containing the text extracted from the PDF files in the specified directory.
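To make the idea of a binary text index concrete, here is a minimal sketch in Go. It is not the format pdfsearch actually writes: the Page and Index types, the use of encoding/gob, and the roundTrip helper are all assumptions made for illustration, and the page text is hard-coded instead of being extracted with poppler.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"strings"
)

// Page holds the text of one page of one PDF file.
// Hypothetical layout; not pdfsearch's actual index format.
type Page struct {
	File string // path of the PDF the page came from
	Num  int    // 1-based page number
	Text string // extracted plain text
}

// Index is the in-memory form of the index: a flat list of pages.
type Index struct {
	Pages []Page
}

// Add records the text of a single page.
func (idx *Index) Add(file string, num int, text string) {
	idx.Pages = append(idx.Pages, Page{File: file, Num: num, Text: text})
}

// Save writes the index to w in gob's binary encoding.
func (idx *Index) Save(w io.Writer) error {
	return gob.NewEncoder(w).Encode(idx)
}

// Load reads an index previously written by Save.
func Load(r io.Reader) (*Index, error) {
	var idx Index
	if err := gob.NewDecoder(r).Decode(&idx); err != nil {
		return nil, err
	}
	return &idx, nil
}

// Search returns every page whose text contains query, ignoring case.
func (idx *Index) Search(query string) []Page {
	q := strings.ToLower(query)
	var hits []Page
	for _, p := range idx.Pages {
		if strings.Contains(strings.ToLower(p.Text), q) {
			hits = append(hits, p)
		}
	}
	return hits
}

// roundTrip saves idx to an in-memory buffer (standing in for the
// index file on disk) and loads it back.
func roundTrip(idx *Index) (*Index, error) {
	var buf bytes.Buffer
	if err := idx.Save(&buf); err != nil {
		return nil, err
	}
	return Load(&buf)
}

func main() {
	// In the real tool the text would come from poppler; here it is hard-coded.
	idx := &Index{}
	idx.Add("notes/cardiology.pdf", 1, "Atrial fibrillation is a common arrhythmia.")
	idx.Add("notes/cardiology.pdf", 2, "Beta blockers reduce heart rate.")

	loaded, err := roundTrip(idx)
	if err != nil {
		panic(err)
	}
	for _, p := range loaded.Search("beta") {
		fmt.Printf("%s, page %d\n", p.File, p.Num)
	}
}
```

Storing text per page keeps the sketch simple and lets a search result point at a specific page. A linear scan like this is only reasonable for small collections.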

Step 2: Run the web server

./pdfsearch serve -p 8080

# Or specify the index file
./pdfsearch serve -p 8080 -i ~/index.bin

This command starts a web server on port 8080. Use the -p flag to choose a different port and the -i flag to choose a different index file.

Open a web browser and go to https://localhost:8080 to search for keywords in the PDF files.
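As a rough illustration of how a keyword search can be exposed over HTTP, the sketch below wires a search function into a net/http handler. The /search route, the q query parameter, the JSON response shape, and every type and function name are assumptions for this example; they are not taken from the pdfsearch source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Hit identifies one matching page. The JSON shape is hypothetical.
type Hit struct {
	File string `json:"file"`
	Page int    `json:"page"`
}

// page is one indexed page of text (a stand-in for the loaded index).
type page struct {
	Hit
	Text string
}

// search returns the pages whose text contains q, ignoring case.
func search(pages []page, q string) []Hit {
	hits := []Hit{} // non-nil so an empty result encodes as [] rather than null
	q = strings.ToLower(strings.TrimSpace(q))
	if q == "" {
		return hits
	}
	for _, p := range pages {
		if strings.Contains(strings.ToLower(p.Text), q) {
			hits = append(hits, p.Hit)
		}
	}
	return hits
}

// newHandler exposes search at the hypothetical route /search?q=keyword.
func newHandler(pages []page) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/search", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		// An error here can only be a failed write to the client, so it is ignored.
		_ = json.NewEncoder(w).Encode(search(pages, r.URL.Query().Get("q")))
	})
	return mux
}

// get sends an in-memory GET request to h and returns the status and body.
func get(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	pages := []page{
		{Hit: Hit{File: "notes/cardiology.pdf", Page: 1}, Text: "Atrial fibrillation is a common arrhythmia."},
		{Hit: Hit{File: "notes/cardiology.pdf", Page: 2}, Text: "Beta blockers reduce heart rate."},
	}
	h := newHandler(pages)

	// Exercise the handler without opening a port. To serve for real:
	//   log.Fatal(http.ListenAndServe(":8080", h))
	status, body := get(h, "/search?q=beta")
	fmt.Println(status, body)
}
```

Keeping search as a plain function, separate from the handler, makes it easy to test without a running server.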
