GitHub - sravan-s/document_search: To search through some large law document

Please do not use this for commercial purposes.

This is a study project to search through the new Indian law document: https://www.mha.gov.in/sites/default/files/250883_english_01042024.pdf

Before you start

Make sure you have Docker and Rust installed on your system, then start Qdrant:

docker compose up qdrant
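Before moving on, it can help to confirm that the Qdrant container is actually accepting connections. The following is a minimal sketch using only the standard library. It assumes Qdrant's default REST port 6333 is published on localhost; the docker-compose.yml in this repo may map it differently, so adjust the address if needed. The helper name `is_reachable` is hypothetical and not part of this project.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
/// An address that does not parse as `ip:port` counts as unreachable.
fn is_reachable(addr: &str, timeout: Duration) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(sock) => TcpStream::connect_timeout(&sock, timeout).is_ok(),
        Err(_) => false,
    }
}

fn main() {
    // Assumption: Qdrant's default REST port (6333) is exposed on localhost.
    let addr = "127.0.0.1:6333";
    if is_reachable(addr, Duration::from_secs(2)) {
        println!("Something is listening at {addr}");
    } else {
        println!("Nothing is listening at {addr}; is the container running?");
    }
}
```

This only checks that the port is open, not that the service behind it is Qdrant or that it is healthy.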

Step #1: Feed the law document to the database (this part can be automated)

Note: This PDF is unstructured and quite hard to parse, so the following steps are idiosyncratic.

Prepare data

Download the PDF

Convert it into text with pdftotext (https://www.xpdfreader.com/pdftotext-man.html), producing laws.txt

Move text file to ./data

Split it into sections. You need to do some manual work, because the data in the PDF is unstructured. Alternatively, you may be able to use one of the paid LLMs to parse the PDF into structured data.
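Part of the splitting step can be roughed out in code before the manual cleanup. The sketch below is not the method used in this repo. It assumes, purely for illustration, that each chapter in laws.txt starts on a line beginning with `CHAPTER `; the real extracted text may need a different marker. The `Section` type and `split_sections` function are hypothetical names.

```rust
/// One chunk of the law text: a heading line plus the body that follows it.
#[derive(Debug, PartialEq)]
struct Section {
    heading: String,
    body: String,
}

/// Splits raw text into sections, starting a new section at every line that
/// begins with `marker` (after trimming whitespace). Blank lines are skipped
/// and any text before the first marker is dropped.
fn split_sections(text: &str, marker: &str) -> Vec<Section> {
    let mut sections: Vec<Section> = Vec::new();
    for line in text.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with(marker) {
            sections.push(Section {
                heading: trimmed.to_string(),
                body: String::new(),
            });
        } else if let Some(current) = sections.last_mut() {
            if !trimmed.is_empty() {
                if !current.body.is_empty() {
                    current.body.push('\n');
                }
                current.body.push_str(trimmed);
            }
        }
    }
    sections
}

fn main() {
    // Placeholder text, not taken from the actual law document.
    let sample = "front matter\nCHAPTER I\nfirst line\n\nCHAPTER II\nsecond line\n";
    for s in split_sections(sample, "CHAPTER ") {
        println!("{} ({} bytes of body)", s.heading, s.body.len());
    }
}
```

The output of a pass like this would still need to be checked by hand, since headings in the extracted text may be broken across lines or repeated in page headers.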

Step #2: Keep the structured data as text files in ./data/formatted

See format.example for an example.

Ideally, separate by chapter, but it doesn't matter.

Convert the structured data to embeddings using a sentence transformer (see law_search).

Save the embeddings into the vector DB (see law_search).

Step #3: Use the web server to query the vector DB (see web_server)

The web server is built with Rocket.

We use:

Qdrant as the vector DB

Postgres to save structured data

~~Candle-rs + GritLM/GritLM-8x7B (if possible) for sentence transformer~~ fastembed + BGELargeENV15 for the sentence transformer
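To make the query step more concrete, here is a small standard-library sketch of what a vector search does conceptually: score every stored embedding against the query embedding and return the closest ones. It is a brute-force stand-in, not how Qdrant works internally and not this project's code. Cosine similarity is assumed as the metric, and the tiny 2-dimensional vectors are placeholders for real embeddings. The names `cosine` and `top_k` are hypothetical.

```rust
/// Cosine similarity of two vectors. Returns 0.0 if the lengths differ
/// or if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the indices of the `k` stored vectors most similar to `query`,
/// ordered from most to least similar.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    // Placeholder embeddings; real ones would come from the embedding model.
    let stored = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.9, 0.1]];
    let query = [1.0, 0.0];
    println!("closest sections: {:?}", top_k(&query, &stored, 2));
}
```

In the real pipeline the returned indices would correspond to point IDs in the vector DB, which can then be used to look up the matching structured text.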

Folders and files (8 commits)

law_search
manage_db
web_servevr
.env.example
.gitignore
Cargo.lock
Cargo.toml
LICENSE
README.md
docker-compose.yml
format.example
