Salary doc processor is a web API that extracts data from a salary slip and saves it in a NoSQL database. It is currently at version 1 and can be customised for different kinds of salary slips.
Tech stack: Flask, Postman, MongoDB, PyMongo, Python 3 and PyMuPDF
- Clone the repo using
git clone <reponame>
cd <reponame>
- Create a virtual environment and activate it
virtualenv venv
source venv/bin/activate
- Install the dependencies listed in requirements.txt by running
pip install -r requirements.txt
- In config.py, set the MongoDB URL of your local instance.
- Change the code inside docuparsercore/parser.py according to the fields and requirements of the business case.
- Run the server
python manage.py run
- Open Postman and send a POST request to the URL below. The request body is of file type.
http://0.0.0.0:5000/pdf/v1/parse-data
Request body in form-data:
document_pdf(File): pay-slip.pdf
Response:
{
  "status": "OK",
  "code": 200,
  "response_data": {
    "Name": "Vishnu Prasad",
    "Employee Code": "4XX1",
    "Designation": "Engineer",
    "PAN": "FLQPXXXXXX",
    "Basic Pay": "12000",
    "DA": "3600",
    "Bonus": "700",
    "House Rent": "4800",
    "Transport Allowance": "1600",
    "Performance Allowance": "6099",
    "Total Earnings": "Rs. 32,450",
    "Month and Year": "June 2021",
    "_id": "6228d102679b281883dea1f7"
  },
  "message": "PDF details extracted and saved in DB"
}
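
If you prefer a script to Postman, the same request can be sketched with the Python standard library alone. This is only an illustration: it assumes the server is reachable on localhost port 5000, and the helper names `build_upload_request` and `parse_pay_slip` are hypothetical, not part of this repo. The `document_pdf` field name and the endpoint path come from the request description above.

```python
import json
import os
import uuid
import urllib.request

# Assumption: the server from this README is running locally on port 5000.
API_URL = "http://localhost:5000/pdf/v1/parse-data"


def build_upload_request(url, pdf_bytes, filename="pay-slip.pdf"):
    """Build a multipart/form-data POST that carries the PDF in the document_pdf field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document_pdf"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(
        url, data=head + pdf_bytes + tail, headers=headers, method="POST"
    )


def parse_pay_slip(path, url=API_URL):
    """Upload the PDF at `path` and return the decoded JSON response as a dict."""
    with open(path, "rb") as fh:
        request = build_upload_request(url, fh.read(), os.path.basename(path))
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `parse_pay_slip("pay-slip.pdf")["response_data"]` should give the extracted fields shown in the response above.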
- Finally, you can view the saved data in MongoDB.
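
The customisation step for docuparsercore/parser.py mentioned earlier depends on your slip layout, so the following is only a minimal sketch and not the code that ships in this repo. It assumes the slip text has already been pulled out of the PDF (for example with PyMuPDF's `page.get_text()`) and that each field sits on its own line as `Label: value`. The `FIELDS` list and the `extract_fields` function are hypothetical names.

```python
import re

# Hypothetical list of labels to extract; edit it to match your business case.
FIELDS = ["Name", "Employee Code", "Designation", "PAN", "Basic Pay", "Total Earnings"]


def extract_fields(text, fields=FIELDS):
    """Return {label: value} for every 'Label: value' line found in `text`."""
    extracted = {}
    for label in fields:
        # Label at the start of a line, then ':', then capture the rest of that line.
        pattern = rf"^\s*{re.escape(label)}\s*:\s*(.+?)\s*$"
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            extracted[label] = match.group(1)
    return extracted


# Sample text standing in for what a PDF text extraction might return.
sample_text = """Name: Vishnu Prasad
Employee Code: 4XX1
Designation: Engineer
Basic Pay: 12000
Total Earnings: Rs. 32,450
"""
print(extract_fields(sample_text))
```

Labels that do not appear in the text are simply left out of the result, so adding or removing entries in `FIELDS` is enough to adapt the sketch to a different slip, provided the `Label: value` assumption holds.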