This repo hosts the core code for HyperDQA, our entry to the Document Visual Question Answering competition held as part of the Workshop on Text and Documents in the Deep Learning Era at CVPR 2020. Our approach ranks 4th on the leaderboard.
Read more about our approach in this blog post!
- Clone the repository
git clone https://github.com/anisha2102/docvqa.git
- Install libraries
pip install -r requirements.txt
- Download the dataset: The dataset for Task 1 can be downloaded from the Downloads section of the Competition Website. It consists of document images and their corresponding OCR transcriptions.
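As a quick sanity check of a download, the minimal Python sketch below prints the recognized text of one OCR transcription. It assumes the OCR JSONs follow the Microsoft Read API layout shipped with the competition data (a recognitionResults list with per-line text and words); the file path is hypothetical, so adjust both to your local copy.

import json

# Hypothetical path to a single OCR transcription from the Task 1 download.
ocr_path = "data/ocr_results/ffbf0023_4.json"

with open(ocr_path) as f:
    ocr = json.load(f)

# Assumed layout: recognitionResults -> lines -> (text, words).
for page in ocr.get("recognitionResults", []):
    for line in page.get("lines", []):
        print(line.get("text", ""))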
- Download the pretrained model: Download the pretrained model for LayoutLM-Base, Uncased from here.
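One possible way to lay the checkpoint out on disk (the directory name below is just the example path used in the training command further down, and the listed file names are what a LayoutLM-Base, Uncased checkpoint typically contains, so verify against your download):

mkdir -p models/layoutlm-base-uncased
# copy the unpacked checkpoint files here, typically:
#   models/layoutlm-base-uncased/config.json
#   models/layoutlm-base-uncased/pytorch_model.bin
#   models/layoutlm-base-uncased/vocab.txt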
python create_dataset.py \
<data-ocr-folder> \
<data-documents-folder> \
<path-to-train_v1.0.json> \
<train-output-json-path> \
<validation-output-json-path>
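For example, with hypothetical local paths (substitute the locations you downloaded the data to), the call could look like:

python create_dataset.py \
data/train/ocr_results \
data/train/documents \
data/train/train_v1.0.json \
data/train_processed.json \
data/val_processed.json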
# <pretrained-model-path> example: ./models/layoutlm-base-uncased
CUDA_VISIBLE_DEVICES=0 python run_docvqa.py \
--data_dir <data-folder> \
--model_type layoutlm \
--model_name_or_path <pretrained-model-path> \
--do_lower_case \
--max_seq_length 512 \
--do_train \
--num_train_epochs 15 \
--logging_steps 500 \
--evaluate_during_training \
--save_steps 500 \
--do_eval \
--output_dir <data-folder>/<exp-folder> \
--per_gpu_train_batch_size 8 \
--overwrite_output_dir \
--cache_dir <data-folder>/models \
--skip_match_answers \
--val_json <validation-output-json-path> \
--train_json <train-output-json-path>
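After training, the validation set can presumably be re-evaluated from the saved checkpoint by reusing the same command without the training flags. The sketch below only rearranges flags already shown above and assumes the fine-tuned weights live in <data-folder>/<exp-folder>; the exact set of required arguments may differ.

CUDA_VISIBLE_DEVICES=0 python run_docvqa.py \
--data_dir <data-folder> \
--model_type layoutlm \
--model_name_or_path <data-folder>/<exp-folder> \
--do_lower_case \
--max_seq_length 512 \
--do_eval \
--output_dir <data-folder>/<exp-folder> \
--cache_dir <data-folder>/models \
--skip_match_answers \
--val_json <validation-output-json-path>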
Download the pytorch_model.bin file from this Google Drive Link and copy it to the models folder.
Try out the demo on a sample datapoint with demo.ipynb
The code and pretrained models are based on LayoutLM and HuggingFace Transformers. Many thanks for their amazing open source contributions.