This repository contains the datasets and source code used in our Pattern Recognition (2020) paper.
Main dependencies:
- python==3.6
- tensorflow-gpu==1.11.0
- scikit-image==0.14.0
- scikit-learn==0.23.1
- opencv-python==3.3.1.11
- numba==0.51.2
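To confirm that an existing environment matches the pins above, a small check along the following lines may help. This is a minimal sketch and not part of the repository: the `check_pins` helper and the `PINNED` dictionary are hypothetical names introduced here, and the fallback to `pkg_resources` assumes setuptools is installed on Python 3.6.

```python
# Sketch: compare installed package versions against the pins listed above.
# check_pins and PINNED are illustrative names, not part of this repository.
try:
    from importlib.metadata import version, PackageNotFoundError  # Python >= 3.8
except ImportError:  # Python 3.6/3.7: fall back to setuptools' pkg_resources
    import pkg_resources

    PackageNotFoundError = pkg_resources.DistributionNotFound

    def version(name):
        return pkg_resources.get_distribution(name).version


# Pins copied from the dependency list in this README.
PINNED = {
    "tensorflow-gpu": "1.11.0",
    "scikit-image": "0.14.0",
    "scikit-learn": "0.23.1",
    "opencv-python": "3.3.1.11",
    "numba": "0.51.2",
}


def check_pins(pinned, lookup=version):
    """Return {package: (expected, found)} for each mismatch; found is None if missing."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print("{}: expected {}, found {}".format(name, expected, found or "not installed"))
```

An empty result means every pinned package is installed at the expected version. The Python interpreter version itself is not checked here.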
For a fully automatic setup of the virtual environment (tested on Linux Ubuntu 18.04), set the variable `BASE_DIR` in `scripts/install.sh` to a valid directory, and then run `source scripts/install.sh` from within the repository root directory. `BASE_DIR` indicates where additional directories (`envs`, `concorde`, `qsopt`) will be created.
You need sudo privileges to run the installation script properly.
By default, the virtual environment will be created at `$BASE_DIR/envs/deeprec-pr20`. When it finishes, the script automatically activates the newly created environment.
The datasets include (i) the integral documents from which the (small) training samples are extracted and (ii) the mechanically shredded document collections S-MARQUES (D1), S-ISRI-OCR (D2), and S-CDIP (D3) used in the tests. To download them, run `bash scripts/get_dataset.sh`. It will create a directory `datasets` in the repository root directory.
You can download the results by running `bash scripts/get_results.sh`. It will create a directory `results` in the repository root directory with three subdirectories (one for each experiment).
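After running the two download scripts, it can be useful to confirm that the expected directories are in place. The snippet below is a minimal sketch, assuming it is run from the repository root; `list_subdirs` is a hypothetical helper introduced here, and since the names of the subdirectories are not stated in this README, it only lists whatever it finds.

```python
# Sketch: report whether datasets/ and results/ exist and which subdirectories they hold.
# list_subdirs is an illustrative helper, not part of this repository.
from pathlib import Path


def list_subdirs(root, name):
    """Return the sorted subdirectory names of root/name, or None if root/name is missing."""
    target = Path(root) / name
    if not target.is_dir():
        return None
    return sorted(p.name for p in target.iterdir() if p.is_dir())


if __name__ == "__main__":
    for name in ("datasets", "results"):
        subdirs = list_subdirs(".", name)
        if subdirs is None:
            print("{}: missing (run the corresponding download script)".format(name))
        else:
            print("{}: {} subdirectories {}".format(name, len(subdirs), subdirs))
```

For `results`, the README states that three subdirectories are expected, one per experiment.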
A reconstruction demo is available by running `python demo.py`. By default, the script uses a pretrained model available in the `traindata` directory. Here is an example of the demo script's output:
For details on the parameters, run `python demo.py --help`.