Skip to content

Commit

Permalink
fix dead link
Browse files Browse the repository at this point in the history
  • Loading branch information
WenmuZhou committed Jul 28, 2021
1 parent 7fec9ed commit fd1bddf
Show file tree
Hide file tree
Showing 2 changed files with 31 additions and 2 deletions.
29 changes: 28 additions & 1 deletion ppstructure/table/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,34 @@ The table ocr flow chart is as follows


### 2.1 Train
TBD

In this chapter, we only introduce the training of the table structure model, For model training of [text detection](../../doc/doc_en/detection_en.md) and [text recognition](../../doc/doc_en/recognition_en.md), please refer to the corresponding documents

#### data preparation
The training data uses public data set [PubTabNet](https://arxiv.org/abs/1911.10683 ), Can be downloaded from the official [website](https://github.com/ibm-aur-nlp/PubTabNet) 。The PubTabNet data set contains about 500,000 images, as well as annotations in html format。

#### Start training
*If you are installing the cpu version of paddle, please modify the `use_gpu` field in the configuration file to false*
```shell
# single GPU training
python3 tools/train.py -c configs/table/table_mv3.yml
# multi-GPU training
# Set the GPU ID used by the '--gpus' parameter.
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
```

In the above instruction, use `-c` to select the training to use the `configs/table/table_mv3.yml` configuration file.
For a detailed explanation of the configuration file, please refer to [config](../../doc/doc_en/config_en.md).

#### load trained model and continue training

If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.

```shell
python3 tools/train.py -c configs/table/table_mv3.yml -o Global.checkpoints=./your/trained/model
```

**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrain_weights`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrain_weights` will be loaded.

### 2.2 Eval
First cd to the PaddleOCR/ppstructure directory
Expand Down
4 changes: 3 additions & 1 deletion ppstructure/table/README_ch.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@


### 2.1 训练
在这一章节中,我们仅介绍表格结构模型的训练,[文字检测](../../doc/doc_ch/detection.md)[文字识别](../../doc/doc_ch/recognition.md)的模型训练请参考对应的文档。

#### 数据准备
训练数据使用公开数据集[PubTabNet](https://arxiv.org/abs/1911.10683),可以从[官网](https://github.com/ibm-aur-nlp/PubTabNet)下载。PubTabNet数据集包含约50万张表格数据的图像,以及图像对应的html格式的注释。

Expand All @@ -31,7 +33,7 @@ python3 tools/train.py -c configs/table/table_mv3.yml
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_mv3.yml
```

上述指令中,通过-c 选择训练使用configs/table/table_mv3.yml配置文件。有关配置文件的详细解释,请参考[链接](./config.md)
上述指令中,通过-c 选择训练使用configs/table/table_mv3.yml配置文件。有关配置文件的详细解释,请参考[链接](../../doc/doc_ch/config.md)

#### 断点训练

Expand Down

0 comments on commit fd1bddf

Please sign in to comment.