PaddlePaddle · tink2123 · Sep 9, 2021 · Aug 24, 2021
diff --git a/configs/rec/rec_r31_sar.yml b/configs/rec/rec_r31_sar.yml
@@ -0,0 +1,99 @@
+Global:
+  use_gpu: true
+  epoch_num: 5
+  log_smooth_window: 20
+  print_batch_step: 20
+  save_model_dir: ./sar_rec
+  save_epoch_step: 1
+  # evaluation is run every 2000 iterations
+  eval_batch_step: [0, 2000]
+  cal_metric_during_train: True
+  pretrained_model:
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: False
+  infer_img:
+  # for data or label process
+  character_dict_path: ppocr/utils/dict90.txt
+  character_type: EN_symbol
+  max_text_length: 30
+  infer_mode: False
+  use_space_char: False
+  rm_symbol: True
+  save_res_path: ./output/rec/predicts_sar.txt
+
+Optimizer:
+  name: Adam
+  beta1: 0.9
+  beta2: 0.999
+  lr:
+    name: Piecewise
+    decay_epochs: [3, 4]
+    values: [0.001, 0.0001, 0.00001]
+  regularizer:
+    name: 'L2'
+    factor: 0
+
+Architecture:
+  model_type: rec
+  algorithm: SAR
+  Transform:
+  Backbone:
+    name: ResNet31
+  Head:
+    name: SARHead
+
+Loss:
+  name: SARLoss
+
+PostProcess:
+  name: SARLabelDecode
+
+Metric:
+  name: RecMetric
+
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    label_file_list: ['./train_data/train_list.txt']
+    data_dir: ./train_data/
+    ratio_list: 1.0
+    transforms:
+      - DecodeImage: # load image
+          img_mode: BGR
+          channel_first: False
+      - SARLabelEncode: # Class handling label
+      - SARRecResizeImg:
+          image_shape: [3, 48, 48, 160] # h:48 w:[48,160]
+          width_downsample_ratio: 0.25
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
+  loader:
+    shuffle: True
+    batch_size_per_card: 64
+    drop_last: True
+    num_workers: 8
+    use_shared_memory: False
+
+Eval:
+  dataset:
+    name: LMDBDataSet
+    data_dir: ./eval_data/evaluation/
+    transforms:
+      - DecodeImage: # load image
+          img_mode: BGR
+          channel_first: False
+      - SARLabelEncode: # Class handling label
+      - SARRecResizeImg:
+          image_shape: [3, 48, 48, 160]
+          width_downsample_ratio: 0.25
+      - KeepKeys:
+          keep_keys: ['image', 'label', 'valid_ratio'] # dataloader will return list in this order
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 64
+    num_workers: 4
+    use_shared_memory: False
+
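Review note on the `Optimizer.lr` block above: the `Piecewise` schedule pairs two decay boundaries with three learning-rate values. The sketch below shows that mapping in epoch units. It assumes the boundaries are 0-based epoch indices and that the rate switches as soon as the epoch reaches a boundary; `piecewise_lr` is a hypothetical helper written for illustration, not a PaddleOCR function.

```python
from bisect import bisect_right


def piecewise_lr(epoch, decay_epochs=(3, 4), values=(0.001, 0.0001, 0.00001)):
    """Return the learning rate for a 0-based epoch (illustrative helper)."""
    if len(values) != len(decay_epochs) + 1:
        raise ValueError("values needs exactly one more entry than decay_epochs")
    # bisect_right counts how many boundaries are <= epoch,
    # which is the index of the active value.
    return values[bisect_right(decay_epochs, epoch)]


# With epoch_num: 5 the schedule covers epochs 0..4.
for epoch in range(5):
    print(epoch, piecewise_lr(epoch))
```

Under these assumptions, epochs 0 to 2 train at 0.001, epoch 3 at 0.0001, and epoch 4 at 0.00001.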
diff --git a/doc/doc_ch/algorithm_overview.md b/doc/doc_ch/algorithm_overview.md
@@ -45,6 +45,7 @@ List of PaddleOCR open-source text recognition algorithms based on dynamic graph:
 - [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))[12]
 - [x] SRN([paper](https://arxiv.org/abs/2003.12294))[5]
 - [x] NRTR([paper](https://arxiv.org/abs/1806.00926v2))
+- [x] SAR([paper](https://arxiv.org/abs/1811.00751v2))
 
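 Following the [DTRB][3](https://arxiv.org/abs/1904.01906) text recognition training and evaluation protocol, the models are trained on the MJSynth and SynthText text recognition datasets and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows: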
 
@@ -60,6 +61,6 @@ List of PaddleOCR open-source text recognition algorithms based on dynamic graph:
 |RARE|Resnet34_vd|83.6%|rec_r34_vd_tps_bilstm_att |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)|
 |SRN|Resnet50_vd_fpn| 88.52% | rec_r50fpn_vd_none_srn | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar) |
 |NRTR|NRTR_MTB| 84.3% | rec_mtb_nrtr | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
-
+|SAR|Resnet31| 87.2% | rec_r31_sar | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
 
 For training and using the PaddleOCR text recognition algorithms, see the [text recognition section of the model training/evaluation tutorial](./recognition.md).
diff --git a/doc/doc_ch/recognition.md b/doc/doc_ch/recognition.md
@@ -88,7 +88,10 @@ train_data/rec/train/word_002.jpg 用科技让复杂的世界更简单
 
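 If you do not have a dataset locally, you can download the [ICDAR2015](https://rrc.cvc.uab.es/?ch=4&com=downloads) data from the official website for quick verification. You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb-format datasets required for the benchmark.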
 
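+To reproduce the metrics reported in the SAR paper, download [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg) (extraction code: 627x). In addition, the real-world datasets icdar2013, icdar2015, cocotext, and IIIT5k are used as part of the training data. See the SAR paper for dataset details.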
+
 If you use the public icdar2015 dataset, PaddleOCR provides a label file for training on ICDAR2015, which can be downloaded as follows:
+
 ```
 # Training set labels
 wget -P ./train_data/ic15_data https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt
@@ -232,6 +235,7 @@ PaddleOCR supports alternating training and evaluation. In `configs/rec/rec_icdar15_t
 | rec_r34_vd_tps_bilstm_att.yml | CRNN | Resnet34_vd | TPS | BiLSTM | att |
 | rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn |
 | rec_mtb_nrtr.yml | NRTR | nrtr_mtb | None | transformer encoder | transformer decoder |
+| rec_r31_sar.yml | SAR | ResNet31 | None | LSTM encoder | LSTM decoder |
 
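 For training on Chinese data, [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml) is recommended. If you want to try other algorithms on a Chinese dataset, modify the configuration file according to the following instructions: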
 

diff --git a/doc/doc_en/algorithm_overview_en.md b/doc/doc_en/algorithm_overview_en.md
@@ -47,6 +47,7 @@ PaddleOCR open-source text recognition algorithms list:
 - [x] RARE([paper](https://arxiv.org/abs/1603.03915v1))[12]
 - [x] SRN([paper](https://arxiv.org/abs/2003.12294))[5]
 - [x] NRTR([paper](https://arxiv.org/abs/1806.00926v2))
+- [x] SAR([paper](https://arxiv.org/abs/1811.00751v2))
 
 Following [DTRB](https://arxiv.org/abs/1904.01906), the text recognition models above are trained on MJSynth and SynthText and evaluated on IIIT, SVT, IC03, IC13, IC15, SVTP, and CUTE. The results are as follows:
 
@@ -62,5 +63,6 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
 |RARE|Resnet34_vd|83.6%|rec_r34_vd_tps_bilstm_att |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r34_vd_tps_bilstm_att_v2.0_train.tar)|
 |SRN|Resnet50_vd_fpn| 88.52% | rec_r50fpn_vd_none_srn |[Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_r50_vd_srn_train.tar)|
 |NRTR|NRTR_MTB| 84.3% | rec_mtb_nrtr | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.0/en/rec_mtb_nrtr_train.tar) |
+|SAR|Resnet31| 87.2% | rec_r31_sar | [Download link](https://paddleocr.bj.bcebos.com/dygraph_v2.1/rec/rec_r31_sar_train.tar) |
 
 For the training guide and usage of the PaddleOCR text recognition algorithms, see [Text recognition model training/evaluation/prediction](./recognition_en.md).
diff --git a/doc/doc_en/recognition_en.md b/doc/doc_en/recognition_en.md
@@ -91,6 +91,8 @@ Similar to the training set, the test set also needs to be provided a folder con
 If you do not have a dataset locally, you can download it from the official [icdar2015](https://rrc.cvc.uab.es/?ch=4&com=downloads) website.
 You can also refer to [DTRB](https://github.com/clovaai/deep-text-recognition-benchmark#download-lmdb-dataset-for-traininig-and-evaluation-from-here) to download the lmdb-format datasets required for the benchmark.
 
+To reproduce the results of the SAR paper, download the additional [SynthAdd](https://pan.baidu.com/share/init?surl=uV0LtoNmcxbO-0YA7Ch4dg) dataset (extraction code: 627x). The real-world datasets icdar2013, icdar2015, cocotext, and IIIT5k are also used for training. See the SAR paper for details.
+
 PaddleOCR provides label files for training the icdar2015 dataset, which can be downloaded in the following ways:
 
 ```
@@ -235,6 +237,8 @@ If the evaluation set is large, the test will be time-consuming. It is recommend
 | rec_r34_vd_tps_bilstm_att.yml | CRNN | Resnet34_vd | TPS | BiLSTM | att |
 | rec_r50fpn_vd_none_srn.yml | SRN | Resnet50_fpn_vd | None | rnn | srn |
 | rec_mtb_nrtr.yml | NRTR | nrtr_mtb | None | transformer encoder | transformer decoder |
+| rec_r31_sar.yml | SAR | ResNet31 | None | LSTM encoder | LSTM decoder |
+
 
 For training on Chinese data, it is recommended to use
 [rec_chinese_lite_train_v2.0.yml](../../configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml). If you want to try other algorithms on a Chinese dataset, modify the configuration file according to the following instructions:

diff --git a/ppocr/data/imaug/__init__.py b/ppocr/data/imaug/__init__.py
@@ -21,7 +21,7 @@
 from .make_shrink_map import MakeShrinkMap
 from .random_crop_data import EastRandomCropData, PSERandomCrop
 
-from .rec_img_aug import RecAug, RecResizeImg, ClsResizeImg, SRNRecResizeImg, NRTRRecResizeImg
+from .rec_img_aug import RecAug, RecResizeImg, ClsResizeImg, SRNRecResizeImg, NRTRRecResizeImg, SARRecResizeImg
 from .randaugment import RandAugment
 from .copy_paste import CopyPaste
 from .operators import *

diff --git a/ppocr/data/imaug/label_ops.py b/ppocr/data/imaug/label_ops.py
@@ -549,3 +549,49 @@ def get_beg_end_flag_idx(self, beg_or_end, char_or_elem):
  assert False, "Unsupport type %s in char_or_elem" \
  % char_or_elem
  return idx
+
+
+class SARLabelEncode(BaseRecLabelEncode):
+    """ Convert between text-label and text-index """
+
+    def __init__(self,
+                 max_text_length,
+                 character_dict_path=None,
+                 character_type='ch',
+                 use_space_char=False,
+                 **kwargs):
+        super(SARLabelEncode,
+              self).__init__(max_text_length, character_dict_path,
+                             character_type, use_space_char)
+
+    def add_special_char(self, dict_character):
+        beg_end_str = "<BOS/EOS>"
+        unknown_str = "<UKN>"
+        padding_str = "<PAD>"
+        dict_character = dict_character + [unknown_str]
+        self.unknown_idx = len(dict_character) - 1
+        dict_character = dict_character + [beg_end_str]
+        self.start_idx = len(dict_character) - 1
+        self.end_idx = len(dict_character) - 1
+        dict_character = dict_character + [padding_str]
+        self.padding_idx = len(dict_character) - 1
+
+        return dict_character
+
+    def __call__(self, data):
+        text = data['label']
+        text = self.encode(text)
+        if text is None:
+            return None
+        if len(text) >= self.max_text_len - 1:
+            return None
+        data['length'] = np.array(len(text))
+        target = [self.start_idx] + text + [self.end_idx]
+        padded_text = [self.padding_idx for _ in range(self.max_text_len)]
+
+        padded_text[:len(target)] = target
+        data['label'] = np.array(padded_text)
+        return data
+
+    def get_ignored_tokens(self):
+        return [self.padding_idx]
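Review note on `SARLabelEncode`: the three special tokens are appended after the dictionary characters, and start and end share one index. The standalone sketch below reproduces that label layout on a toy charset. `encode_sar_label` is a hypothetical function, and dropping characters that are missing from the charset is an assumption about what the base class's `encode` does, not something this diff shows.

```python
def encode_sar_label(text, charset, max_text_len):
    """Illustrative SAR label layout: <BOS/EOS> chars... <BOS/EOS> <PAD>..."""
    unknown_idx = len(charset)        # <UKN>, appended first (unused in this sketch)
    start_end_idx = len(charset) + 1  # <BOS/EOS>, shared by start and end
    padding_idx = len(charset) + 2    # <PAD>, appended last
    char_to_idx = {c: i for i, c in enumerate(charset)}

    # Assumption: characters outside the charset are dropped.
    indices = [char_to_idx[c] for c in text if c in char_to_idx]
    # Same length guard as in the diff: leave room for the two boundary tokens.
    if not indices or len(indices) >= max_text_len - 1:
        return None

    target = [start_end_idx] + indices + [start_end_idx]
    return target + [padding_idx] * (max_text_len - len(target))


print(encode_sar_label("cab", list("abc"), 8))
```

With a three-character charset the special indices are 3, 4, and 5, so the padded label starts and ends its text span with 4 and fills the remainder with 5.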
diff --git a/ppocr/data/imaug/rec_img_aug.py b/ppocr/data/imaug/rec_img_aug.py
@@ -102,6 +102,56 @@ def __call__(self, data):
  return data
 
 
+class SARRecResizeImg(object):
+    def __init__(self, image_shape, width_downsample_ratio=0.25, **kwargs):
+        self.image_shape = image_shape
+        self.width_downsample_ratio = width_downsample_ratio
+
+    def __call__(self, data):
+        img = data['image']
+        norm_img, resize_shape, pad_shape, valid_ratio = resize_norm_img_sar(img, self.image_shape, self.width_downsample_ratio)
+        data['image'] = norm_img
+        data['resized_shape'] = resize_shape
+        data['pad_shape'] = pad_shape
+        data['valid_ratio'] = valid_ratio
+        return data
+
+
+def resize_norm_img_sar(img, image_shape, width_downsample_ratio=0.25):
+    imgC, imgH, imgW_min, imgW_max = image_shape
+    h = img.shape[0]
+    w = img.shape[1]
+    valid_ratio = 1.0
+    # make sure new_width is an integral multiple of width_divisor.
+    width_divisor = int(1 / width_downsample_ratio)
+    # resize
+    ratio = w / float(h)
+    resize_w = math.ceil(imgH * ratio)
+    if resize_w % width_divisor != 0:
+        resize_w = round(resize_w / width_divisor) * width_divisor
+    if imgW_min is not None:
+        resize_w = max(imgW_min, resize_w)
+    if imgW_max is not None:
+        valid_ratio = min(1.0, 1.0 * resize_w / imgW_max)
+        resize_w = min(imgW_max, resize_w)
+    resized_image = cv2.resize(img, (resize_w, imgH))
+    resized_image = resized_image.astype('float32')
+    # norm
+    if image_shape[0] == 1:
+        resized_image = resized_image / 255
+        resized_image = resized_image[np.newaxis, :]
+    else:
+        resized_image = resized_image.transpose((2, 0, 1)) / 255
+    resized_image -= 0.5
+    resized_image /= 0.5
+    resize_shape = resized_image.shape
+    padding_im = -1.0 * np.ones((imgC, imgH, imgW_max), dtype=np.float32)
+    padding_im[:, :, 0:resize_w] = resized_image
+    pad_shape = padding_im.shape
+
+    return padding_im, resize_shape, pad_shape, valid_ratio
+
+
 def resize_norm_img(img, image_shape):
  imgC, imgH, imgW = image_shape
  h = img.shape[0]

diff --git a/ppocr/losses/__init__.py b/ppocr/losses/__init__.py
@@ -26,6 +26,7 @@
 from .rec_att_loss import AttentionLoss
 from .rec_srn_loss import SRNLoss
 from .rec_nrtr_loss import NRTRLoss
+from .rec_sar_loss import SARLoss
 # cls loss
 from .cls_loss import ClsLoss
 
@@ -44,7 +45,7 @@
 def build_loss(config):
     support_dict = [
         'DBLoss', 'EASTLoss', 'SASTLoss', 'CTCLoss', 'ClsLoss', 'AttentionLoss',
-        'SRNLoss', 'PGLoss', 'CombinedLoss', 'NRTRLoss', 'TableAttentionLoss'
+        'SRNLoss', 'PGLoss', 'CombinedLoss', 'NRTRLoss', 'TableAttentionLoss', 'SARLoss'
     ]
 
     config = copy.deepcopy(config)

diff --git a/ppocr/losses/rec_sar_loss.py b/ppocr/losses/rec_sar_loss.py
@@ -0,0 +1,25 @@
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import paddle
+from paddle import nn
+
+
+class SARLoss(nn.Layer):
+    def __init__(self, **kwargs):
+        super(SARLoss, self).__init__()
+        self.loss_func = paddle.nn.loss.CrossEntropyLoss(reduction="mean", ignore_index=96)
+
+    def forward(self, predicts, batch):
+        predict = predicts[:, :-1, :]  # drop the last output step so seq_len matches the targets
+        label = batch[1].astype("int64")[:, 1:]  # drop the leading start token from the targets
+        batch_size, num_steps, num_classes = predict.shape[0], predict.shape[
+            1], predict.shape[2]
+        assert len(label.shape) == len(list(predict.shape)) - 1, \
+            "predict must have shape [N, num_steps, num_classes] and label [N, num_steps]"
+
+        inputs = paddle.reshape(predict, [-1, num_classes])
+        targets = paddle.reshape(label, [-1])
+        loss = self.loss_func(inputs, targets)
+        return {'loss': loss}
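Review note on `SARLoss.forward`: the prediction at step t is scored against the label at step t+1, and padded positions are skipped through `ignore_index`. The pure-Python sketch below shows that alignment for a single sequence. It takes class probabilities rather than logits, and it assumes the mean is taken over non-ignored positions only; `shifted_masked_nll` is a hypothetical helper, not the Paddle implementation.

```python
import math


def shifted_masked_nll(probs, labels, ignore_index):
    """probs: T rows of per-class probabilities; labels: T class indices."""
    shifted_probs = probs[:-1]   # drop the last output step
    shifted_labels = labels[1:]  # drop the leading start token
    losses = [
        -math.log(step_probs[target])
        for step_probs, target in zip(shifted_probs, shifted_labels)
        if target != ignore_index  # padded positions add nothing
    ]
    # Assumption: average over the positions that were not ignored.
    return sum(losses) / len(losses) if losses else 0.0


# Toy vocabulary: 0 = 'a', 1 = <BOS/EOS>, 2 = <PAD>
labels = [1, 0, 1, 2]
probs = [
    [0.5, 0.25, 0.25],   # step 0 is scored against 'a'
    [0.25, 0.5, 0.25],   # step 1 is scored against <BOS/EOS>
    [0.25, 0.25, 0.5],   # step 2 lines up with <PAD> and is ignored
    [0.25, 0.25, 0.5],   # step 3 is dropped by the shift
]
print(shifted_masked_nll(probs, labels, ignore_index=2))
```

The sketch only works when `ignore_index` equals the `<PAD>` index produced by the label encoder, so the hard-coded `ignore_index=96` in the diff is worth checking against the dictionary in use.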
diff --git a/ppocr/modeling/backbones/__init__.py b/ppocr/modeling/backbones/__init__.py
@@ -27,8 +27,9 @@ def build_backbone(config, model_type):
         from .rec_resnet_fpn import ResNetFPN
         from .rec_mv1_enhance import MobileNetV1Enhance
         from .rec_nrtr_mtb import MTB
+        from .rec_resnet_31 import ResNet31
         support_dict = [
-            'MobileNetV1Enhance', 'MobileNetV3', 'ResNet', 'ResNetFPN', 'MTB'
+            'MobileNetV1Enhance', 'MobileNetV3', 'ResNet', 'ResNetFPN', 'MTB', "ResNet31"
         ]
     elif model_type == "e2e":
         from .e2e_resnet_vd_pg import ResNet