[Update] Update MAE Pretraining part
Mountchicken committed Jun 2, 2023
1 parent a2c174f commit 0b37d73
Showing 9 changed files with 185 additions and 54 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -7,4 +7,6 @@ baselines/
dataset_funcs/
mmocr-dev-1.x/work_dirs
add_data/
mmocr-0.x/
mmocr-0.x/
mae/output_dir
*.pyc
90 changes: 60 additions & 30 deletions README.md
@@ -1,13 +1,24 @@
# Union14M Dataset
<div align=center>

# Rethinking Scene Text Recognition: A Data Perspective

</div>
<div align=center>
<img src='github/cover.png' width=600 >
</div>
<div align=center>
<p>Union14M is a large scene text recognition (STR) dataset collected from 17 publicly available datasets. It contains 4M labeled images (Union14M-L) and 10M unlabeled images (Union14M-U), and is intended to provide a deeper analysis for the STR community.</p>

<div align=center>

[![arXiv preprint](https://img.shields.io/badge/arXiv-2207.06966-b31b1b)](https://arxiv.org/abs/2207.06966) [![Gradio demo](https://img.shields.io/badge/%F0%9F%A4%97%20demo-Gradio-ff7c00)](https://huggingface.co/spaces/baudm/PARSeq-OCR) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/bipinKrishnan/fastai_course/blob/master/bear_classifier.ipynb)


</div>


</div>
<p align="center">
<strong><a href="#sota">arXiv </a></strong> •
<strong><a href="#1-introduction">Introduction </a></strong> •
<strong><a href="#34-download">Download </a></strong> •
<strong><a href="#5-maerec">MAERec</a></strong> •
@@ -26,21 +37,26 @@
- To explore the challenges that STR models still face, we consolidate a large-scale STR dataset for analysis and identify seven open challenges. Furthermore, we propose a challenge-driven benchmark to facilitate the future development of STR. Additionally, we reveal that utilizing massive unlabeled data through self-supervised pre-training can remarkably enhance the performance of STR models in real-world scenarios, suggesting a practical solution for STR from a data perspective. We hope this work can spark future research beyond the realm of existing data paradigms.

## 2. Contents
- [1. Introduction](#1-introduction)
- [2. Contents](#2-contents)
- [3. Union14M Dataset](#3-union14m-dataset)
  - [3.1. Union14M-L](#31-union14m-l)
  - [3.2. Union14M-U](#32-union14m-u)
  - [3.3. Union14M-Benchmark](#33-union14m-benchmark)
  - [3.4. Download](#34-download)
- [4. STR Models trained on Union14M-L](#4-str-models-trained-on-union14m-l)
  - [4.1. Checkpoints](#41-checkpoints)
- [5. MAERec](#5-maerec)
  - [5.1. Pre-training](#51-pre-training)
  - [5.2. Fine-tuning](#52-fine-tuning)
  - [5.3 Inferencing](#53-inferencing)
- [6. QAs](#6-qas)
- [7. License](#7-license)
- [Rethinking Scene Text Recognition: A Data Perspective](#rethinking-scene-text-recognition-a-data-perspective)
  - [1. Introduction](#1-introduction)
  - [2. Contents](#2-contents)
  - [3. Union14M Dataset](#3-union14m-dataset)
    - [3.1. Union14M-L](#31-union14m-l)
    - [3.2. Union14M-U](#32-union14m-u)
    - [3.3. Union14M-Benchmark](#33-union14m-benchmark)
    - [3.4. Download](#34-download)
  - [4. STR Models trained on Union14M-L](#4-str-models-trained-on-union14m-l)
    - [4.1. Checkpoints](#41-checkpoints)
  - [5. MAERec](#5-maerec)
    - [5.1. Pre-training](#51-pre-training)
    - [5.2. Fine-tuning](#52-fine-tuning)
    - [5.3. Evaluation](#53-evaluation)
    - [5.4. Inferencing](#54-inferencing)
    - [5.5. ONNX Conversion](#55-onnx-conversion)
  - [6. QAs](#6-qas)
  - [7. License](#7-license)
  - [8. Acknowledgement](#8-acknowledgement)
  - [9. Citation](#9-citation)

## 3. Union14M Dataset
### 3.1. Union14M-L
@@ -73,6 +89,7 @@
| Union14M-U (36.63GB) | [Google Drive]() | [Baidu Netdisk]() |
| 6 Common Benchmarks (17.6MB) | [Google Drive]() | [Baidu Netdisk](https://pan.baidu.com/s/1XifQS0v-0YxEXkGTfWMDWQ?pwd=35cz) |

<!-- TODO: Add Google Drive Links -->

- The structure of Union14M is organized as follows:

@@ -109,7 +126,7 @@
<details close>
<summary><strong>Structure of Union14M-U</strong></summary>

We store images in LMDB format, and the structure of Union14M-U is organized as below. Here is an example of using it: [LMDB Example]()
We store images in [LMDB](https://github.com/Mountchicken/Efficient-Deep-Learning/blob/main/Efficient_DataProcessing.md#21-efficient-data-storage-methods) format, and the structure of Union14M-U is organized as below. Here is an example of using it: [LMDB Example]()
```text
|--Union14M-U
  |--book32_lmdb
@@ -122,7 +139,7 @@
- We train several STR models on Union14M-L using [MMOCR-1.0](https://github.com/open-mmlab/mmocr/tree/dev-1.x)

### 4.1. Checkpoints
- Evaluated on both common benchmarks and Union14M-Benchmark. Accuracy (WAICS, word accuracy ignoring case and symbols) in $\color{grey}{grey}$ is from the original implementation (trained on synthetic datasets), and accuracy in $\color{green}{green}$ is from training on Union14M-L. Our models are trained to predict **upper- and lower-case text, symbols, and spaces.**
- Evaluated on both common benchmarks and Union14M-Benchmark. Accuracy (WAICS, word accuracy ignoring case and symbols) in $\color{grey}{grey}$ is from the original implementation (trained on synthetic datasets), and accuracy in $\color{green}{green}$ is from training on Union14M-L. All the re-trained models are trained to predict **upper- and lower-case text, symbols, and spaces.**

| Models | Checkpoint | IIIT5K | SVT | IC13-1015 | IC15-2077 | SVTP | CUTE80 | Avg. |
| :---------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------: | :--------------------------------------------: | :--------------------------------------------: | :--------------------------------------------: | :--------------------------------------------: | :--------------------------------------------: | :--------------------------------------------: |
@@ -155,29 +172,42 @@


### 5.1. Pre-training
- Pre-trained ViT
- ViT pretrained on Union14M-U.

| Variants | Input Size | Patch Size | Embedding | Depth | Heads | Parameters | Download |
| --------- | ---------- | ---------- | --------- | ----- | ----- | ---------- | --------------------------------------------------------------------------------------- |
| ViT-Small | 32x128 | 4x4 | 384 | 12 | 6 | | [Google Drive]() / [BaiduYun](https://pan.baidu.com/s/1nZL5veMyWhxpk8DGj0UZMw?pwd=xecv) |
| ViT-Base | 32x128 | 4x4 | 768 | 12 | 12 | | [Google Drive]() / [BaiduYun](https://pan.baidu.com/s/17CjAOV-1kf1__a2RBo9NUg?pwd=3rvx) |
| Variants | Input Size | Patch Size | Embedding | Depth | Heads | Parameters | Download |
| -------- | ---------- | ---------- | --------- | ----- | ----- | ---------- | --------------------------------------------------------------------------------------- |
| ViT-S | 32x128 | 4x4 | 384 | 12 | 6 | 21M | [Google Drive]() / [BaiduYun](https://pan.baidu.com/s/1nZL5veMyWhxpk8DGj0UZMw?pwd=xecv) |
| ViT-B | 32x128 | 4x4 | 768 | 12 | 12 | 85M | [Google Drive]() / [BaiduYun](https://pan.baidu.com/s/17CjAOV-1kf1__a2RBo9NUg?pwd=3rvx) |
- If you want to pre-train the ViT backbone on your own dataset, check [pre-training](docs/pretrain.md)

<!-- TODO: Add Google Drive Link -->
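For reference, the two variants in the table can be expressed as plain timm `VisionTransformer` configurations. The sketch below is an assumed illustration that only mirrors the hyper-parameters listed above; it is not the repository's exact model code:

```python
# Minimal sketch of the two backbone variants above in timm terms
# (an assumed illustration, not the repository's own model definition).
# timm's VisionTransformer accepts a (height, width) img_size tuple.
from timm.models.vision_transformer import VisionTransformer

vit_small = VisionTransformer(
    img_size=(32, 128), patch_size=4,       # 8 x 32 = 256 patches per image
    embed_dim=384, depth=12, num_heads=6,   # ~21M parameters
    mlp_ratio=4, qkv_bias=True,
    num_classes=0)                          # no classification head

vit_base = VisionTransformer(
    img_size=(32, 128), patch_size=4,
    embed_dim=768, depth=12, num_heads=12,  # ~85M parameters
    mlp_ratio=4, qkv_bias=True,
    num_classes=0)
```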

### 5.2. Fine-tuning
- Fine-tuned MAERec
- MAERec finetuned on Union14M-L

| Variants | Acc on Common Benchmarks | Acc on Union14M-Benchmarks | Download |
| ------------ | ------------------------ | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| MAERec-Small | 95.1 | 78.6 | [Google Drive](https://drive.google.com/file/d/1dKLS_r3_ysWK155pSmkm7NBf5ALsEJYd/view?usp=sharing) / [BaiduYun](https://pan.baidu.com/s/1wFhLQLrn9dm77TMpdxyNAg?pwd=trg4) |
| MAERec-Base | 96.2 | 85.2 | [Google Drive](https://drive.google.com/file/d/13E0cmvksKwvjNuR62xZhwkg8eQJfb_Hp/view?usp=sharing) / [BaiduYun](https://pan.baidu.com/s/1EhoJ-2WqkzOQFCNg55-KcA?pwd=5yx1) |
| Variants | Acc on Common Benchmarks | Acc on Union14M-Benchmarks | Download |
| -------- | ------------------------ | -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| MAERec-S | 95.1 | 78.6 | [Google Drive](https://drive.google.com/file/d/1dKLS_r3_ysWK155pSmkm7NBf5ALsEJYd/view?usp=sharing) / [BaiduYun](https://pan.baidu.com/s/1wFhLQLrn9dm77TMpdxyNAg?pwd=trg4) |
| MAERec-B | 96.2 | 85.2 | [Google Drive](https://drive.google.com/file/d/13E0cmvksKwvjNuR62xZhwkg8eQJfb_Hp/view?usp=sharing) / [BaiduYun](https://pan.baidu.com/s/1EhoJ-2WqkzOQFCNg55-KcA?pwd=5yx1) |

- If you want to fine-tune MAERec on your own dataset, check [fine-tuning](docs/finetune.md)

### 5.3 Inferencing
### 5.3. Evaluation
- If you want to evaluate MAERec on benchmarks, check [evaluation](docs/evaluation.md)

### 5.4. Inferencing
- If you want to run inference with MAERec on your own images, check [inferencing](docs/inferencing.md)


### 5.5. ONNX Conversion

## 6. QAs


## 7. License
- The repository is released under the [MIT license](LICENSE).

## 8. Acknowledgement
- We sincerely thank the creators of the 17 datasets used in Union14M, as well as the developers of MMOCR, a powerful toolbox for OCR research.

## 9. Citation
60 changes: 43 additions & 17 deletions docs/pretrain.md
@@ -1,29 +1,54 @@
## Pre-training Using MAE
We adopt the framework of [MAE](https://openaccess.thecvf.com/content/CVPR2022/html/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.html) for pre-training. The code is heavily borrowed from [Masked Autoencoders: A PyTorch Implementation](https://github.com/facebookresearch/mae).

### 1. Install
### 1. Installation
```bash
conda create -n mae python=3.7
cd mae/
conda create -n mae python=3.8
conda activate mae
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
```
- **Attention**: This repo is based on `timm==0.3.2`, for which a [fix](https://github.com/huggingface/pytorch-image-models/issues/420#issuecomment-776459842) is needed to work with PyTorch 1.8.1+.
- **Attention**: The pre-training code is based on `timm==0.3.2`, for which a [fix](https://github.com/huggingface/pytorch-image-models/issues/420#issuecomment-776459842) is needed to work with PyTorch 1.8.1+. Add the below code to `timm/models/layers/helpers.py`:
```python
import torch

TORCH_MAJOR = int(torch.__version__.split('.')[0])
TORCH_MINOR = int(torch.__version__.split('.')[1])

if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
```

### 2. Prepare dataset
- You need to prepare the dataset(s) in torchvision.datasets.ImageFolder format. The basic structure of the dataset is as follows:
```text
|--dataset
  |--subfolder1
    |--image1.jpg
    |--image2.jpg
    |--...
  |--subfolder2
    |--image1.jpg
    |--image2.jpg
    |--...
```
- You can also use Union14M-U for pre-training, which is organized in ImageFolder format.

### 2. Prepare dataset
- We support two types of datasets: ImageFolder and LMDB.
- torchvision.datasets.ImageFolder format:
```text
|--dataset
  |--book32
    |--image1.jpg
    |--image2.jpg
    |--...
  |--openvino
    |--image1.jpg
    |--image2.jpg
    |--...
```
- LMDB format. To learn more about the LMDB structure and how to create an LMDB, see this [repo](https://github.com/Mountchicken/Efficient-Deep-Learning/blob/main/Efficient_DataProcessing.md#21-efficient-data-storage-methods).
```text
|--dataset
  |--book32
    |--data.mdb
    |--lock.mdb
  |--openvino
    |--data.mdb
    |--lock.mdb
  |--cc
    |--data.mdb
    |--lock.mdb
```
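If you need to build such an LMDB from a folder of images yourself, the sketch below shows one way to do it, matching the key layout that `mae/datasets/lmdb_dataset.py` reads (a `num-samples` entry plus 1-indexed `image-%09d` keys). The `build_lmdb` helper and the paths are hypothetical examples:

```python
# Minimal sketch (not the repository's own tooling): pack raw image bytes
# into an LMDB with the key layout expected by the pre-training code.
import os
import lmdb

def build_lmdb(image_dir: str, lmdb_path: str):
    # map_size is an upper bound on the database size (1 TB here)
    env = lmdb.open(lmdb_path, map_size=1099511627776)
    files = sorted(f for f in os.listdir(image_dir)
                   if f.lower().endswith(('.jpg', '.jpeg', '.png')))
    with env.begin(write=True) as txn:
        for i, name in enumerate(files, start=1):  # image keys are 1-indexed
            with open(os.path.join(image_dir, name), 'rb') as fp:
                txn.put(b'image-%09d' % i, fp.read())
        txn.put(b'num-samples', str(len(files)).encode())
    env.close()

build_lmdb('Union14M-U/book32', 'dataset/book32')  # hypothetical paths
```

For millions of images, committing in batches (one transaction per few thousand puts) keeps memory usage bounded.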

### 3. Pre-training
- Pre-training ViT-Small on Union14M-U with 4 GPUs:
@@ -38,8 +63,9 @@ pip install -r requirements.txt
--norm_pix_loss \
--blr 1.5e-4 \
--weight_decay 0.05 \
--data_path Union14M-U/book32 Union14M-U/openvino /Union14M-U/CC
--data_path ../data/Union14M-U/book32_lmdb ../data/Union14M-U/cc_lmdb ../data/Union14M-U/openvino_lmdb
```
- To pretrain ViT-Base, use `--model mae_vit_base_patch4`.
- Here the effective batch size is 256 (batch_size per GPU) * 1 (node) * 4 (GPUs per node) = 1024. If memory or the number of GPUs is limited, use --accum_iter to maintain the effective batch size, which is batch_size (per GPU) * nodes * GPUs per node * accum_iter; e.g., with 2 GPUs, keep --batch_size 256 and set --accum_iter 2 to preserve an effective batch size of 1024.
- Here we use --norm_pix_loss as the target for better representation learning. To train a baseline model (e.g., for visualization), use pixel-based reconstruction and turn off --norm_pix_loss.
- To train ViT-Base set --model mae_vit_base_patch4
56 changes: 56 additions & 0 deletions mae/datasets/lmdb_dataset.py
@@ -0,0 +1,56 @@
import sys

import lmdb
import six
from PIL import Image
from torch.utils.data import Dataset


class lmdbDataset(Dataset):
    """LMDB dataset for raw images.

    Args:
        root (str): Root path for lmdb files.
        transform (callable, optional): A function/transform that takes in a
            PIL image and returns a transformed version.
    """

    def __init__(self, root: str = None, transform=None):
        self.env = lmdb.open(
            root,
            max_readers=1,
            readonly=True,
            lock=False,
            readahead=False,
            meminit=False)

        if not self.env:
            print('cannot create lmdb from %s' % (root))
            sys.exit(0)

        with self.env.begin(write=False) as txn:
            nSamples = int(txn.get('num-samples'.encode()))
        self.nSamples = nSamples
        self.transform = transform

    def __len__(self):
        return self.nSamples

    def __getitem__(self, index):
        assert index < len(self), 'index range error'
        # image keys in the LMDB are 1-indexed: 'image-000000001', ...
        index += 1
        with self.env.begin(write=False) as txn:
            img_key = 'image-%09d' % index
            imgbuf = txn.get(img_key.encode())

        buf = six.BytesIO()
        buf.write(imgbuf)
        buf.seek(0)
        try:
            img = Image.open(buf).convert('RGB')
        except IOError:
            # skip corrupted images by falling back to the next sample;
            # index was already advanced above, so wrap around at the end
            print('Corrupted image for %d' % index)
            return self[index % len(self)]

        if self.transform is not None:
            img = self.transform(img)

        # MAE pre-training is unsupervised, so a dummy label is returned
        return img, 'test'
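A minimal usage sketch for the dataset above (the LMDB path is an assumed example; the string label is a dummy value, since MAE pre-training is unsupervised):

```python
# Usage sketch for lmdbDataset; the path below is an assumed example.
import torch
from torchvision import transforms
from datasets.lmdb_dataset import lmdbDataset

transform = transforms.Compose([
    transforms.Resize((32, 128)),  # (height, width), matching pre-training
    transforms.ToTensor(),
])
dataset = lmdbDataset(root='../data/Union14M-U/book32_lmdb',
                      transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=True)

images, _ = next(iter(loader))  # labels are dummies during pre-training
print(images.shape)             # torch.Size([256, 3, 32, 128])
```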
27 changes: 21 additions & 6 deletions mae/main_pretrain.py
@@ -21,6 +21,7 @@
import util.misc as misc
from engine_pretrain import train_one_epoch
from util.misc import NativeScalerWithGradNormCount as NativeScaler
from datasets.lmdb_dataset import lmdbDataset

assert timm.__version__ == "0.3.2" # version check

@@ -172,15 +173,27 @@ def main(args):
transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

    # check if it is lmdb dataset
    if isinstance(args.data_path, list):
        dataset_train = datasets.ImageFolder(args.data_path[0],
                                             transform_train)
        files = os.listdir(args.data_path[0])
    else:
        files = os.listdir(args.data_path)
    for f in files:
        if '.mdb' in f:
            dataset_type = lmdbDataset
            break
        if os.path.isdir(os.path.join(args.data_path, f)):
            dataset_type = datasets.ImageFolder
            break

    if isinstance(args.data_path, list):
        dataset_train = dataset_type(args.data_path[0], transform_train)
        for p in args.data_path[1:]:
            dataset_train = torch.utils.data.ConcatDataset(
                [dataset_train,
                 datasets.ImageFolder(p, transform_train)])
                 dataset_type(p, transform_train)])
    else:
        dataset_train = datasets.ImageFolder(
        dataset_train = dataset_type(
            os.path.join(args.data_path), transform=transform_train)
    print(dataset_train)

Expand Down Expand Up @@ -273,8 +286,10 @@ def main(args):
epoch=epoch)

        log_stats = {
            **{f'train_{k}': v
               for k, v in train_stats.items()},
            **{
                f'train_{k}': v
                for k, v in train_stats.items()
            },
            'epoch': epoch,
        }

2 changes: 2 additions & 0 deletions mae/requirements.txt
@@ -1,2 +1,4 @@
timm==0.3.2
tensorboard==2.11.0
lmdb==1.4.1
numpy<=1.23.0
Binary file modified mae/util/__pycache__/lr_sched.cpython-38.pyc
Binary file modified mae/util/__pycache__/misc.cpython-38.pyc
Binary file modified mae/util/__pycache__/pos_embed.cpython-38.pyc
