Merge pull request PaddlePaddle#1530 from cuicheng01/develop
update some en docs
cuicheng01 authored Dec 9, 2021
2 parents 78a2645 + 03593f9 commit 2153033
Showing 27 changed files with 375 additions and 136 deletions.
File renamed without changes.
10 changes: 8 additions & 2 deletions docs/en/models/DLA.md → docs/en/models/DLA_en.md
@@ -1,11 +1,17 @@
# DLA series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)

<a name='1'></a>
## 1. Overview

DLA (Deep Layer Aggregation) starts from the observation that visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Although skip connections have been incorporated to combine layers, these connections have themselves been "shallow" and only fuse by simple, one-step operations. The authors augment standard architectures with deeper aggregation to better fuse information across layers. Deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to produce networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared with existing branching and merging schemes. [paper](https://arxiv.org/abs/1707.06484)

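To make the aggregation idea concrete, the following is a minimal, illustrative sketch of an aggregation node that fuses two feature maps by concatenation followed by a 1x1 convolution, BN, and ReLU, applied iteratively over the outputs of successive stages. It assumes PaddlePaddle 2.x and is not the PaddleClas DLA implementation; the `AggregationNode` name and the channel sizes are hypothetical.

```python
import paddle
import paddle.nn as nn

class AggregationNode(nn.Layer):
    """Fuses two same-resolution feature maps (illustration only)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=1, bias_attr=False)
        self.bn = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x_shallow, x_deep):
        # Concatenate shallow and deep features, then project back down.
        x = paddle.concat([x_shallow, x_deep], axis=1)
        return self.relu(self.bn(self.conv(x)))

# Iterative deep aggregation: fold each stage's output into the running
# aggregate instead of using a single one-step skip connection.
# (A real DLA uses a separate node with its own weights at every merge point.)
stages = [paddle.randn([1, 64, 56, 56]) for _ in range(3)]
node = AggregationNode(in_channels=128, out_channels=64)
agg = stages[0]
for feat in stages[1:]:
    agg = node(agg, feat)
print(agg.shape)  # [1, 64, 56, 56]
```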

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:-----------------:|:----------:|:---------:|:---------:|:---------:|
30 changes: 19 additions & 11 deletions docs/en/models/DPN_DenseNet_en.md
@@ -1,10 +1,18 @@
# DPN and DenseNet series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)

<a name='1'></a>
## 1. Overview

DenseNet is a network structure proposed in 2017 that won the CVPR best paper award. The network designs a new cross-layer connection block called the dense block. Compared with the bottleneck in ResNet, the dense block uses a much more aggressive connection pattern: every layer is connected to all of its predecessors, so each layer receives the outputs of all preceding layers as additional input. DenseNet stacks dense blocks into a densely connected network. The dense connections make gradients easier to backpropagate, so the network is easier to train and converges faster. DPN stands for Dual Path Networks; it combines DenseNet and ResNeXt and shows that DenseNet can extract new features from preceding levels, while ResNeXt essentially reuses features that have already been extracted. The authors further analyze the two and find that ResNeXt reuses features heavily but with little redundancy, whereas DenseNet creates new features but with high redundancy. Combining the advantages of the two structures, the authors designed the DPN network, which ultimately achieves better results than both ResNeXt and DenseNet under the same FLOPs and parameter count.

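The two connection patterns described above can be sketched in a few lines: a dense block concatenates the outputs of all previous layers, while a dual-path unit keeps an element-wise residual path (feature reuse) alongside a concatenation path (new features). This is a simplified illustration assuming PaddlePaddle 2.x, not the PaddleClas implementation; the class names and channel sizes are hypothetical.

```python
import paddle
import paddle.nn as nn

def conv_bn_relu(in_ch, out_ch, k=3):
    return nn.Sequential(
        nn.Conv2D(in_ch, out_ch, k, padding=k // 2, bias_attr=False),
        nn.BatchNorm2D(out_ch),
        nn.ReLU(),
    )

class DenseBlock(nn.Layer):
    """Every layer receives the concatenation of all previous outputs."""
    def __init__(self, in_ch, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.LayerList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(conv_bn_relu(ch, growth_rate))
            ch += growth_rate  # each new layer adds growth_rate channels

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(paddle.concat(features, axis=1)))
        return paddle.concat(features, axis=1)

class DualPathUnit(nn.Layer):
    """Residual path reuses features; dense path keeps adding new ones."""
    def __init__(self, res_ch, dense_ch, growth_rate):
        super().__init__()
        self.body = conv_bn_relu(res_ch + dense_ch, res_ch + growth_rate)
        self.res_ch = res_ch

    def forward(self, res_path, dense_path):
        out = self.body(paddle.concat([res_path, dense_path], axis=1))
        res_out = res_path + out[:, : self.res_ch]                              # reuse
        dense_out = paddle.concat([dense_path, out[:, self.res_ch:]], axis=1)   # new features
        return res_out, dense_out

x = paddle.randn([1, 64, 56, 56])
print(DenseBlock(64, growth_rate=32, num_layers=4)(x).shape)  # [1, 192, 56, 56]
```
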
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.

![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png)

@@ -14,14 +22,14 @@

![](../../images/models/T4_benchmark/t4.fp16.bs4.DPN.png)

PaddleClas currently open-sources the pretrained models of these two series, ten in total, and their metrics are shown in the figure above. It is easy to see that, under the same FLOPs and parameter count, DPN reaches higher accuracy than DenseNet. However, because DPN has more branches, its inference speed is slower than DenseNet's. Since DenseNet264 is the deepest network in the DenseNet series, it has the most parameters; DenseNet161 is the widest, which gives it the largest FLOPs and the highest accuracy in the series. In terms of inference speed, DenseNet161, despite its large FLOPs and high accuracy, is faster than DenseNet264, so it holds an advantage over DenseNet264.

For the DPN series, the larger a model's FLOPs and parameter count, the higher its accuracy. Since DPN107 is the widest, it has the largest number of parameters and FLOPs in this series.

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DenseNet121 | 0.757 | 0.926 | 0.750 | | 5.690 | 7.980 |
| DenseNet161 | 0.786 | 0.941 | 0.778 | | 15.490 | 28.680 |
Expand All @@ -36,8 +44,8 @@ For DPN series networks, the larger the model's FLOPs and parameters, the higher

<a name='3'></a>
## 3. Inference speed based on V100 GPU

| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
Expand All @@ -53,8 +61,8 @@ For DPN series networks, the larger the model's FLOPs and parameters, the higher
| DPN131 | 224 | 256 | 28.083 |

<a name='4'></a>
## 4. Inference speed based on T4 GPU

| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
23 changes: 23 additions & 0 deletions docs/en/models/ESNet_en.md
@@ -0,0 +1,23 @@
# ESNet Series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)

<a name='1'></a>
## 1. Overview

ESNet (Enhanced ShuffleNet) is a lightweight network developed by Baidu. Built on ShuffleNetV2, it combines the advantages of MobileNetV3, GhostNet, and PPLCNet to form a network that is both faster and more accurate on ARM devices. Thanks to this excellent performance, [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet), launched in PaddleDetection, uses this model as its backbone; combined with a stronger object detection algorithm, the resulting mAP set a new SOTA for object detection models on ARM devices.

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Models | Top1 | Top5 | FLOPs<br>(M) | Params<br/>(M) |
|:--:|:--:|:--:|:--:|:--:|
| ESNet_x0_25 | 62.48 | 83.46 | 30.9 | 2.83 |
| ESNet_x0_5 | 68.82 | 88.04 | 67.3 | 3.25 |
| ESNet_x0_75 | 72.24 | 90.45 | 123.7 | 3.87 |
| ESNet_x1_0 | 73.92 | 91.40 | 197.3 | 4.64 |

Please stay tuned for further information such as inference speed.
27 changes: 18 additions & 9 deletions docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md
@@ -1,15 +1,23 @@
# EfficientNet and ResNeXt101_wsl series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)

<a name='1'></a>
## 1. Overview

EfficientNet is a lightweight NAS-based network released by Google in 2019; at that time, EfficientNetB7 set a new record for ImageNet-1k classification accuracy. In the paper, the authors point out that traditional approaches to improving the performance of neural networks mainly start from three dimensions: network width, network depth, and input image resolution. Through experiments, however, they found that balancing these three dimensions is essential for improving both accuracy and efficiency, and they summarized how to scale all three at the same time. Based on this compound scaling method, the authors built the B1-B7 networks of the EfficientNet series on top of EfficientNetB0, and under the same FLOPs and parameter count the accuracy reached the state of the art.

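The compound scaling rule itself fits in a few lines. The sketch below is purely illustrative: it assumes the coefficients reported in the EfficientNet paper (roughly alpha = 1.2 for depth, beta = 1.1 for width, gamma = 1.15 for resolution, chosen under the constraint alpha * beta^2 * gamma^2 ≈ 2), and the printed resolutions only approximate the hand-tuned official input sizes of B1-B7.

```python
# Illustrative compound scaling: depth, width, and resolution are scaled
# together by a single coefficient phi (coefficient values from the paper).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution factors

def compound_scale(phi, base_resolution=224):
    depth_mult = ALPHA ** phi          # more layers per stage
    width_mult = BETA ** phi           # more channels per layer
    resolution = int(round(base_resolution * GAMMA ** phi))
    return depth_mult, width_mult, resolution

for phi in range(8):                   # roughly corresponds to B0 .. B7
    d, w, r = compound_scale(phi)
    # Note: the official B1-B7 input sizes are hand-rounded and differ slightly.
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution ~{r}")
```
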
ResNeXt, proposed by Facebook in 2016, is an improved version of ResNet. In 2019, Facebook researchers explored the accuracy limit of this series on ImageNet through weakly supervised learning. To distinguish these models from the earlier ResNeXt networks, the series carries the suffix WSL, short for weakly supervised learning. To strengthen the feature extraction capability, the researchers further enlarged the network width; the largest model, ResNeXt101_32x48d_wsl, has 800 million parameters. It was trained on 940 million weakly labeled images and then fine-tuned on ImageNet-1k, finally reaching a top-1 accuracy of 85.4% on ImageNet-1k, the highest accuracy at 224x224 resolution on ImageNet-1k so far. Fix-ResNeXt additionally uses a larger image resolution together with a special Fix strategy that addresses the inconsistency between training and testing image preprocessing, giving ResNeXt101_32x48d_wsl even higher accuracy. Because it uses the Fix strategy, it is named Fix-ResNeXt101_32x48d_wsl.

The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.

![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png)

Expand All @@ -21,9 +29,10 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models

At present, PaddleClas open-sources a total of 14 pretrained models of these two series. As the figure above shows, the advantage of the EfficientNet series is very clear. The ResNeXt101_wsl series uses more training data, so its final accuracy is also higher. EfficientNetB0_small removes the SE block from EfficientNetB0, which makes its inference faster.

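For context, the SE (squeeze-and-excitation) block mentioned above rescales each channel by a weight computed from globally pooled features; removing it saves this extra computation at a small cost in accuracy. Below is a minimal sketch assuming PaddlePaddle 2.x, not the PaddleClas implementation; the class name and reduction ratio are hypothetical.

```python
import paddle
import paddle.nn as nn

class SEBlock(nn.Layer):
    """Squeeze-and-excitation: per-channel reweighting from pooled features."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2D(1)            # squeeze: global context
        self.fc = nn.Sequential(                       # excitation: channel weights
            nn.Conv2D(channels, channels // reduction, kernel_size=1),
            nn.ReLU(),
            nn.Conv2D(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))               # scale each channel

x = paddle.randn([1, 32, 28, 28])
print(SEBlock(32)(x).shape)  # [1, 32, 28, 28]
```
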
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNeXt101_<br>32x8d_wsl | 0.826 | 0.967 | 0.822 | 0.964 | 29.140 | 78.440 |
| ResNeXt101_<br>32x16d_wsl | 0.842 | 0.973 | 0.842 | 0.972 | 57.550 | 152.660 |
Expand All @@ -40,8 +49,8 @@ At present, there are a total of 14 pretrained models of the two types of models
| EfficientNetB7 | 0.843 | 0.969 | 0.844 | 0.971 | 72.350 | 64.920 |
| EfficientNetB0_<br>small | 0.758 | 0.926 | | | 0.720 | 4.650 |

<a name='3'></a>
## 3. Inference speed based on V100 GPU

| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------------------------|-----------|-------------------|--------------------------|
Expand All @@ -61,8 +70,8 @@ At present, there are a total of 14 pretrained models of the two types of models
| EfficientNetB0_<br>small | 224 | 256 | 1.692 |

<a name='4'></a>
## 4. Inference speed based on T4 GPU

| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
26 changes: 17 additions & 9 deletions docs/en/models/HRNet_en.md
@@ -1,10 +1,18 @@
# HRNet series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)

<a name='1'></a>
## 1. Overview

HRNet is a neural network proposed by Microsoft Research Asia in 2019. Unlike previous convolutional neural networks, it maintains a high-resolution representation even in the deep layers of the network, so the predicted keypoint heatmaps are more accurate and more precise in space. The network also performs particularly well on other resolution-sensitive visual tasks, such as detection and segmentation.

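The key mechanism, keeping a high-resolution branch alive next to lower-resolution branches and repeatedly exchanging information between them, can be sketched as follows. This two-branch fusion is a simplified illustration assuming PaddlePaddle 2.x, not the PaddleClas HRNet code; the real network uses more branches and repeated exchange units.

```python
import paddle
import paddle.nn as nn

class TwoBranchFusion(nn.Layer):
    """Exchange information between a high- and a low-resolution branch."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        # high -> low: strided conv downsamples the high-resolution map
        self.down = nn.Conv2D(high_ch, low_ch, kernel_size=3, stride=2, padding=1)
        # low -> high: 1x1 conv then nearest-neighbour upsampling
        self.up = nn.Sequential(
            nn.Conv2D(low_ch, high_ch, kernel_size=1),
            nn.Upsample(scale_factor=2, mode="nearest"),
        )
        self.relu = nn.ReLU()

    def forward(self, x_high, x_low):
        new_high = self.relu(x_high + self.up(x_low))    # stays at full resolution
        new_low = self.relu(x_low + self.down(x_high))   # stays at 1/2 resolution
        return new_high, new_low

x_high = paddle.randn([1, 32, 56, 56])   # high-resolution branch
x_low = paddle.randn([1, 64, 28, 28])    # low-resolution branch
h, l = TwoBranchFusion(32, 64)(x_high, x_low)
print(h.shape, l.shape)  # [1, 32, 56, 56] [1, 64, 28, 28]
```
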
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.

![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png)

Expand All @@ -16,10 +24,10 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models

At present, PaddleClas open-sources 7 pretrained models of this series, whose metrics are shown in the figure. Among them, the abnormal accuracy of HRNet_W48_C may be caused by fluctuations during training.

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| HRNet_W18_C | 0.769 | 0.934 | 0.768 | 0.934 | 4.140 | 21.290 |
| HRNet_W18_C_ssld | 0.816 | 0.958 | 0.768 | 0.934 | 4.140 | 21.290 |
Expand All @@ -32,8 +40,8 @@ At present, there are 7 pretrained models of such models open-sourced by PaddleC
| HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 |
| SE_HRNet_W64_C_ssld | 0.847 | 0.973 | | | 57.830 | 128.970 |

<a name='3'></a>
## 3. Inference speed based on V100 GPU

| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
Expand All @@ -49,8 +57,8 @@ At present, there are 7 pretrained models of such models open-sourced by PaddleC

<a name='4'></a>
## 4. Inference speed based on T4 GPU

| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
11 changes: 9 additions & 2 deletions docs/en/models/HarDNet.md → docs/en/models/HarDNet_en.md
@@ -1,10 +1,17 @@
# HarDNet series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)

<a name='1'></a>
## 1. Overview

HarDNet (Harmonic DenseNet) is a neural network proposed by National Tsing Hua University in 2019 that targets high efficiency in terms of both low MACs and low memory traffic. The network reduces inference time by 35%, 36%, 30%, 32%, and 45% compared with FC-DenseNet-103, DenseNet-264, ResNet-50, ResNet-152, and SSD-VGG, respectively. The authors used tools including the Nvidia profiler and ARM Scale-Sim to measure memory traffic and verified that inference latency is indeed proportional to memory traffic consumption, and that the proposed network consumes little memory traffic. [Paper](https://arxiv.org/abs/1909.00948)

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
