update some en docs

wickai · Dec 7, 2021 · 49dcb5c · 49dcb5c
1 parent 098a934
commit 49dcb5c
Show file tree

Hide file tree

Showing 27 changed files with 347 additions and 108 deletions.
diff --git a/docs/en/ImageNet_models_en.md → ...orithm_introduction/ImageNet_models_en.md b/docs/en/ImageNet_models_en.md → ...orithm_introduction/ImageNet_models_en.md
diff --git a/docs/en/models/DLA.md → docs/en/models/DLA_en.md b/docs/en/models/DLA.md → docs/en/models/DLA_en.md
@@ -1,11 +1,17 @@
 # DLA series
+---
+## Catalogue
 
+* [1. Overview](#1)
+* [2. Accuracy, FLOPS and Parameters](#2)
+
+<a name='1'></a>
 ## Overview
 
 DLA (Deep Layer Aggregation). Visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Although skip connections have been incorporated to combine layers, these connections have been "shallow" themselves, and only fuse by simple, one-step operations. The authors augment standard architectures with deeper aggregation to better fuse information across layers. Deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to make networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared to existing branching and merging schemes. [paper](https://arxiv.org/abs/1707.06484)
 
-
-## Accuracy, FLOPS and Parameters
+<a name='2'></a>
+## 2. Accuracy, FLOPS and Parameters
 
 | Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
 |:-----------------:|:----------:|:---------:|:---------:|:---------:|

diff --git a/docs/en/models/DPN_DenseNet_en.md b/docs/en/models/DPN_DenseNet_en.md
@@ -1,6 +1,14 @@
 # DPN and DenseNet series
+---
+## Catalogue
 
-## Overview
+* [1. Overview](#1)
+* [2. Accuracy, FLOPs and Parameters](#2)
+* [3. Inference speed based on V100 GPU](#3)
+* [4. Inference speed based on T4 GPU](#4)
+
+<a name='1'></a>
+## 1. Overview
 
 DenseNet is a new network structure proposed in 2017 and was the best paper of CVPR. The network has designed a new cross-layer connected block called dense-block. Compared to the bottleneck in ResNet, dense-block has designed a more aggressive dense connection module, that is, connecting all the layers to each other, and each layer will accept all the layers in front of it as its additional input. DenseNet stacks all dense-blocks into a densely connected network. The dense connection makes DenseNet easier to backpropagate, making the network easier to train and converge. The full name of DPN is Dual Path Networks, which is a network composed of DenseNet and ResNeXt, which proves that DenseNet can extract new features from the previous level, and ResNeXt essentially reuses the extracted features . The author further analyzes and finds that ResNeXt has high reuse rate for features, but low redundancy, while DenseNet can create new features, but with high redundancy. Combining the advantages of the two structures, the author designed the DPN network. In the end, the DPN network achieved better results than ResNeXt and DenseNet under the same FLOPS and parameters.
 
@@ -18,10 +26,10 @@ The pretrained models of these two types of models (a total of 10) are open sour
 
 For DPN series networks, the larger the model's FLOPs and parameters, the higher the model's accuracy. Among them, since the width of DPN107 is the largest, it has the largest number of parameters and FLOPs in this series of networks.
 
+<a name='2'></a>
+## 2. Accuracy, FLOPs and Parameters
 
-## Accuracy, FLOPS and Parameters
-
-| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
+| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
 |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
 | DenseNet121 | 0.757 | 0.926 | 0.750 | | 5.690 | 7.980 |
 | DenseNet161 | 0.786 | 0.941 | 0.778 | | 15.490 | 28.680 |
@@ -36,8 +44,8 @@ For DPN series networks, the larger the model's FLOPs and parameters, the higher
 
 
 
-
-## Inference speed based on V100 GPU
+<a name='3'></a>
+## 3. Inference speed based on V100 GPU
 
 | Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
 |-------------|-----------|-------------------|--------------------------|
@@ -53,8 +61,8 @@ For DPN series networks, the larger the model's FLOPs and parameters, the higher
 | DPN131 | 224 | 256 | 28.083 |
 
 
-
-## Inference speed based on T4 GPU
+<a name='4'></a>
+## 4. Inference speed based on T4 GPU
 
 | Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
 |-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|

diff --git a/docs/en/models/ESNet_en.md b/docs/en/models/ESNet_en.md
@@ -0,0 +1,23 @@
+# ESNet Series
+---
+## Catalogue
+
+* [1. Overview](#1)
+* [2. Accuracy, FLOPS and Parameters](#2)
+
+<a name='1'></a>
+## 1. Overview
+
+ESNet (Enhanced ShuffleNet) is a lightweight network developed by Baidu. This network combines the advantages of MobileNetV3, GhostNet, and PPLCNet on the basis of ShuffleNetV2 to form a faster and more accurate network on ARM devices, Because of its excellent performance, [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet) launched in PaddleDetection uses this model as a backbone, with stronger object detection algorithm, the final mAP index refreshed the SOTA index of the object detection model on the ARM device in one fell swoop.
+
+<a name='2'></a>
+## 2. Accuracy, FLOPS and Parameters
+
+| Models | Top1 | Top5 | FLOPs<br>(M) | Params<br/>(M) |
+|:--:|:--:|:--:|:--:|:--:|
+| ESNet_x0_25 | 62.48 | 83.46 | 30.9 | 2.83 |
+| ESNet_x0_5 | 68.82 | 88.04 | 67.3 | 3.25 |
+| ESNet_x0_75 | 72.24 | 90.45 | 123.7 | 3.87 |
+| ESNet_x1_0 | 73.92 | 91.40 | 197.3 | 4.64 |
+
+Please stay tuned for information such as Inference speed.
diff --git a/docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md b/docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md
@@ -1,6 +1,14 @@
 # EfficientNet and ResNeXt101_wsl series
+---
+## Catalogue
 
-## Overview
+* [1. Overview](#1)
+* [2. Accuracy, FLOPS and Parameters](#2)
+* [3. Inference speed based on V100 GPU](#3)
+* [4. Inference speed based on T4 GPU](#4)
+
+<a name='1'></a>
+## 1. Overview
 
 EfficientNet is a lightweight NAS-based network released by Google in 2019. EfficientNetB7 refreshed the classification accuracy of ImageNet-1k at that time. In this paper, the author points out that the traditional methods to improve the performance of neural networks mainly start with the width of the network, the depth of the network, and the resolution of the input picture.
 However, the author found that balancing these three dimensions is essential for improving accuracy and efficiency through experiments.
@@ -21,7 +29,8 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models
 
 At present, there are a total of 14 pretrained models of the two types of models that PaddleClas open source. It can be seen from the above figure that the advantages of the EfficientNet series network are very obvious. The ResNeXt101_wsl series model uses more data, and the final accuracy is also higher. EfficientNet_B0_small removes SE_block based on EfficientNet_B0, which has faster inference speed.
 
-## Accuracy, FLOPS and Parameters
+<a name='2'></a>
+## 2. Accuracy, FLOPS and Parameters
 
 | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
 |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
@@ -40,8 +49,8 @@ At present, there are a total of 14 pretrained models of the two types of models
 | EfficientNetB7 | 0.843 | 0.969 | 0.844 | 0.971 | 72.350 | 64.920 |
 | EfficientNetB0_<br>small | 0.758 | 0.926 | | | 0.720 | 4.650 |
 
-
-## Inference speed based on V100 GPU
+<a name='3'></a>
+## 3. Inference speed based on V100 GPU
 
 | Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
 |-------------------------------|-----------|-------------------|--------------------------|
@@ -61,8 +70,8 @@ At present, there are a total of 14 pretrained models of the two types of models
 | EfficientNetB0_<br>small | 224 | 256 | 1.692 |
 
 
-
-## Inference speed based on T4 GPU
+<a name='4'></a>
+## 4. Inference speed based on T4 GPU
 
 | Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
 |---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|

diff --git a/docs/en/models/HRNet_en.md b/docs/en/models/HRNet_en.md
@@ -1,6 +1,14 @@
 # HRNet series
+---
+## Catalogue
 
-## Overview
+* [1. Overview](#1)
+* [2. Accuracy, FLOPS and Parameters](#2)
+* [3. Inference speed based on V100 GPU](#3)
+* [4. Inference speed based on T4 GPU](#4)
+
+<a name='1'></a>
+## 1. Overview
 
 HRNet is a brand new neural network proposed by Microsoft research Asia in 2019. Different from the previous convolutional neural network, this network can still maintain high resolution in the deep layer of the network, so the heat map of the key points predicted is more accurate, and it is also more accurate in space. In addition, the network performs particularly well in other visual tasks sensitive to resolution, such as detection and segmentation.
 
@@ -16,8 +24,8 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models
 
 At present, there are 7 pretrained models of such models open-sourced by PaddleClas, and their indicators are shown in the figure. Among them, the reason why the accuracy of the HRNet_W48_C indicator is abnormal may be due to fluctuations in training.
 
-
-## Accuracy, FLOPS and Parameters
+<a name='2'></a>
+## 2. Accuracy, FLOPS and Parameters
 
 | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
 |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
@@ -32,8 +40,8 @@ At present, there are 7 pretrained models of such models open-sourced by PaddleC
 | HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 |
 | SE_HRNet_W64_C_ssld | 0.847 | 0.973 | | | 57.830 | 128.970 |
 
-
-## Inference speed based on V100 GPU
+<a name='3'></a>
+## 3. Inference speed based on V100 GPU
 
 | Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
 |-------------|-----------|-------------------|--------------------------|
@@ -49,8 +57,8 @@ At present, there are 7 pretrained models of such models open-sourced by PaddleC
 
 
 
-
-## Inference speed based on T4 GPU
+<a name='4'></a>
+## 4. Inference speed based on T4 GPU
 
 | Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
 |-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|

diff --git a/docs/en/models/HarDNet.md → docs/en/models/HarDNet_en.md b/docs/en/models/HarDNet.md → docs/en/models/HarDNet_en.md
@@ -1,10 +1,17 @@
 # HarDNet series
+---
+## Catalogue
 
-## Overview
+* [1. Overview](#1)
+* [2. Accuracy, FLOPS and Parameters](#2)
+
+<a name='1'></a>
+## 1. Overview
 
 HarDNet（Harmonic DenseNet）is a brand new neural network proposed by National Tsing Hua University in 2019, which to achieve high efficiency in terms of both low MACs and memory traffic. The new network achieves 35%, 36%, 30%, 32%, and 45% inference time reduction compared with FC-DenseNet-103, DenseNet-264, ResNet-50, ResNet-152, and SSD-VGG, respectively. We use tools including Nvidia profiler and ARM Scale-Sim to measure the memory traffic and verify that the inference latency is indeed proportional to the memory traffic consumption and the proposed network consumes low memory traffic. [Paper](https://arxiv.org/abs/1909.00948).
 
-## Accuracy, FLOPS and Parameters
+<a name='2'></a>
+## 2. Accuracy, FLOPS and Parameters
 
 | Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
 |:---------------------:|:----------:|:---------:|:---------:|:---------:|

diff --git a/docs/en/models/Inception_en.md b/docs/en/models/Inception_en.md
@@ -1,6 +1,14 @@
 # Inception series
+---
+## Catalogue
 
-## Overview
+* [1. Overview](#1)
+* [2. Accuracy, FLOPS and Parameters](#2)
+* [3. Inference speed based on V100 GPU](#3)
+* [4. Inference speed based on T4 GPU](#4)
+
+<a name='1'></a>
+## 1. Overview
 
 GoogLeNet is a new neural network structure designed by Google in 2014, which, together with VGG network, became the twin champions of the ImageNet challenge that year. GoogLeNet introduces the Inception structure for the first time, and stacks the Inception structure in the network so that the number of network layers reaches 22, which is also the mark of the convolutional network exceeding 20 layers for the first time. Since 1x1 convolution is used in the Inception structure to reduce the dimension of channel number, and Global pooling is used to replace the traditional method of processing features in multiple fc layers, the final GoogLeNet network has much less FLOPS and parameters than VGG network, which has become a beautiful scenery of neural network design at that time.
 
@@ -22,8 +30,8 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models
 
 The figure above reflects the relationship between the accuracy of Xception series and InceptionV4 and other indicators. Among them, Xception_deeplab is consistent with the structure of the paper, and Xception is an improved model developed by PaddleClas, which improves the accuracy by about 0.6% when the inference speed is basically unchanged. Details of the improved model are being updated, so stay tuned.
 
-
-## Accuracy, FLOPS and Parameters
+<a name='2'></a>
+## 2. Accuracy, FLOPS and Parameters
 
 | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(G) | Parameters<br>(M) |
 |:--:|:--:|:--:|:--:|:--:|:--:|:--:|
@@ -37,8 +45,8 @@ The figure above reflects the relationship between the accuracy of Xception seri
 | InceptionV4 | 0.808 | 0.953 | 0.800 | 0.950 | 24.570 | 42.680 |
 
 
-
-## Inference speed based on V100 GPU
+<a name='3'></a>
+## 3. Inference speed based on V100 GPU
 
 | Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
 |------------------------|-----------|-------------------|--------------------------|
@@ -51,8 +59,8 @@ The figure above reflects the relationship between the accuracy of Xception seri
 | InceptionV4 | 299 | 320 | 11.141 |
 
 
-
-## Inference speed based on T4 GPU
+<a name='4'></a>
+## 4. Inference speed based on T4 GPU
 
 | Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
 |--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|

diff --git a/docs/en/models/LeViT_en.md b/docs/en/models/LeViT_en.md
@@ -1,9 +1,16 @@
 # LeViT series
+---
+## Catalogue
 
-## Overview
+* [1. Overview](#1)
+* [2. Accuracy, FLOPS and Parameters](#2)
+
+<a name='1'></a>
+## 1. Overview
 LeViT is a fast inference hybrid neural network for image classification tasks. Its design considers the performance of the network model on different hardware platforms, so it can better reflect the real scenarios of common applications. Through a large number of experiments, the author found a better way to combine the convolutional neural network and the Transformer system, and proposed an attention-based method to integrate the position information encoding in the Transformer. [Paper](https://arxiv.org/abs/2104.01136)。
 
-## Accuracy, FLOPS and Parameters
+<a name='2'></a>
+## 2. Accuracy, FLOPS and Parameters
 
 | Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPS<br>(M) | Params<br>(M) |
 |:--:|:--:|:--:|:--:|:--:|:--:|:--:|

diff --git a/docs/en/models/MixNet_en.md b/docs/en/models/MixNet_en.md
@@ -1,6 +1,12 @@
 # MixNet series
+---
+## Catalogue
 
-## Overview
+* [1. Overview](#1)
+* [2. Accuracy, FLOPS and Parameters](#2)
+
+<a name='1'></a>
+## 1. Overview
 
 MixNet is a lightweight network proposed by Google. The main idea of MixNet is to explore the combination of different size of kernels. The author found that the current network has the following two problems:
 
@@ -9,7 +15,8 @@ MixNet is a lightweight network proposed by Google. The main idea of MixNet is t
 
  In order to solve the above two problems, MDConv(mixed depthwise convolution) is proposed. In this method, different size of kernels are mixed in a convolution operation block. And based on AutoML, a series of networks called MixNets are proposed, which have achieved good results on Imagenet. [paper](https://arxiv.org/pdf/1907.09595.pdf)
 
-## Accuracy, FLOPS and Parameters
+<a name='2'></a>
+## 2. Accuracy, FLOPS and Parameters
 
 | Models | Top1 | Top5 | Reference<br>top1 | FLOPS<br>(M) | Params<br/>(G |
 | :------: | :---: | :---: | :---------------: | :----------: | ------------- |