Merge pull request PaddlePaddle#1530 from cuicheng01/develop
update some en docs
cuicheng01 authored Dec 9, 2021
2 parents 78a2645 + 03593f9 commit 2153033
Showing 27 changed files with 375 additions and 136 deletions.
File renamed without changes.
10 changes: 8 additions & 2 deletions docs/en/models/DLA.md → docs/en/models/DLA_en.md
@@ -1,11 +1,17 @@
# DLA series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)

<a name='1'></a>
## 1. Overview

DLA (Deep Layer Aggregation) starts from the observation that visual recognition requires rich representations that span levels from low to high, scales from small to large, and resolutions from fine to coarse. Even with the depth of features in a convolutional network, a layer in isolation is not enough: compounding and aggregating these representations improves inference of what and where. Although skip connections have been incorporated to combine layers, these connections have themselves been "shallow" and only fuse by simple, one-step operations. The authors augment standard architectures with deeper aggregation to better fuse information across layers. Deep layer aggregation structures iteratively and hierarchically merge the feature hierarchy to produce networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that deep layer aggregation improves recognition and resolution compared with existing branching and merging schemes. [paper](https://arxiv.org/abs/1707.06484)

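To make the aggregation idea concrete, the following is a minimal, illustrative sketch of an aggregation node that fuses two feature maps by concatenation followed by a 1x1 convolution, BN, and ReLU, applied iteratively over the outputs of successive stages. It assumes PaddlePaddle 2.x and is not the PaddleClas DLA implementation; the `AggregationNode` name and the channel sizes are hypothetical.

```python
import paddle
import paddle.nn as nn

class AggregationNode(nn.Layer):
    """Fuses two same-resolution feature maps (illustration only)."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size=1, bias_attr=False)
        self.bn = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x_shallow, x_deep):
        # Concatenate shallow and deep features, then project back down.
        x = paddle.concat([x_shallow, x_deep], axis=1)
        return self.relu(self.bn(self.conv(x)))

# Iterative deep aggregation: fold each stage's output into the running
# aggregate instead of using a single one-step skip connection.
# (A real DLA uses a separate node with its own weights at every merge point.)
stages = [paddle.randn([1, 64, 56, 56]) for _ in range(3)]
node = AggregationNode(in_channels=128, out_channels=64)
agg = stages[0]
for feat in stages[1:]:
    agg = node(agg, feat)
print(agg.shape)  # [1, 64, 56, 56]
```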

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:-----------------:|:----------:|:---------:|:---------:|:---------:|
30 changes: 19 additions & 11 deletions docs/en/models/DPN_DenseNet_en.md
@@ -1,10 +1,18 @@
# DPN and DenseNet series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)

<a name='1'></a>
## 1. Overview

DenseNet is a network structure proposed in 2017 that won the CVPR best paper award. The network designs a new cross-layer connection block called the dense block. Compared with the bottleneck in ResNet, the dense block uses a much more aggressive connection pattern: every layer is connected to all of its predecessors, so each layer receives the outputs of all preceding layers as additional input. DenseNet stacks dense blocks into a densely connected network. The dense connections make gradients easier to backpropagate, so the network is easier to train and converges faster. DPN stands for Dual Path Networks; it combines DenseNet and ResNeXt and shows that DenseNet can extract new features from preceding levels, while ResNeXt essentially reuses features that have already been extracted. The authors further analyze the two and find that ResNeXt reuses features heavily but with little redundancy, whereas DenseNet creates new features but with high redundancy. Combining the advantages of the two structures, the authors designed the DPN network, which ultimately achieves better results than both ResNeXt and DenseNet under the same FLOPs and parameter count.

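The two connection patterns described above can be sketched in a few lines: a dense block concatenates the outputs of all previous layers, while a dual-path unit keeps an element-wise residual path (feature reuse) alongside a concatenation path (new features). This is a simplified illustration assuming PaddlePaddle 2.x, not the PaddleClas implementation; the class names and channel sizes are hypothetical.

```python
import paddle
import paddle.nn as nn

def conv_bn_relu(in_ch, out_ch, k=3):
    return nn.Sequential(
        nn.Conv2D(in_ch, out_ch, k, padding=k // 2, bias_attr=False),
        nn.BatchNorm2D(out_ch),
        nn.ReLU(),
    )

class DenseBlock(nn.Layer):
    """Every layer receives the concatenation of all previous outputs."""
    def __init__(self, in_ch, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.LayerList()
        ch = in_ch
        for _ in range(num_layers):
            self.layers.append(conv_bn_relu(ch, growth_rate))
            ch += growth_rate  # each new layer adds growth_rate channels

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(paddle.concat(features, axis=1)))
        return paddle.concat(features, axis=1)

class DualPathUnit(nn.Layer):
    """Residual path reuses features; dense path keeps adding new ones."""
    def __init__(self, res_ch, dense_ch, growth_rate):
        super().__init__()
        self.body = conv_bn_relu(res_ch + dense_ch, res_ch + growth_rate)
        self.res_ch = res_ch

    def forward(self, res_path, dense_path):
        out = self.body(paddle.concat([res_path, dense_path], axis=1))
        res_out = res_path + out[:, : self.res_ch]                              # reuse
        dense_out = paddle.concat([dense_path, out[:, self.res_ch:]], axis=1)   # new features
        return res_out, dense_out

x = paddle.randn([1, 64, 56, 56])
print(DenseBlock(64, growth_rate=32, num_layers=4)(x).shape)  # [1, 192, 56, 56]
```
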
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.

![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png)

@@ -14,14 +22,14 @@

![](../../images/models/T4_benchmark/t4.fp16.bs4.DPN.png)

PaddleClas currently open-sources the pretrained models of these two series, ten in total, and their metrics are shown in the figure above. It is easy to see that, under the same FLOPs and parameter count, DPN reaches higher accuracy than DenseNet. However, because DPN has more branches, its inference speed is slower than DenseNet's. Since DenseNet264 is the deepest network in the DenseNet series, it has the most parameters; DenseNet161 is the widest, which gives it the largest FLOPs and the highest accuracy in the series. In terms of inference speed, DenseNet161, despite its large FLOPs and high accuracy, is faster than DenseNet264, so it holds an advantage over DenseNet264.

For the DPN series, the larger a model's FLOPs and parameter count, the higher its accuracy. Since DPN107 is the widest, it has the largest number of parameters and FLOPs in this series.

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| DenseNet121 | 0.757 | 0.926 | 0.750 | | 5.690 | 7.980 |
| DenseNet161 | 0.786 | 0.941 | 0.778 | | 15.490 | 28.680 |
Expand All @@ -36,8 +44,8 @@ For DPN series networks, the larger the model's FLOPs and parameters, the higher

<a name='3'></a>
## 3. Inference speed based on V100 GPU

| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
Expand All @@ -53,8 +61,8 @@ For DPN series networks, the larger the model's FLOPs and parameters, the higher
| DPN131 | 224 | 256 | 28.083 |

<a name='4'></a>
## 4. Inference speed based on T4 GPU

| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
23 changes: 23 additions & 0 deletions docs/en/models/ESNet_en.md
@@ -0,0 +1,23 @@
# ESNet Series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)

<a name='1'></a>
## 1. Overview

ESNet (Enhanced ShuffleNet) is a lightweight network developed by Baidu. Built on ShuffleNetV2, it combines the advantages of MobileNetV3, GhostNet, and PPLCNet to form a network that is both faster and more accurate on ARM devices. Thanks to this excellent performance, [PP-PicoDet](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/configs/picodet), launched in PaddleDetection, uses this model as its backbone; combined with a stronger object detection algorithm, the resulting mAP set a new SOTA for object detection models on ARM devices.

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Models | Top1 | Top5 | FLOPs<br>(M) | Params<br/>(M) |
|:--:|:--:|:--:|:--:|:--:|
| ESNet_x0_25 | 62.48 | 83.46 | 30.9 | 2.83 |
| ESNet_x0_5 | 68.82 | 88.04 | 67.3 | 3.25 |
| ESNet_x0_75 | 72.24 | 90.45 | 123.7 | 3.87 |
| ESNet_x1_0 | 73.92 | 91.40 | 197.3 | 4.64 |

Please stay tuned for further information such as inference speed.
27 changes: 18 additions & 9 deletions docs/en/models/EfficientNet_and_ResNeXt101_wsl_en.md
@@ -1,15 +1,23 @@
# EfficientNet and ResNeXt101_wsl series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)

<a name='1'></a>
## 1. Overview

EfficientNet is a lightweight NAS-based network released by Google in 2019; at that time, EfficientNetB7 set a new record for ImageNet-1k classification accuracy. In the paper, the authors point out that traditional approaches to improving the performance of neural networks mainly start from three dimensions: network width, network depth, and input image resolution. Through experiments, however, they found that balancing these three dimensions is essential for improving both accuracy and efficiency, and they summarized how to scale all three at the same time. Based on this compound scaling method, the authors built the B1-B7 networks of the EfficientNet series on top of EfficientNetB0, and under the same FLOPs and parameter count the accuracy reached the state of the art.

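The compound scaling rule itself fits in a few lines. The sketch below is purely illustrative: it assumes the coefficients reported in the EfficientNet paper (roughly alpha = 1.2 for depth, beta = 1.1 for width, gamma = 1.15 for resolution, chosen under the constraint alpha * beta^2 * gamma^2 ≈ 2), and the printed resolutions only approximate the hand-tuned official input sizes of B1-B7.

```python
# Illustrative compound scaling: depth, width, and resolution are scaled
# together by a single coefficient phi (coefficient values from the paper).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution factors

def compound_scale(phi, base_resolution=224):
    depth_mult = ALPHA ** phi          # more layers per stage
    width_mult = BETA ** phi           # more channels per layer
    resolution = int(round(base_resolution * GAMMA ** phi))
    return depth_mult, width_mult, resolution

for phi in range(8):                   # roughly corresponds to B0 .. B7
    d, w, r = compound_scale(phi)
    # Note: the official B1-B7 input sizes are hand-rounded and differ slightly.
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution ~{r}")
```
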
ResNeXt, proposed by Facebook in 2016, is an improved version of ResNet. In 2019, Facebook researchers explored the accuracy limit of this series on ImageNet through weakly supervised learning. To distinguish these models from the earlier ResNeXt networks, the series carries the suffix WSL, short for weakly supervised learning. To strengthen the feature extraction capability, the researchers further enlarged the network width; the largest model, ResNeXt101_32x48d_wsl, has 800 million parameters. It was trained on 940 million weakly labeled images and then fine-tuned on ImageNet-1k, finally reaching a top-1 accuracy of 85.4% on ImageNet-1k, the highest accuracy at 224x224 resolution on ImageNet-1k so far. Fix-ResNeXt additionally uses a larger image resolution together with a special Fix strategy that addresses the inconsistency between training and testing image preprocessing, giving ResNeXt101_32x48d_wsl even higher accuracy. Because it uses the Fix strategy, it is named Fix-ResNeXt101_32x48d_wsl.

The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.

![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png)

Expand All @@ -21,9 +29,10 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models

At present, PaddleClas open-sources a total of 14 pretrained models of these two series. As the figure above shows, the advantage of the EfficientNet series is very clear. The ResNeXt101_wsl series uses more training data, so its final accuracy is also higher. EfficientNetB0_small removes the SE block from EfficientNetB0, which makes its inference faster.

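For context, the SE (squeeze-and-excitation) block mentioned above rescales each channel by a weight computed from globally pooled features; removing it saves this extra computation at a small cost in accuracy. Below is a minimal sketch assuming PaddlePaddle 2.x, not the PaddleClas implementation; the class name and reduction ratio are hypothetical.

```python
import paddle
import paddle.nn as nn

class SEBlock(nn.Layer):
    """Squeeze-and-excitation: per-channel reweighting from pooled features."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2D(1)            # squeeze: global context
        self.fc = nn.Sequential(                       # excitation: channel weights
            nn.Conv2D(channels, channels // reduction, kernel_size=1),
            nn.ReLU(),
            nn.Conv2D(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))               # scale each channel

x = paddle.randn([1, 32, 28, 28])
print(SEBlock(32)(x).shape)  # [1, 32, 28, 28]
```
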
<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| ResNeXt101_<br>32x8d_wsl | 0.826 | 0.967 | 0.822 | 0.964 | 29.140 | 78.440 |
| ResNeXt101_<br>32x16d_wsl | 0.842 | 0.973 | 0.842 | 0.972 | 57.550 | 152.660 |
Expand All @@ -40,8 +49,8 @@ At present, there are a total of 14 pretrained models of the two types of models
| EfficientNetB7 | 0.843 | 0.969 | 0.844 | 0.971 | 72.350 | 64.920 |
| EfficientNetB0_<br>small | 0.758 | 0.926 | | | 0.720 | 4.650 |

<a name='3'></a>
## 3. Inference speed based on V100 GPU

| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------------------------|-----------|-------------------|--------------------------|
Expand All @@ -61,8 +70,8 @@ At present, there are a total of 14 pretrained models of the two types of models
| EfficientNetB0_<br>small | 224 | 256 | 1.692 |

<a name='4'></a>
## 4. Inference speed based on T4 GPU

| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
26 changes: 17 additions & 9 deletions docs/en/models/HRNet_en.md
@@ -1,10 +1,18 @@
# HRNet series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)
* [3. Inference speed based on V100 GPU](#3)
* [4. Inference speed based on T4 GPU](#4)

<a name='1'></a>
## 1. Overview

HRNet is a neural network proposed by Microsoft Research Asia in 2019. Unlike previous convolutional neural networks, it maintains a high-resolution representation even in the deep layers of the network, so the predicted keypoint heatmaps are more accurate and more precise in space. The network also performs particularly well on other resolution-sensitive visual tasks, such as detection and segmentation.

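The key mechanism, keeping a high-resolution branch alive next to lower-resolution branches and repeatedly exchanging information between them, can be sketched as follows. This two-branch fusion is a simplified illustration assuming PaddlePaddle 2.x, not the PaddleClas HRNet code; the real network uses more branches and repeated exchange units.

```python
import paddle
import paddle.nn as nn

class TwoBranchFusion(nn.Layer):
    """Exchange information between a high- and a low-resolution branch."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        # high -> low: strided conv downsamples the high-resolution map
        self.down = nn.Conv2D(high_ch, low_ch, kernel_size=3, stride=2, padding=1)
        # low -> high: 1x1 conv then nearest-neighbour upsampling
        self.up = nn.Sequential(
            nn.Conv2D(low_ch, high_ch, kernel_size=1),
            nn.Upsample(scale_factor=2, mode="nearest"),
        )
        self.relu = nn.ReLU()

    def forward(self, x_high, x_low):
        new_high = self.relu(x_high + self.up(x_low))    # stays at full resolution
        new_low = self.relu(x_low + self.down(x_high))   # stays at 1/2 resolution
        return new_high, new_low

x_high = paddle.randn([1, 32, 56, 56])   # high-resolution branch
x_low = paddle.randn([1, 64, 28, 28])    # low-resolution branch
h, l = TwoBranchFusion(32, 64)(x_high, x_low)
print(h.shape, l.shape)  # [1, 32, 56, 56] [1, 64, 28, 28]
```
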
The FLOPs, parameters, and inference time on the T4 GPU of this series of models are shown in the figure below.

![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png)

Expand All @@ -16,10 +24,10 @@ The FLOPS, parameters, and inference time on the T4 GPU of this series of models

At present, PaddleClas open-sources 7 pretrained models of this series, whose metrics are shown in the figure. Among them, the abnormal accuracy of HRNet_W48_C may be caused by fluctuations during training.

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Models | Top1 | Top5 | Reference<br>top1 | Reference<br>top5 | FLOPs<br>(G) | Parameters<br>(M) |
|:--:|:--:|:--:|:--:|:--:|:--:|:--:|
| HRNet_W18_C | 0.769 | 0.934 | 0.768 | 0.934 | 4.140 | 21.290 |
| HRNet_W18_C_ssld | 0.816 | 0.958 | 0.768 | 0.934 | 4.140 | 21.290 |
Expand All @@ -32,8 +40,8 @@ At present, there are 7 pretrained models of such models open-sourced by PaddleC
| HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 |
| SE_HRNet_W64_C_ssld | 0.847 | 0.973 | | | 57.830 | 128.970 |

<a name='3'></a>
## 3. Inference speed based on V100 GPU

| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
Expand All @@ -49,8 +57,8 @@ At present, there are 7 pretrained models of such models open-sourced by PaddleC

<a name='4'></a>
## 4. Inference speed based on T4 GPU

| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
11 changes: 9 additions & 2 deletions docs/en/models/HarDNet.md → docs/en/models/HarDNet_en.md
@@ -1,10 +1,17 @@
# HarDNet series
---
## Catalogue

* [1. Overview](#1)
* [2. Accuracy, FLOPs and Parameters](#2)

<a name='1'></a>
## 1. Overview

HarDNet (Harmonic DenseNet) is a neural network proposed by National Tsing Hua University in 2019 that targets high efficiency in terms of both low MACs and low memory traffic. The network reduces inference time by 35%, 36%, 30%, 32%, and 45% compared with FC-DenseNet-103, DenseNet-264, ResNet-50, ResNet-152, and SSD-VGG, respectively. The authors used tools including the Nvidia profiler and ARM Scale-Sim to measure memory traffic and verified that inference latency is indeed proportional to memory traffic consumption, and that the proposed network consumes little memory traffic. [Paper](https://arxiv.org/abs/1909.00948)

<a name='2'></a>
## 2. Accuracy, FLOPs and Parameters

| Model | Params (M) | FLOPs (G) | Top-1 (%) | Top-5 (%) |
|:---------------------:|:----------:|:---------:|:---------:|:---------:|
