## Scene Parsing

### Accurate Models

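| Method | Backbone | ADE20K (mIoU) | Cityscapes (mIoU) | COCO-Stuff (mIoU) | Params (M) | GFLOPs (512x512) | GFLOPs (1024x1024) | Weights |
|---|---|---|---|---|---|---|---|---|
| SegFormer | MiT-B1 | 42.2 | 78.5 | 40.2 | 14 | 16 | 244 | ade |
| SegFormer | MiT-B2 | 46.5 | 81.0 | 44.6 | 28 | 62 | 717 | ade |
| SegFormer | MiT-B3 | 49.4 | 81.7 | 45.5 | 47 | 79 | 963 | ade |
| Light-Ham | VAN-S | 45.7 | - | - | 15 | 21 | - | - |
| Light-Ham | VAN-B | 49.6 | - | - | 27 | 34 | - | - |
| Light-Ham | VAN-L | 51.0 | - | - | 46 | 55 | - | - |
| Lawin | MiT-B1 | 42.1 | 79.0 | 40.5 | 14 | 13 | 218 | - |
| Lawin | MiT-B2 | 47.8 | 81.7 | 45.2 | 30 | 45 | 563 | - |
| Lawin | MiT-B3 | 50.3 | 82.5 | 46.6 | 50 | 62 | 809 | - |
| TopFormer | TopFormer-T | 34.6 | - | - | 1.4 | 0.6 | - | - |
| TopFormer | TopFormer-S | 37.0 | - | - | 3.1 | 1.2 | - | - |
| TopFormer | TopFormer-B | 39.2 | - | - | 5.1 | 1.8 | - | - |
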
* mIoU results are single-scale results taken from the official papers.
* ADE20K image size = 512x512
* Cityscapes image size = 1024x1024
* COCO-Stuff image size = 512x512
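Every table in this file reports mean Intersection-over-Union (mIoU). The sketch below shows one way to compute it. It assumes integer class labels and a hypothetical `ignore_index` of 255 for unlabeled pixels, which is a common convention that this file does not specify.

The method accumulates a confusion matrix over the whole validation set, computes IoU_c = TP / (TP + FP + FN) for each class, and averages the per-class values. The helper names `confusion_matrix` and `mean_iou` are illustrative and are not this repository's API.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the metric
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class is another
        denom = tp + fp + fn
        if denom > 0:                               # skip classes absent from both
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou(confusion_matrix([0, 1, 1, 1, 2, 0], [0, 0, 1, 1, 2, 255], 3))` gives (1/2 + 2/3 + 1) / 3 = 13/18 ≈ 0.722.

This sketch skips classes that occur in neither the prediction nor the ground truth instead of scoring them as zero. Codebases differ on this detail, which can shift reported numbers slightly.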

### Real-time Models

| Method | Backbone | Cityscapes-val (mIoU) | CamVid (mIoU) | Params (M) | GFLOPs (1024x2048) | Weights |
|---|---|---|---|---|---|---|
| BiSeNetv1 | ResNet-18 | 74.8 | 68.7 | 14 | 49 | - |
| BiSeNetv2 | - | 73.4 | 72.4 | 18 | 21 | - |
| SFNet | ResNetD-18 | 79.0 | - | 13 | - | - |
| DDRNet | DDRNet-23slim | 77.8 | 74.7 | 6 | 36 | city |

* mIoU results are single-scale results taken from the official papers.
* Cityscapes image size = 1024x2048 (except BiSeNetv1 and BiSeNetv2, which use 512x1024)
* CamVid image size = 960x720
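The GFLOPs figures depend on input resolution. That is why the notes state the image size behind each result.

The sketch below covers plain convolutions and shows why resolution matters. It rests on two assumptions:
* Dilation is 1.
* Each weight application counts as one multiply-accumulate (MAC). Many FLOP counters report MACs as "FLOPs", and this file does not say which convention its numbers follow.

Under these assumptions, a layer's cost grows linearly with the number of output pixels, while its parameter count does not change. The function names are hypothetical helpers, not part of this repository.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_macs(h: int, w: int, c_in: int, c_out: int, kernel: int,
                stride: int = 1, padding: int = 0, groups: int = 1) -> int:
    """Multiply-accumulates of one Conv2d layer (bias ignored)."""
    h_out = conv2d_out(h, kernel, stride, padding)
    w_out = conv2d_out(w, kernel, stride, padding)
    # Every output value needs (c_in / groups) * kernel^2 multiply-accumulates.
    return h_out * w_out * c_out * (c_in // groups) * kernel * kernel


def conv2d_params(c_in: int, c_out: int, kernel: int,
                  groups: int = 1, bias: bool = True) -> int:
    """Learnable parameters of one Conv2d layer; independent of input size."""
    return c_out * (c_in // groups) * kernel * kernel + (c_out if bias else 0)


if __name__ == "__main__":
    half = conv2d_macs(512, 1024, 64, 64, 3, padding=1)    # 512x1024 input
    full = conv2d_macs(1024, 2048, 64, 64, 3, padding=1)   # 1024x2048 input
    print(full / half)               # 4.0
    print(conv2d_params(64, 64, 3))  # 36928
```

Going from 512x1024 to 1024x2048 quadruples the pixel count, and so quadruples the MACs of such a layer. The linear rule does not hold for self-attention layers, whose cost grows quadratically with the number of tokens. The SegFormer MiT-B1 row fits this: its cost rises from 16 to 244 GFLOPs when the input grows from 512x512 to 1024x1024, which is four times the pixels.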

## Face Parsing

| Method | Backbone | HELEN-val (mIoU) | Params (M) | GFLOPs (512x512) | FPS (GTX 1660 Ti) | Weights |
|---|---|---|---|---|---|---|
| BiSeNetv1 | ResNet-18 | 58.50 | 14 | 13 | 263 | HELEN |
| BiSeNetv2 | - | 58.58 | 18 | 15 | 195 | HELEN |
| DDRNet | DDRNet-23slim | 61.11 | 6 | 5 | 180 | HELEN |
| SFNet | ResNetD-18 | 61.00 | 14 | 31 | 56 | HELEN |
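The FPS column is specific to one GPU (GTX 1660 Ti). The file does not state the batch size, the precision, or whether pre- and post-processing are included in the timing.

The hypothetical `measure_fps` helper below is a rough sketch of how such a throughput figure can be measured: it times repeated calls of a callable after a warm-up phase. It assumes that `fn` runs one complete forward pass.

For a PyTorch model on a GPU, `fn` should call `torch.cuda.synchronize()` before returning. CUDA kernels are launched asynchronously, so without that call the timer would stop too early.

```python
import time
from typing import Callable


def measure_fps(fn: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Average calls per second of fn(), measured after `warmup` untimed calls."""
    if iters <= 0:
        raise ValueError("iters must be positive")
    for _ in range(warmup):
        fn()  # untimed: lets caches, lazy initialization, and kernel autotuning settle
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a real forward pass on a fixed 512x512 input.
    print(f"{measure_fps(lambda: sum(range(100_000))):.1f} calls/s")
```

The timer wraps only `fn`, so the result excludes any work done outside it, such as image decoding. FPS figures measured with different harnesses are therefore not directly comparable.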