Pixel-Level Semantic Segmentation Task

The segmentation benchmark requires pixel-level predictions for all 26 classes at level 3 of the label hierarchy (see Overview for details of the level 3 ids).

Output Format

The output is a PNG image with the same resolution as the input image, where the value of every pixel is an integer in {1, ..., 27}. The first 26 values correspond to the level3Ids (see Overview for details of the level 3 ids), and the value 27 is used as a miscellaneous class.
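As an illustration of the format above, the following minimal sketch checks that a label map matches the input resolution and contains only values in 1..27, then encodes it as a PNG using only the Python standard library. The function names are hypothetical, and the choice of an 8-bit single-channel (grayscale) PNG is an assumption; the text only states that each pixel holds an integer label.

```python
import struct
import zlib

NUM_LABELS = 27  # 26 level3Id classes plus the miscellaneous class 27


def check_label_map(labels, width, height):
    """Raise ValueError unless labels is a height x width grid of ints in 1..27."""
    if len(labels) != height:
        raise ValueError(f"expected {height} rows, got {len(labels)}")
    for y, row in enumerate(labels):
        if len(row) != width:
            raise ValueError(f"row {y}: expected {width} pixels, got {len(row)}")
        for x, value in enumerate(row):
            if not isinstance(value, int) or not 1 <= value <= NUM_LABELS:
                raise ValueError(f"pixel ({x}, {y}) has invalid label {value!r}")


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_label_png(labels, width, height):
    """Encode a validated label map as PNG bytes.

    ASSUMPTION: 8-bit grayscale is used; the benchmark text does not
    state the bit depth or channel layout.
    """
    check_label_map(labels, width, height)
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale),
    # compression 0, filter 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline is prefixed with filter byte 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(row) for row in labels)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
```

The returned bytes can be written to disk with `open(path, "wb").write(...)`. In practice an imaging library would normally do the encoding; the hand-written encoder only keeps the sketch dependency-free.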


We will be using the mean Intersection over Union (mIoU) metric. All ground truth and prediction maps will be resized to 1080p (using nearest-neighbor interpolation), and true positives (TP), false negatives (FN), and false positives (FP) will be computed for each class (except 27) over the entire test split of the dataset. The Intersection over Union (IoU) for each class is computed as TP/(TP+FN+FP), and the mean over the classes is taken as the metric for the segmentation challenge.
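The metric described above can be sketched in plain Python as follows. This is not the organizers' evaluation code: the function names are hypothetical, and two details the text leaves open are filled in as labeled assumptions (pixels whose ground truth is class 27 are skipped entirely, and classes that never occur are left out of the mean). The resize uses one common nearest-neighbor index mapping; 1080p would correspond to `out_w=1920, out_h=1080`.

```python
EVAL_CLASSES = range(1, 27)  # classes 1..26 are scored; class 27 is excluded
MISC_CLASS = 27


def resize_nearest(label_map, out_w, out_h):
    """Nearest-neighbor resize of a 2D list of labels (floor index mapping)."""
    in_h, in_w = len(label_map), len(label_map[0])
    return [
        [label_map[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def new_counts():
    """Per-class accumulators: class -> [TP, FN, FP]."""
    return {c: [0, 0, 0] for c in EVAL_CLASSES}


def accumulate(counts, gt, pred):
    """Add one image's pixel counts; gt and pred must have the same size."""
    for gt_row, pred_row in zip(gt, pred):
        for g, p in zip(gt_row, pred_row):
            if g == MISC_CLASS:
                continue  # ASSUMPTION: ground-truth misc pixels are ignored
            if g == p:
                counts[g][0] += 1      # true positive for class g
            else:
                counts[g][1] += 1      # false negative for class g
                if p != MISC_CLASS:
                    counts[p][2] += 1  # false positive for class p
    return counts


def mean_iou(counts):
    """Return (mIoU, per-class IoU) with IoU = TP / (TP + FN + FP)."""
    ious = {}
    for c, (tp, fn, fp) in counts.items():
        denom = tp + fn + fp
        if denom:  # ASSUMPTION: classes never seen are left out of the mean
            ious[c] = tp / denom
    if not ious:
        return 0.0, ious
    return sum(ious.values()) / len(ious), ious
```

Because the counts are accumulated over the whole test split before dividing, the result differs in general from averaging per-image IoU values.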

Additionally, the leaderboard reports the mIoU for level 2 and level 1 ids, also at 1080p resolution.

| Team/Uploader Name | Method Name | mIoU for L3 IDs at 1080p | mIoU for L2 IDs at 1080p | mIoU for L1 IDs at 1080p |
|---|---|---|---|---|
| Baseline* | DRN-D-38 [3] | 0.6656 | - | - |
| Baseline* | ERFNet [2] | 0.5541 | - | - |
| Mapillary Research (AutoNUE Challenge) | Inplace ABN | 0.7432 | 0.7789 | 0.8972 |
| BDAI (AutoNUE Challenge) | PSPNET+++ | 0.7412 | 0.7796 | 0.8992 |
| Vinda (AutoNUE Challenge) | Joint Channel-Spatial Attention... | 0.7407 | 0.78 | 0.8986 |
| Geelpen (AutoNUE Challenge) | Places365 model feature trained | 0.7376 | 0.7788 | 0.8954 |
| HUST_IALab (AutoNUE Challenge) | DenseScaleNetwork | 0.7339 | 0.7745 | 0.8955 |
| Appari Lalith | PSPNet IDD12_2 | 0.7215 | 0.7662 | 0.8926 |
| DeepScene (AutoNUE Challenge) | Easpp+DenseAspp | 0.7111 | 0.7584 | 0.8823 |
| Team7 (AutoNUE Challenge) | DRN-D-105 modified | 0.6794 | 0.738 | 0.8696 |
| MingdongYang_WHUT | DSMRSeg | 0.6444 | 0.7081 | 0.8428 |
| TeamTiger | Modified DLV3+ | 0.5892 | 0.6826 | 0.8346 |
| Shubham Innani | pdlv3 | 0.5791 | 0.6627 | 0.8077 |
| Anonymous | ana | 0.5642 | 0.6731 | 0.8012 |
| Anonymous | Anonymous | 0.4784 | 0.5704 | 0.6707 |
| Attention-Net | Attention U-net Based Segmentation | 0.3422 | 0.4716 | 0.6281 |
| Sabari nathan | Attention U-net | 0.3305 | 0.4419 | 0.6423 |
| Sabari nathan | Attention U-net Based Segmentation | 0.0187 | 0.1345 | 0.3423 |

* Baselines were run by the organizers using the code released by the authors of ERFNet [2] and DRN [3].

Instance-Level Semantic Segmentation Task

In the instance segmentation benchmark, the model is expected to segment each instance of a class separately. Instance segments are expected only for "things" classes, which are all level3Ids under living things and vehicles (i.e., level3Ids 4-12).

Output Format & Metric

The output format and metric are the same as for Cityscapes instance segmentation [1].

The predictions should use the "id" values, unlike the semantic segmentation challenge, where level3Ids were used.
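For orientation, the sketch below assumes the Cityscapes instance-level result convention: one text file per image, where each line names a binary mask image, the label id of that instance, and a confidence score, separated by spaces. The function name, mask file names, and the label ids in the usage note are hypothetical; the actual "id" values must be taken from the benchmark's label definitions.

```python
def format_instance_lines(instances):
    """Build the lines of one per-image result file.

    instances: iterable of (mask_path, label_id, confidence) tuples, where
    mask_path points to a binary mask PNG for a single predicted instance.
    ASSUMPTION: the Cityscapes-style layout "mask_path label_id confidence".
    """
    lines = []
    for mask_path, label_id, confidence in instances:
        if " " in mask_path:
            # Fields are space-separated, so a space in the path would
            # make the line ambiguous.
            raise ValueError(f"mask path must not contain spaces: {mask_path!r}")
        lines.append(f"{mask_path} {int(label_id)} {float(confidence):.4f}")
    return lines
```

For example, `format_instance_lines([("masks/frame0001_0.png", 4, 0.9)])` yields the single line `masks/frame0001_0.png 4 0.9000`, which would then be written to that image's result file, one instance per line.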

| Team/Uploader Name | Method Name | AP | AP 50% |
|---|---|---|---|
| TUTU (AutoNUE Challenge) | PANET | 0.3918 | 0.6753 |
| Anonymous | Anonymous | 0.2766 | 0.5001 |
| Poly (AutoNUE Challenge) | RESNET101 MASK RCNN | 0.2681 | 0.4991 |
| Dynamove_IITM (AutoNUE Challenge) | Mask RCNN | 0.1857 | 0.3873 |
| DV (AutoNUE Challenge) | Finetuned MaskRCNN | 0.1036 | 0.1998 |
| Anonymous | Anonymous | 0.0 | 0.0 |


  1. The Cityscapes Dataset for Semantic Urban Scene Understanding.
    Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele.
    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3213-3223.
  2. ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation.
    E. Romera, J. M. Alvarez, L. M. Bergasa, R. Arroyo.
    Transactions on Intelligent Transportation Systems (T-ITS), December 2017.
  3. Dilated Residual Networks.
    Fisher Yu, Vladlen Koltun, Thomas Funkhouser.
    The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.