ICVGIP 2020 Benchmark

ICVGIP '20 - Segmentation

The segmentation challenge involves pixel-level predictions for all 26 classes at level 3 of the label hierarchy (see Overview for details of the level 3 ids).


IDD20K is an expanded version of the IDD dataset released last year with fine annotations for 20K images.

Directions for Participation

  1. Register an account at http://idd.insaan.iiit.ac.in/, selecting "ICVGIP IDD Data Challenge 2020" as the event.
  2. Go to Dataset > Download page in the menu.
  3. The dataset consists of two parts, both available for download.
  4. The dataset to be used for this event is IDD 20k Part II.
  5. Extract both downloaded archives into the same folder.
  6. Run the data preparation code to generate the ground truth segmentation masks, as documented here: https://github.com/icvgip2020/public-code . Use the following command for segmentation mask generation:
    python preperation/createLabels.py --datadir $IDD --id-type level3Id --num-workers $C
  7. Once you have built a model and have its predictions on any of the splits (train, val), you can evaluate the metric as directed here: https://github.com/icvgip2020/public-code#evaluation. Use the following command for segmentation evaluation:
    python evaluate/evaluate_mIoU.py --gts $GT --preds $PRED --num-workers $C
    Your prediction is a PNG image of size 1280x720. Each pixel of this image contains the label, as a level3Id (see the labels code), of the corresponding pixel in the input image (resized to 1280x720). The evaluation code above resizes both your prediction and the ground truth PNG files to 1280x720 if they are not already that size.
  8. Finally, you can upload predictions for the test split (4k images; 2k from each of the two parts of IDD20K) to be evaluated for the leaderboard here: http://idd.insaan.iiit.ac.in/evaluation/submission/submit/
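The prediction format from step 7 can be sketched as follows. This is an illustrative helper, not part of the official tooling; `save_prediction` and the sample mask are assumed names, and NumPy and Pillow are assumed to be installed:

```python
import numpy as np
from PIL import Image

def save_prediction(level3_mask: np.ndarray, out_path: str) -> None:
    """Save a per-pixel level3Id prediction as a 1280x720 PNG.

    level3_mask: 2D uint8 array of level3Ids (0-26), at any resolution.
    """
    img = Image.fromarray(level3_mask.astype(np.uint8))  # mode "L" (8-bit grayscale)
    if img.size != (1280, 720):
        # Nearest neighbour keeps label values intact (no interpolation between ids).
        img = img.resize((1280, 720), resample=Image.NEAREST)
    img.save(out_path)

# Example: an all-"miscellaneous" (class 26) mask at full-HD resolution.
mask = np.full((1080, 1920), 26, dtype=np.uint8)
save_prediction(mask, "prediction.png")
```

Saving with nearest-neighbour resampling matters: bilinear or bicubic resizing would blend neighbouring class ids into meaningless intermediate values.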

Output Format

The output format is a PNG image with the same resolution as the input image, where the value of every pixel is an integer in {0, ..., 26}: the values 0-25 correspond to the level3Ids (see Overview for details of the level 3 ids) and the value 26 is used as a miscellaneous class.
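A quick sanity check on this output format can be sketched as follows; `is_valid_mask` is a hypothetical helper, not part of the challenge code:

```python
import numpy as np

def is_valid_mask(mask: np.ndarray) -> bool:
    """True if mask is a 2D integer array with every pixel in {0, ..., 26}."""
    return mask.ndim == 2 and int(mask.min()) >= 0 and int(mask.max()) <= 26

# A random but in-range mask, and one with an out-of-range label.
good = np.random.randint(0, 27, size=(720, 1280), dtype=np.uint8)
bad = np.full((720, 1280), 99, dtype=np.uint8)
```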


Evaluation Metric

We will be using the mean Intersection over Union (mIoU) metric. All ground truth and prediction maps will be resized to 720p (using nearest-neighbour interpolation), and true positives (TP), false negatives (FN) and false positives (FP) will be computed for each class (except 26) over the entire test split of the dataset. The Intersection over Union (IoU) for each class is TP/(TP+FN+FP), and the mean over the classes is taken as the metric (commonly known as mIoU) for the segmentation challenge.
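The metric above can be sketched in NumPy as follows. `confusion` and `mean_iou` are illustrative helpers, not the official evaluation script (which lives in the linked repository); the sketch assumes ground-truth pixels labelled 26 are excluded, while class-26 predictions on scored pixels count as false negatives:

```python
import numpy as np

N = 26  # level3Ids 0-25 are scored; 26 is the miscellaneous class

def confusion(gt: np.ndarray, pred: np.ndarray) -> np.ndarray:
    """26x27 confusion matrix: rows = gt class, cols = predicted class (incl. 26)."""
    valid = gt < N  # ignore pixels whose ground truth is class 26
    idx = gt[valid].astype(np.int64) * (N + 1) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=N * (N + 1)).reshape(N, N + 1)

def mean_iou(conf: np.ndarray) -> float:
    """Per-class IoU = TP / (TP + FN + FP); mIoU is the mean over present classes."""
    tp = np.diag(conf[:, :N]).astype(float)
    fn = conf.sum(axis=1) - tp           # gt pixels predicted as any other class
    fp = conf[:, :N].sum(axis=0) - tp    # other gt pixels predicted as this class
    denom = tp + fn + fp
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return float(np.nanmean(iou))        # skip classes absent from gt and predictions
```

Accumulating one confusion matrix over all test images (summing the per-image matrices) and computing IoU at the end matches the "over the entire test split" definition, rather than averaging per-image IoUs.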

Additionally, we will report the mIoU for the level 2 and level 1 ids, also at 720p resolution, on the leaderboard. Evaluation scripts are available here: https://github.com/AutoNUE/public-code

Leaderboard

Team/Uploader     Method                   mIoU L3 (720p)  mIoU L2 (720p)  mIoU L1 (720p)
BetaSeg           Adapted HRNet            0.6747          0.7321          0.8672
TeamTiger         Aligned DLV3+            0.6353          0.6986          0.8514
TeamTiger         Modified aligned DLV3+   0.6353          0.6986          0.8514
TeamTiger         Modified DLV3+           0.6265          0.6915          0.8427
TeamTiger         Edited DLV3+             0.619           0.6907          0.8302
BetaSeg           Adapted HRNet            0.6038          0.6821          0.8306
BetaSeg           Enhanced HRNet           0.5699          0.672           0.8255
FutureSight       Short HRNet              0.4195          0.5219          0.6498
FutureSight       Short HRNet +            0.3724          0.4734          0.5909
One_Two_Ka_Four   Modified HRNet           0.3615          0.4829          0.6506
BetaSeg           HRNet                    0.3455          0.3212          0.7537
One_Two_Ka_Four   Modified HRNet           0.1972          0.2886          0.3439
HRNet             Extended HRNet           0.1793          0.2432          0.396
FutureSight       Transferred_HRNet        0.1793          0.2432          0.396
OTKF              AttN                     0.0906          0.1437          0.134
vision_aries      DLV                      0.0158          0.0251          0.0988