MVTec densely segmented supermarket dataset (MVTec D2S)

D2S dataset


The densely segmented supermarket (D2S) dataset is a benchmark for instance-aware semantic segmentation in an industrial domain. It contains 21,000 high-resolution images with pixel-wise labels of all object instances. The objects comprise groceries and everyday products from 60 categories. The benchmark is designed to resemble the real-world setting of an automatic checkout, inventory, or warehouse system. The training images only contain objects of a single class on a homogeneous background, while the validation and test sets are much more complex and diverse. To further benchmark the robustness of instance segmentation methods, the scenes are acquired under different lighting conditions, rotations, and backgrounds.

We ensure that there are no ambiguities in the labels and that every instance is labeled comprehensively. The annotations are pixel-precise and allow using crops of single instances for artificial data augmentation. The dataset covers several challenges highly relevant in the field, such as a limited amount of training data and a high diversity in the test and validation sets.

More information can be found in the corresponding paper and the accompanying video.


Dataset Download:

Test Set Evaluation

The D2S test annotations are not public. Please run your model's inference on the test images and store the results in a JSON file.

If you send us this JSON result file, we can evaluate your results against the test set annotations for you. To do so, please get in touch with us via the contact form below.

The results should be in the standard COCO JSON results format, i.e. the JSON file should contain a list of results, each with image_id, category_id, score, and either a segmentation (mask given as RLE) or a bbox, structured as in the following examples:

  • For instance segmentation masks (region given as RLE):
    [{u'image_id': 42,
       u'category_id': 18,
       u'segmentation': {u'counts': , u'size': [1440, 1920]},
       u'score': 0.959136}, ...]
  • For bounding box detection (bbox given as [x, y, w, h]):
    [{u'image_id': 42,
       u'category_id': 18,
       u'bbox': [258.15,41.29,348.26,243.78],
       u'score': 0.959136}, ...]
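
As a rough illustration of the bounding box case, the following Python sketch builds a result list in the structure shown above and writes it to a JSON file. It is not an official submission tool: the helper names, the output file name, and the sample detection are hypothetical, and it assumes your detector returns boxes as corner coordinates (x1, y1, x2, y2) that must be converted to [x, y, w, h]. For segmentation results, the RLE dictionary is commonly produced with the mask utilities of pycocotools; that step is left out here.

```python
import json

# Keys every result entry needs, according to the format described above.
REQUIRED_KEYS = {"image_id", "category_id", "score"}


def xyxy_to_xywh(box):
    """Convert corner coordinates (x1, y1, x2, y2) to [x, y, w, h]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def make_bbox_result(image_id, category_id, box_xyxy, score):
    """Build one bounding box result entry (hypothetical helper)."""
    return {
        "image_id": int(image_id),
        "category_id": int(category_id),
        "bbox": [round(float(v), 2) for v in xyxy_to_xywh(box_xyxy)],
        "score": float(score),
    }


def validate_results(results):
    """Raise ValueError if an entry does not match the expected structure."""
    for index, entry in enumerate(results):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"result {index} is missing keys: {sorted(missing)}")
        if "bbox" not in entry and "segmentation" not in entry:
            raise ValueError(f"result {index} has neither bbox nor segmentation")
        if "bbox" in entry and len(entry["bbox"]) != 4:
            raise ValueError(f"result {index}: bbox must be [x, y, w, h]")
        if "segmentation" in entry:
            seg = entry["segmentation"]
            if not isinstance(seg, dict) or not {"counts", "size"} <= set(seg):
                raise ValueError(f"result {index}: segmentation must be an RLE dict")


def write_results(results, path):
    """Validate the result list and store it as a single JSON list."""
    validate_results(results)
    with open(path, "w") as handle:
        json.dump(results, handle)


# Hypothetical detector output: (image_id, category_id, (x1, y1, x2, y2), score)
detections = [(42, 18, (258.15, 41.29, 606.41, 285.07), 0.959136)]
results = [make_bbox_result(*detection) for detection in detections]

# Example call (file name is an assumption):
# write_results(results, "d2s_test_results.json")
```

Validating before writing catches entries that lack a score or carry a malformed box before the file is sent off for evaluation.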

Please note: License Terms

The data is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

In particular, it is not allowed to use the dataset for commercial purposes. If you are unsure whether or not your application violates the non-commercial use clause of the license, please contact us.

If you have any questions or comments about the dataset, feel free to contact us via email.
