MVTec Software GmbH

MVTec Densely Segmented Supermarket Dataset (MVTec D2S)


The Densely Segmented Supermarket (D2S) dataset is a benchmark for instance-aware semantic segmentation in an industrial domain. It contains 21,000 high-resolution images with pixel-wise labels of all object instances. The objects comprise groceries and everyday products from 60 categories. The benchmark is designed such that it resembles the real-world setting of an automatic checkout, inventory, or warehouse system. The training images only contain objects of a single class on a homogeneous background, while the validation and test sets are much more complex and diverse. To further benchmark the robustness of instance segmentation methods, the scenes are acquired with different lightings, rotations, and backgrounds.

We ensure that there are no ambiguities in the labels and that every instance is labeled comprehensively. The annotations are pixel-precise and allow using crops of single instances for artificial data augmentation. The dataset covers several challenges highly relevant in the field, such as a limited amount of training data and a high diversity in the test and validation sets.

More info can be found in the corresponding paper and the video below.


For ease of use, the data is provided in the same format as the well-known COCO dataset.
Please note that our dataset is hosted on an FTP server. To download it, you need a browser that supports the File Transfer Protocol (e.g. Mozilla Firefox) or an FTP client.


  • Images: The 'images' folder contains all images, including the artificially augmented ones described in the paper.

  • Annotations: Contains the annotations for the different training and validation splits.
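
Because the annotations follow the COCO layout, they can be inspected with nothing more than a JSON parser. The following is a minimal sketch, assuming the usual COCO top-level keys (`images`, `annotations`, `categories`). The helper names, the example path in the comment, and the tiny in-memory sample are hypothetical and only illustrate the structure; they are not taken from the dataset.

```python
import json
from collections import defaultdict


def load_coco_file(path):
    """Read a COCO-style annotation file from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def index_coco_annotations(coco):
    """Group annotations by image id and count instances per category name."""
    cat_names = {c["id"]: c["name"] for c in coco["categories"]}
    anns_by_image = defaultdict(list)
    counts = defaultdict(int)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)
        counts[cat_names[ann["category_id"]]] += 1
    return dict(anns_by_image), dict(counts)


# Hypothetical stand-in for a real annotation file; with the downloaded data
# one would call load_coco_file("<path to an annotation file>") instead.
example = {
    "images": [{"id": 1, "file_name": "a.jpg", "height": 1440, "width": 1920}],
    "categories": [{"id": 1, "name": "category_a"}, {"id": 2, "name": "category_b"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [50, 60, 30, 40]},
        {"id": 12, "image_id": 1, "category_id": 2, "bbox": [90, 10, 20, 20]},
    ],
}

anns_by_image, counts = index_coco_annotations(example)
print(counts)
```

Grouping by image id first makes it easy to check per-image instance counts, which differ strongly between the single-class training images and the more cluttered validation scenes.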

D2S Amodal:


If you use the D2S dataset in scientific work, please cite

Patrick Follmann, Tobias Böttger, Philipp Härtinger, Rebecca König, Markus Ulrich: MVTec D2S: Densely Segmented Supermarket Dataset; in: European Conference on Computer Vision (ECCV), 569-585, 2018.



If you use the D2S amodal dataset in scientific work, please cite

Patrick Follmann, Rebecca König, Philipp Härtinger, Michael Klostermann: Learning to See the Invisible: End-to-End Trainable Amodal Instance Segmentation; in: IEEE Winter Conference on Applications of Computer Vision (WACV), January 2019.


The data is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). For using the data in a way that falls under the commercial use clause of the license, please contact us.

Test Set Evaluation

The D2S test annotations are not public. Please run the inference of your model on the test images and store the results in a JSON file.

If you send us the result JSON file, we can evaluate your results on the test set annotations for you. For this purpose, please get in touch with us via the contact form below.

The results should be in the standard COCO JSON result format, i.e. the JSON file should contain a list of results, each with image_id, category_id, score, and either segmentation (mask given as RLE) or bbox, structured as in the following examples:

  • For instance segmentation masks (region given as RLE):
    [{u'image_id': 42,
       u'category_id': 18,
       u'segmentation': {u'counts': , u'size': [1440, 1920]},
       u'score': 0.959136}, ...]
  • For bounding box detection (bbox given as [x, y, w, h]):
    [{u'image_id': 42,
       u'category_id': 18,
       u'bbox': [258.15,41.29,348.26,243.78],
       u'score': 0.959136}, ...]
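
A result file of this shape can be produced with the standard library alone. The sketch below covers the bounding box case and assumes the detector outputs corner-format boxes (x1, y1, x2, y2), which have to be converted to the [x, y, w, h] layout shown above. The function names and the output file name in the comment are hypothetical, and the basic structural check is only a convenience, not the evaluation that is run on the test set.

```python
import json


def make_bbox_result(image_id, category_id, box_xyxy, score):
    """Convert a corner-format box (x1, y1, x2, y2) into one COCO result entry."""
    x1, y1, x2, y2 = (float(v) for v in box_xyxy)
    return {
        "image_id": int(image_id),
        "category_id": int(category_id),
        # COCO expects [x, y, width, height] with (x, y) the top-left corner.
        "bbox": [round(x1, 2), round(y1, 2), round(x2 - x1, 2), round(y2 - y1, 2)],
        "score": float(score),
    }


def check_results(results):
    """Raise ValueError if an entry lacks a required key or a region."""
    required = {"image_id", "category_id", "score"}
    for i, res in enumerate(results):
        missing = required - res.keys()
        if missing:
            raise ValueError(f"result {i} is missing keys: {sorted(missing)}")
        if "bbox" not in res and "segmentation" not in res:
            raise ValueError(f"result {i} has neither 'bbox' nor 'segmentation'")
        if "bbox" in res and len(res["bbox"]) != 4:
            raise ValueError(f"result {i} has a bbox that is not [x, y, w, h]")
    return True


def save_results(results, path):
    """Validate the list and write it as a single JSON array."""
    check_results(results)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f)


results = [make_bbox_result(42, 18, (258.15, 41.29, 606.41, 285.07), 0.959136)]
check_results(results)
# save_results(results, "d2s_test_results.json")  # hypothetical file name
print(json.dumps(results))
```

For instance segmentation, the `segmentation` entry would hold the RLE dictionary instead of `bbox`. A common way to obtain it is `pycocotools.mask.encode` applied to a Fortran-ordered uint8 mask; the returned `counts` value is a bytes object and needs to be decoded to a string before it can be written as JSON.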


If you have any questions or comments about the dataset, feel free to contact us via this form.

Contact Form
