Semantic Segmentation

This chapter explains how to use semantic segmentation based on deep learning, both for the training and inference phases.

With semantic segmentation, we assign each pixel of the input image to a class using a deep learning (DL) network.

[Figure: an input image showing apples, a lemon, and oranges, segmented into the classes 'apple', 'lemon', 'orange', and 'background']
A possible example of semantic segmentation: every pixel of the input image is assigned to a class, but neither the three different instances of the class 'apple' nor the two different instances of the class 'orange' are distinguished as individual objects.

The result of semantic segmentation is an output image in which the pixel value signifies the assigned class of the corresponding pixel in the input image. Thus, in HALCON the output image has the same size as the input image. For general DL networks, the deeper feature maps, which represent more complex features, are usually smaller than the input image (see the section “The Network and the Training Process” in Deep Learning). To obtain an output of the same size as the input, HALCON uses segmentation networks with two components: an encoder and a decoder. The encoder extracts features from the input image, as is done, e.g., for deep-learning-based classification. As this information is 'encoded' in a compressed format, the decoder reconstructs it into the desired output, which, in this case, is the assignment of each pixel to a class. Note that, because individual pixels are classified, overlapping instances of the same class are not distinguished from one another.
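
The following HDevelop lines are a minimal inference sketch, not the complete workflow: they assume that DLModelHandle refers to a trained segmentation model (the file name 'model_best.hdl' is a placeholder) and that DLSampleBatch is a sample batch already preprocessed to the model's input requirements.

* Minimal inference sketch: read a trained segmentation model
* ('model_best.hdl' is a placeholder for your trained model file).
read_dl_model ('model_best.hdl', DLModelHandle)
* Apply the model to a preprocessed sample batch.
apply_dl_model (DLModelHandle, DLSampleBatch, [], DLResultBatch)
* Each result dict contains the output image; its pixel values are
* the class IDs assigned to the corresponding input pixels.
get_dict_object (SegmentationImage, DLResultBatch[0], 'segmentation_image')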

Edge extraction is a special case of semantic segmentation, where the model is trained to distinguish two classes: 'edge' and 'background'. For more information, see “Solution Guide I - Basics”.
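
For illustration only (the actual edge-extraction workflow is described in the Solution Guide), such a two-class setup corresponds to a segmentation model whose class IDs cover just 'background' and 'edge'; the IDs 0 and 1 below are assumptions chosen for the sketch.

* Illustrative sketch: restrict a segmentation model to two classes,
* e.g., ID 0 for 'background' and ID 1 for 'edge' (IDs chosen freely).
read_dl_model ('pretrained_dl_segmentation_compact.hdl', DLModelHandle)
set_dl_model_param (DLModelHandle, 'class_ids', [0, 1])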