
This chapter explains how to use classification based on deep learning, both for the training and inference phases.

Classification based on deep learning is a method in which an image is assigned a set of confidence values. These confidence values indicate how likely it is that the image belongs to each of the distinguished classes. Thus, if we regard only the top prediction, classification means assigning a specific class out of a given set of classes to an image. This is illustrated with the following schema.

[Figure: schema of an input image with a confidence value assigned for each of the classes 'apple', 'lemon', and 'orange'.]
A possible classification example, in which the network distinguishes three classes. The input image is assigned a confidence value for each of the three distinguished classes: 'apple' 0.85, 'lemon' 0.03, and 'orange' 0.12. The top prediction tells us that the image is recognized as 'apple'.
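Reading off the top prediction amounts to picking the class with the largest confidence value. The following is a minimal plain-Python sketch using the values from the caption above. The function name top_prediction and the dictionary layout are illustrative assumptions and not part of HALCON.

```python
def top_prediction(confidences):
    """Return (class_name, confidence) for the highest confidence value."""
    if not confidences:
        raise ValueError("at least one class is required")
    # max() with key=confidences.get selects the class with the largest value.
    best_class = max(confidences, key=confidences.get)
    return best_class, confidences[best_class]


# Confidence values taken from the example caption (illustrative data).
example = {"apple": 0.85, "lemon": 0.03, "orange": 0.12}
print(top_prediction(example))
```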

In order to perform your specific task, that is, to classify your data into the classes you want to have distinguished, the classifier has to be trained accordingly. In HALCON, we use a technique called transfer learning (see also the chapter Deep Learning). For this, we provide pretrained networks, representing classifiers which have been trained on huge amounts of labeled image data. These classifiers have been trained and tested to perform well on industrial image classification tasks. One of these classifiers, already trained for general classification, is then retrained for your specific task. For this, the classifier needs to know which classes are to be distinguished and what examples of these classes look like. This is represented by your dataset, i.e., your images with the corresponding ground truth labels. More information on the data requirements can be found in the section “Data”.

In HALCON, classification with deep learning is implemented within the more general deep learning model. For more information on the latter, see the chapter Deep Learning / Model. For the specific system requirements for applying deep learning, please refer to the HALCON “Installation Guide”.

The following sections introduce the general workflow needed for classification, give information on the involved data and parameters, and explain the evaluation measures.

General Workflow

In this section, we describe the general workflow for a classification task based on deep learning. It is subdivided into four parts: preprocessing of the data, training of the model, evaluation of the trained model, and inference on new images. We assume that your dataset is already labeled, see also the section “Data” below. Have a look at the HDevelop example series classify_pill_defects_deep_learning for an application.

Preprocess the data

This part is about how to preprocess your data. The individual steps are also shown in the HDevelop example classify_pill_defects_deep_learning_1_preprocess.hdev.

  1. The information about what is to be found in which image of your training dataset needs to be transferred. This is done by the procedure

    • read_dl_dataset_classification.

    Thereby a dictionary DLDataset is created, which serves as a database and stores all necessary information about your data. For more information about the data and the way it is transferred, see the section “Data” below and the chapter Deep Learning / Model.

  2. Split the dataset represented by the dictionary DLDataset. This can be done using the procedure

    • split_dl_dataset.

    The resulting split is saved under the key split in each sample entry of DLDataset.

  3. Read in a pretrained network using the operator

    • read_dl_model.

    This operator is likewise used when you want to read your own trained networks after you have saved them with write_dl_model.

    The network imposes several requirements on the images, such as the image dimensions and the gray value range. The default values are listed in read_dl_model. These are the values with which the networks have been pretrained. The network architectures allow different image dimensions, which can be set with set_dl_model_param, but depending on the network a change may make retraining necessary. The values actually set can be retrieved with

    • get_dl_model_param.

  4. Now you can preprocess your dataset. For this, you can use the procedure

    • preprocess_dl_dataset.

    In case of custom preprocessing, this procedure offers guidance on the implementation.

    To use this procedure, specify the preprocessing parameters, e.g., the image size. Store all the parameters with their values in a dictionary DLPreprocessParam, for which you can use the procedure

    • create_dl_preprocess_param.

    We recommend saving this dictionary DLPreprocessParam in order to have access to the preprocessing parameter values later during the inference phase.

Training of the model

This part is about how to train a classifier. The individual steps are also shown in the HDevelop example classify_pill_defects_deep_learning_2_train.hdev.

  1. Set the training parameters and store them in the dictionary TrainParam. These parameters include:

    • the hyperparameters, for an overview see the chapter Deep Learning.

    • parameters for possible data augmentation (optional).

    • parameters for the evaluation during training.

    • parameters for the visualization of training results.

    • parameters for serialization.

    This can be done using the procedure

    • create_dl_train_param.

  2. Train the model. This can be done using the procedure

    • train_dl_model.

    The procedure expects:

    • the model handle DLModelHandle

    • the dictionary with the data information DLDataset

    • the dictionary with the training parameters TrainParam

    • the information over how many epochs the training shall run.

    In case the procedure train_dl_model is used, the total loss as well as optional evaluation measures are visualized.

Evaluation of the trained model

In this part, we evaluate the trained classifier. The individual steps are also shown in the HDevelop example classify_pill_defects_deep_learning_3_evaluate.hdev.

  1. The evaluation can conveniently be done using the procedure

    • evaluate_dl_model.

  2. The dictionary EvaluationResults holds the requested evaluation measures. You can visualize your evaluation results using the procedure

    • dev_display_classification_evaluation.

  3. A heatmap can be generated for specified samples using

    1. the operator gen_dl_model_heatmap

    2. the procedure gen_dl_model_classification_heatmap

Inference on new images

This part covers the application of a deep-learning-based classification model. The individual steps are also shown in the HDevelop example classify_pill_defects_deep_learning_4_infer.hdev.

  1. Set the parameters, e.g., 'batch_size', using the operator

    • set_dl_model_param.

  2. Generate a data dictionary DLSample for each image. This can be done using the procedure

    • gen_dl_samples_from_images.

  3. Preprocess the images as done for the training. We recommend doing this using the procedure

    • preprocess_dl_samples.

    If you saved the dictionary DLPreprocessParam during the preprocessing step, you can directly use it as input to specify all parameter values.

  4. Apply the model using the operator

    • apply_dl_model.

  5. Retrieve the results from the dictionary DLResultBatch.
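The control flow of the inference steps above can be sketched in plain Python as follows. This is not HALCON code: the callables preprocess and apply_model are hypothetical stand-ins for the preprocessing procedure and the model application, and the sample layout ({"image": ...}) is an assumption made only for this illustration.

```python
def batched(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_inference(images, batch_size, preprocess, apply_model):
    """Preprocess and classify images batch by batch.

    preprocess: hypothetical callable, maps one sample dict to a sample dict.
    apply_model: hypothetical callable, maps a list of samples to a list
                 with one result dict per sample.
    """
    results = []
    for batch in batched(images, batch_size):
        # One sample dictionary per image, preprocessed as for training.
        samples = [preprocess({"image": image}) for image in batch]
        results.extend(apply_model(samples))
    return results
```

The last batch may be smaller than batch_size if the number of images is not a multiple of it. How a real model handles such a partial batch is outside the scope of this sketch.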


Data

We distinguish between data used for training and data used for inference. The latter consists of bare images. For the former, however, you already know to which class the images belong and provide this information via the corresponding labels.

As a basic concept, the model handles data via dictionaries, meaning it receives the input data in a dictionary DLSample and returns a dictionary DLResult or DLTrainResult, respectively. More information on the data handling can be found in the chapter Deep Learning / Model.

Data for training and evaluation

The dataset consists of images and corresponding information. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section “Images” below.

The training data is used to train and evaluate a network for your specific task. With the aid of this data, the classifier can learn which classes are to be distinguished and what their representatives look like. In classification, the image is classified as a whole. Therefore, the training data consists of images and their ground truth labels, i.e., the class you say each image belongs to. Note that the images should be as representative as possible for your task. There are different possible ways to store and retrieve this information. How the data has to be formatted in HALCON for a DL model is explained in the chapter Deep Learning / Model. In short, a dictionary DLDataset serves as a database for the information needed by the training and evaluation procedures. The procedure read_dl_dataset_classification supports the following sources of the ground truth label for an image:

For training a classifier, we use a technique called transfer learning (see the chapter Deep Learning). For this, you need fewer resources, but still a suitable set of data. While in general the network should be more reliable when trained on a larger dataset, the amount of data needed for training also depends on the complexity of the task. You also want enough training data to split it into three subsets, used for training, validating, and testing the network. These subsets are preferably independent and identically distributed, see the section “Data” in the chapter Deep Learning.
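The idea of splitting a labeled dataset into three subsets can be sketched in plain Python as follows. This is only an illustration, not the implementation of split_dl_dataset: the sample layout (dictionaries with the keys "label" and "split"), the subset names, and the per-class shuffling are assumptions of this sketch.

```python
import random


def split_samples(samples, train_pct, validation_pct, seed=42):
    """Write 'train', 'validation', or 'test' to the 'split' key of each sample.

    Samples are shuffled per class label, so every subset roughly
    preserves the class distribution. The remainder goes to 'test'.
    """
    if train_pct < 0 or validation_pct < 0 or train_pct + validation_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample["label"], []).append(sample)
    for group in by_label.values():
        rng.shuffle(group)
        n_train = len(group) * train_pct // 100
        n_val = len(group) * validation_pct // 100
        for index, sample in enumerate(group):
            if index < n_train:
                sample["split"] = "train"
            elif index < n_train + n_val:
                sample["split"] = "validation"
            else:
                sample["split"] = "test"
    return samples
```

With very few samples per class, the integer division can leave a subset empty for that class, which is one practical reason to collect enough data for all three subsets.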


Images

Regardless of the application, the network imposes requirements on the images regarding, e.g., the image dimensions. The specific values depend on the network itself and can be queried with get_dl_model_param. In order to fulfill these requirements, you may have to preprocess your images. Standard preprocessing is implemented in preprocess_dl_dataset for the whole dataset and in preprocess_dl_samples for a single sample. In case of custom preprocessing, these procedures offer guidance on the implementation.
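One typical preprocessing step is mapping the gray values of an image to the range the network expects. The following plain-Python sketch shows such a linear mapping for a flat list of pixel values. The function name, the source range [0, 255], and the target range in the usage line are illustrative assumptions; the values actually required should be queried from the model.

```python
def rescale_gray_values(pixels, range_min, range_max, source_max=255):
    """Linearly map gray values from [0, source_max] to [range_min, range_max]."""
    if source_max <= 0:
        raise ValueError("source_max must be positive")
    if range_max <= range_min:
        raise ValueError("range_max must be greater than range_min")
    scale = (range_max - range_min) / source_max
    return [pixel * scale + range_min for pixel in pixels]


# Hypothetical target range, chosen only for illustration.
print(rescale_gray_values([0, 255], -127.0, 128.0))
```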

Network output

As training output, the operator will return a dictionary DLTrainResult with the current value of the total loss as well as values for all other losses included in your model.

As inference and evaluation output, the network will return a dictionary DLResult for every sample. For classification, this dictionary will include for each input image a tuple with the confidence values for every class to be distinguished in decreasing order and a second tuple with the corresponding class IDs.
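The relation between the two tuples can be sketched in plain Python: the confidence values are sorted in decreasing order, and the class IDs are reordered in the same way. The function name and the input layout (a mapping from class ID to confidence) are assumptions of this sketch, not the HALCON data format.

```python
def rank_classes(confidences_by_id):
    """Return (confidences, class_ids), both ordered by decreasing confidence."""
    ranked = sorted(confidences_by_id.items(), key=lambda item: item[1], reverse=True)
    confidences = tuple(confidence for _, confidence in ranked)
    class_ids = tuple(class_id for class_id, _ in ranked)
    return confidences, class_ids


# Illustrative data: class IDs 0, 1, 2 with their confidence values.
confidences, class_ids = rank_classes({0: 0.85, 1: 0.03, 2: 0.12})
print(confidences, class_ids)
```

The first entry of each tuple then corresponds to the top prediction.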

Interpreting the Classification Results

When we classify an image, we obtain a set of confidence values, telling us the affinity of the image to every class. It is also possible to compute the following values.

Confusion Matrix, Precision, Recall, and F-score

In classification, whole images are classified. As a consequence, the instances of a confusion matrix are images. See the chapter Deep Learning for explanations of confusion matrices.

You can generate a confusion matrix with the aid of the procedures gen_confusion_matrix and gen_interactive_confusion_matrix. Thereby, the interactive procedure gives you the possibility to select examples of a specific category, but it does not work with exported code.

From such a confusion matrix you can derive various values. The precision is the proportion of all correctly predicted positives to all predicted positives (true and false ones). Thus, it is a measure of how many positive predictions really belong to the selected class.

The recall, also called the "true positive rate", is the proportion of all correctly predicted positives to all real positives. Thus, it is a measure of how many samples belonging to the selected class were predicted correctly as positives.

A classifier with high recall but low precision finds most of the positives (thus members of the class), but at the cost of also classifying many negatives as members of the class. A classifier with high precision but low recall is just the opposite: it classifies only few samples as positives, but most of these predictions are correct. An ideal classifier with high precision and high recall will classify many samples as positive with a high accuracy.

To represent this with a single number, we compute the F1-score, the harmonic mean of precision and recall. Thus, it is a measure of the classifier's accuracy.

For the example from the confusion matrix shown in Deep Learning we get for the class 'apple' the values precision: 1.00 (= 68/(68+0+0)), recall: 0.74 (= 68/(68+21+3)), and F1-score: 0.85 (=2*(1.00*0.74)/(1.00+0.74)).
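The computation for the class 'apple' can be reproduced with a few lines of plain Python. The function name and signature are illustrative assumptions; the counts are those from the example above (68 true positives, 0 + 0 false positives, 21 + 3 false negatives).

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1-score for one class."""
    predicted_positives = true_positives + false_positives
    real_positives = true_positives + false_negatives
    # Guard against division by zero for classes without predictions or samples.
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / real_positives if real_positives else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(68, 0 + 0, 21 + 3)
print(f"precision: {precision:.2f}, recall: {recall:.2f}, F1-score: {f1:.2f}")
```

Note that the sketch computes the F1-score from the unrounded precision and recall. For this example, both ways of computing it yield 0.85 after rounding.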
