
This chapter explains how to use classification based on deep learning, both for the training and inference phases.

Classification based on deep learning is a method in which an image is assigned a set of confidence values. These confidence values indicate how likely it is that the image belongs to each of the distinguished classes. Thus, if we regard only the top prediction, classification means assigning a specific class out of a given set of classes to an image. This is illustrated by the following schema.

[Figure: schema of a classification example with the confidence values orange: 0.03, apple: 0.85, lemon: 0.12]
A possible classification example, in which the network distinguishes three classes. The input image is assigned a confidence value for each of the three distinguished classes: 'apple' 0.85, 'lemon' 0.03, and 'orange' 0.12. The top prediction tells us that the image is recognized as 'apple'.
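
As a minimal sketch of how the top prediction follows from the confidence values, the snippet below picks the class with the highest confidence. It is plain Python for illustration only, not HALCON code; the function name top_prediction is hypothetical, and the values are the ones from the caption above.

```python
def top_prediction(confidences):
    """Return the (class_name, confidence) pair with the highest confidence."""
    if not confidences:
        raise ValueError("confidences must not be empty")
    best_class = max(confidences, key=confidences.get)
    return best_class, confidences[best_class]


# Confidence values taken from the caption of the schema.
confidences = {"apple": 0.85, "lemon": 0.03, "orange": 0.12}
predicted_class, confidence = top_prediction(confidences)
print(predicted_class, confidence)
```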

To perform your specific task, i.e., to classify your data into the classes you want to distinguish, the classifier has to be trained accordingly. In HALCON, we use a technique called transfer learning (see also the chapter Deep Learning). For this, we provide pretrained networks, which represent classifiers that have been trained on huge amounts of labeled image data. These classifiers have been trained and tested to perform well on industrial image classification tasks. One of these classifiers, already trained for general classification, is then retrained for your specific task. For this, the classifier needs to know which classes are to be distinguished and what examples of them look like. This is represented by your dataset, i.e., your images with the corresponding ground truth labels. More information on the data requirements can be found in the section “Data for classification”.

For the specific system requirements in order to apply deep learning classification, please refer to the HALCON “Installation Guide”.

Operator Workflow

Have a look at the HDevelop example classify_fruit_deep_learning.hdev for a short-and-simple overview and classify_pill_defects_deep_learning.hdev for a more sophisticated workflow (both can be found under examples/hdevelop/Deep-Learning/Classification/). They provide great guidance on how the different parts can be used together.

Prepare the Network and the Data
  1. First, a pretrained network has to be read using the operator

    • read_dl_classifier.

    This operator is also used to read your own trained networks after you have saved them with write_dl_classifier.

  2. To read the data for your deep learning classification training the procedure

    • read_dl_classifier_data_set

    is available. Using this procedure you can get a list of image file paths and their respective labels (the ground truth labels) as well as a list of the unique classes, to which at least one of the listed images belongs.

  3. The network imposes several requirements on the images, such as the image dimensions and the gray value range. The default values are listed in read_dl_classifier. These are the values with which the networks have been pretrained. The network architectures allow different image dimensions, which can be set with set_dl_classifier_param, but depending on the network a change may make retraining necessary. The values currently set can be retrieved with

    • get_dl_classifier_param.

    The procedure preprocess_dl_classifier_images shows how such a preprocessing stage can be implemented. We recommend preprocessing and storing all images used for the training before starting the classifier training, since this speeds up the training significantly.

  4. Next, we recommend splitting the dataset into three distinct datasets, which are used for training, validation, and testing; see the section “Data” in the chapter Deep Learning. This can be achieved using the procedure

    • split_dl_classifier_data_set.

  5. You need to specify the 'classes' (determined beforehand with read_dl_classifier_data_set) that you want your classifier to distinguish. For this, the operator

    • set_dl_classifier_param

    is available.

    This operator can also be used to set hyperparameters that are important for training, e.g., 'batch_size' and 'learning_rate'. For a more detailed explanation, see the chapter Deep Learning and the documentation of set_dl_classifier_param.
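
The split into training, validation, and test data recommended in step 4 can be sketched as follows. This is plain Python for illustration only and not the implementation of split_dl_classifier_data_set; the function name split_data_set, its integer percentage parameters, and the file names are assumptions. The sketch splits each class separately so that all three subsets contain every class in similar proportions.

```python
import random


def split_data_set(image_files, labels, training_percent=70,
                   validation_percent=15, seed=42):
    """Split (file, label) pairs per class into three disjoint subsets.

    The remaining percentage (here 15) goes to the test subset.
    """
    by_class = {}
    for path, label in zip(image_files, labels):
        by_class.setdefault(label, []).append(path)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    splits = {"training": [], "validation": [], "test": []}
    for label, paths in sorted(by_class.items()):
        rng.shuffle(paths)
        n_train = len(paths) * training_percent // 100
        n_val = len(paths) * validation_percent // 100
        splits["training"] += [(p, label) for p in paths[:n_train]]
        splits["validation"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        splits["test"] += [(p, label) for p in paths[n_train + n_val:]]
    return splits


# Hypothetical dataset: 20 images for each of two classes.
files = ["apple_%02d.png" % i for i in range(20)] + ["lemon_%02d.png" % i for i in range(20)]
labels = ["apple"] * 20 + ["lemon"] * 20
splits = split_data_set(files, labels)
print({name: len(items) for name, items in splits.items()})
```

Integer percentages are used instead of floating-point fractions so that the subset sizes do not depend on rounding.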

Train the Network and Evaluate the Training Progress

Once your network is set up and your data is prepared, it is time to train the classifier for your specific task.

  1. Set the hyperparameters used for training with the operator

    • set_dl_classifier_param.

    For an overview of possible hyperparameters, see the documentation of set_dl_classifier_param. Additional explanations can be found in the chapter Deep Learning.

  2. To train the classifier, the operator

    • train_dl_classifier_batch

    is available. The intermediate training results are stored in the output handle.

    As the name of train_dl_classifier_batch indicates, this operator processes a batch of data (images and ground truth labels) at once. We iterate through the training data to train the classifier successively with train_dl_classifier_batch. You can repeat this process over as many training epochs as needed until you are satisfied with the training result.

  3. To know how well the classifier learns the new task, the procedure

    • plot_dl_classifier_training_progress

    is provided. With it you can plot the classification errors during training. To compute the input necessary for the visualization, the procedures

    • select_percentage_dl_classifier_data,

    • apply_dl_classifier_batchwise, and

    • evaluate_dl_classifier

    are available. With them you can reduce the number of images used for this classification validation, apply the classifier on the selected data, and compute, for example, the top-1 error. Have a look at the HDevelop example classify_pill_defects_deep_learning.hdev to see how these procedures can be used together.
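
The interplay of batches, epochs, and the top-1 error described in the steps above can be sketched as follows. This is plain Python for illustration only and does not call HALCON; the names iterate_batches, top1_error, and train are hypothetical, and train_step and predict are stand-ins for one batch training step and for applying the classifier to one image.

```python
def iterate_batches(samples, batch_size):
    """Yield consecutive batches; the last batch may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def top1_error(ground_truth_labels, predicted_labels):
    """Fraction of samples whose top prediction differs from the ground truth."""
    if len(ground_truth_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not ground_truth_labels:
        raise ValueError("at least one sample is required")
    wrong = sum(gt != pred for gt, pred in zip(ground_truth_labels, predicted_labels))
    return wrong / len(ground_truth_labels)


def train(training_samples, validation_samples, batch_size, num_epochs,
          train_step, predict):
    """Run num_epochs passes over the training data and record the
    validation top-1 error after each epoch."""
    validation_errors = []
    for _ in range(num_epochs):
        for batch in iterate_batches(training_samples, batch_size):
            train_step(batch)  # stand-in for one batch training step
        ground_truth = [label for _, label in validation_samples]
        predictions = [predict(image) for image, _ in validation_samples]
        validation_errors.append(top1_error(ground_truth, predictions))
    return validation_errors


# Toy run: the "classifier" always predicts 'apple' and never improves.
steps = []
training = [("img_%d.png" % i, "apple") for i in range(10)]
validation = [("v0.png", "apple"), ("v1.png", "lemon"),
              ("v2.png", "apple"), ("v3.png", "apple")]
errors = train(training, validation, batch_size=4, num_epochs=3,
               train_step=steps.append, predict=lambda image: "apple")
print(len(steps), errors)
```

In a real training run the validation error would be expected to change from epoch to epoch, and plotting it is what allows you to judge when to stop.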

Apply and Evaluate the Final Classifier

Your classifier is trained for your task and ready to be applied. Before deploying it in the real world, however, you should evaluate how well the classifier performs on the basis of your test data.

  1. To apply the classifier on a set containing an arbitrary number of images, use the operator

    The runtime of this operator depends on the number of batches needed for the given image set.

    The results are returned in a handle.

    To retrieve the predicted classes and confidences, use the operator

  2. Now it is time to evaluate these results. As during training, the performance of the classifier can be evaluated with evaluate_dl_classifier.

    To visualize and analyze the classifier quality, the confusion matrix is a useful tool (see below for an explanation). For this, you can use the procedures

    • gen_confusion_matrix

    • gen_interactive_confusion_matrix.

    The interactive procedure gives you the possibility to select examples of a specific category, but it does not work with exported code.

    Additionally, after applying the classifier on a set of data, you can use the procedure

    • get_dl_classifier_image_results

    to display and return images according to certain criteria, e.g., wrongly classified ones. Then, you might want to use this input for the procedure

    • dev_display_dl_classifier_heatmap,

    to display a heatmap of the input image, with which you can analyze which regions of the image are relevant for the classification result.

Inference Phase

When your classifier is trained and you are satisfied with its performance, you can use it to classify new images. For this, you simply preprocess your images according to the network requirements (i.e., the same way as you did for your dataset used for training the classifier) and apply the classifier using

Data for classification

We distinguish between data used for training and data used for inference. The latter consists of bare images. For the former, you already know to which class each image belongs and provide this information via the corresponding labels.

The training data is used to train a classifier for your specific task. With the aid of this data the classifier can learn which classes are to be distinguished and what their representatives look like. In classification, the image is classified as a whole. Therefore, the training data consists of images and their ground truth labels, i.e., the class you assign to each image. Note that the images should be as representative as possible for your task. There are different ways to store and retrieve the ground truth labels. The procedure read_dl_classifier_data_set supports the following sources of the ground truth label for an image:

For training a classifier, we use a technique called transfer learning (see the chapter Deep Learning). This requires fewer resources, but still a suitable set of data, generally on the order of hundreds to thousands of images per class. While in general the network should be more reliable when trained on a larger dataset, the amount of data needed for training also depends on the complexity of the task. You also want enough training data to split it into three subsets, which are preferably independent and identically distributed; see the section “Data” in the chapter Deep Learning.

Regardless of the application, the network poses requirements on the images regarding the image dimensions, the gray value range, and the type. The specific values depend on the network itself and can be queried with get_dl_classifier_param. The procedure preprocess_dl_classifier_images shows how such a preprocessing stage can be implemented.

Interpreting the Classification Results

When we classify an image, we obtain a set of confidence values, telling us the affinity of the image to every class. It is also possible to compute the following values.

Confusion Matrix, Precision, Recall, and F-score

In classification whole images are classified. As a consequence, the instances of a confusion matrix are images. See the chapter Deep Learning for explanations on confusion matrices.

You can generate a confusion matrix with the aid of the procedures gen_confusion_matrix and gen_interactive_confusion_matrix. The interactive procedure gives you the possibility to select examples of a specific category, but it does not work with exported code.
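
To illustrate what such a matrix contains, the following sketch counts images per combination of predicted and ground truth class. It is plain Python for illustration only and not the implementation of gen_confusion_matrix; the function name confusion_matrix and the sample labels are hypothetical, and the orientation (rows are predicted classes, columns are ground truth classes) is an assumption made for this example.

```python
def confusion_matrix(classes, ground_truth_labels, predicted_labels):
    """Count images per (predicted class, ground truth class) pair.

    matrix[i][j] is the number of images predicted as classes[i]
    whose ground truth label is classes[j].
    """
    if len(ground_truth_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for gt, pred in zip(ground_truth_labels, predicted_labels):
        matrix[index[pred]][index[gt]] += 1
    return matrix


classes = ["apple", "lemon", "orange"]
ground_truth = ["apple", "apple", "lemon", "orange", "apple"]
predicted = ["apple", "lemon", "lemon", "orange", "apple"]
for name, row in zip(classes, confusion_matrix(classes, ground_truth, predicted)):
    print(name, row)
```

With this orientation, the diagonal holds the correctly classified images, a row sum gives the number of images predicted as a class, and a column sum gives the number of images that really belong to it.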

From such a confusion matrix you can derive various values. The precision is the proportion of correctly predicted positives among all predicted positives (true and false ones). Thus, it is a measure of how many positive predictions really belong to the selected class.

The recall, also called the "true positive rate", is the proportion of correctly predicted positives among all real positives. Thus, it is a measure of how many samples belonging to the selected class were predicted correctly as positives.

A classifier with high recall but low precision finds most positives (i.e., members of the class), but at the cost of also classifying many negatives as members of the class. A classifier with high precision but low recall is just the opposite: it classifies only few samples as positives, but most of these predictions are correct. An ideal classifier with high precision and high recall finds most of the positives, and most of its positive predictions are correct.

To represent this with a single number, we compute the F1-score, the harmonic mean of precision and recall. Thus, it is a measure of the classifier's accuracy.

For the example from the confusion matrix shown in Deep Learning we get for the class 'apple' the values precision: 1.00 (= 68/(68+0+0)), recall: 0.74 (= 68/(68+21+3)), and F1-score: 0.85 (=2*(1.00*0.74)/(1.00+0.74)).
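
The arithmetic of this example can be checked with a short sketch. It is plain Python for illustration only; the function name precision_recall_f1 is hypothetical. The counts are taken from the formulas above: 68 true positives, 0 false positives, and 21 + 3 = 24 false negatives for the class 'apple'.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1-score for one class.

    A value is reported as 0.0 when its denominator would be zero.
    """
    predicted_positives = true_positives + false_positives
    real_positives = true_positives + false_negatives
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / real_positives if real_positives else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = precision_recall_f1(68, 0, 21 + 3)
print("precision: %.2f, recall: %.2f, F1-score: %.2f" % (precision, recall, f1))
```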

List of Operators

Infer the class affiliations for a set of images using a deep-learning-based classifier.
Clear a deep-learning-based classifier.
Clear a handle containing the results of the deep-learning-based classification.
Clear the handle of a deep-learning-based classifier training result.
Deserialize a deep-learning-based classifier.
Return the parameters of a deep-learning-based classifier.
Retrieve classification results inferred by a deep-learning-based classifier.
Return the results for the single training step of a deep-learning-based classifier.
Read a deep-learning-based classifier from a file.
Serialize a deep-learning-based classifier.
Set the parameters of a deep-learning-based classifier.
Perform a training step of a deep-learning-based classifier on a batch of images.
Write a deep-learning-based classifier in a file.