
Classification


This chapter contains operators for classification based on deep learning, both for the training and inference phases.

The term deep learning refers to a family of machine learning methods that are used for, among other tasks, classification. Classification tasks require knowledge of the features that distinguish the classes. In deep learning, the network is trained by considering only the input and the output, which is also called end-to-end learning. Based on the labeled images you provide, the training algorithm adjusts the network so that it distinguishes the classes properly. As a result, you do not need to specify features manually. Instead, you have to select and collect appropriate data. Convolutional neural networks (CNNs) are a deep learning method suitable for image processing. For a brief introduction, please refer to the “Solution Guide on Classification”.

Figure: Schematic of a neural network with the labels Input, Input Layer, Hidden Layers, Output Layer, and Output.
A neural network as used for deep learning consists of multiple layers, potentially a huge number, which led to the name 'deep' learning. More information can be found in the “Solution Guide on Classification”.

To make this technique usable, three technology enablers are necessary.

Operator Workflow

Have a look at the HDevelop example examples/hdevelop/Deep-Learning/Classification/classify_fruit_deep_learning.hdev for a short-and-simple overview and classify_pill_defects_deep_learning.hdev for a more sophisticated workflow. They provide great guidance on how the different parts can be used together.

Prepare the Network and the Data

  1. First, a pretrained network has to be read using the operator

    • read_dl_classifier.

    This operator is also used to read your own trained networks after you have saved them with write_dl_classifier.

  2. You need to know which problem the classifier shall solve, i.e., which classes are to be distinguished and what such samples look like. This is represented by your data set, i.e., your images with the corresponding ground truth labels. To read the data for your deep learning classification training the procedure

    • read_dl_classifier_data_set

    is available. Using this procedure you can get a list of image file paths and their respective labels (the ground truth labels) as well as a list of the unique classes, to which at least one of the listed images belongs.

  3. The network you selected imposes several requirements on the images to be classified. These requirements (for example the image size and gray value range) can be retrieved with

    • get_dl_classifier_param.

    The procedure preprocess_dl_classifier_images provides guidance on how to implement such a preprocessing stage. We recommend preprocessing and storing all images used for the training before starting the classifier training, since this speeds up the training significantly.

  4. Next, we recommend splitting the data set into three distinct data sets, which are used for training, validation, and testing. This can be achieved using the procedure

    • split_dl_classifier_data_set.

    Following this approach, you use the training data set as direct input for the training and the validation data set to test your classifier during training and to monitor your training. The test data set is used for an independent evaluation after training.

  5. You need to specify the 'classes' (determined before by use of read_dl_classifier_data_set) you want to differentiate with your classifier. For this, the operator

    • set_dl_classifier_param

    is available.

    This operator can also be used to set hyperparameters, which are important for training, e.g., 'batch_size' and 'learning_rate'. For a more detailed explanation, see this chapter reference below and the documentation of set_dl_classifier_param.

Train the Network and Evaluate the Training Progress

Once your network is set up and your data prepared it is time to train the classifier for your specific task.

  1. To train the classifier the operator

    • train_dl_classifier_batch

    is available. The intermediate training results are stored in the output handle. To clear this handle and free the allocated memory after the training use clear_dl_classifier_train_result.

    As the name of train_dl_classifier_batch indicates, this operator processes a batch of data (images and ground truth labels) at once. A batch contains as many samples as given by the 'batch_size', which is set with set_dl_classifier_param. For a successful training, we want to use all of our training images. Thus, we iterate through our training data and train the classifier successively with train_dl_classifier_batch. When all images have been processed once, we call this an epoch (note, however, that training batches need to be 'filled', so some images may remain unused in an epoch when the number of images left is not enough to 'fill' a batch). We can repeat this process over as many training epochs as needed until we are satisfied with the training result.

  2. To know how well the classifier learns the new task, the procedure

    • plot_dl_classifier_training_progress

    is provided. With it you can plot the classification errors during training. To compute the input required for the plotting procedure, the procedures

    • select_percentage_dl_classifier_data,

    • apply_dl_classifier_batchwise, and

    • evaluate_dl_classifier

    are available. With them you can reduce the number of images used for this classification evaluation, apply the classifier on the selected data, and compute, for example, the top-1 error. Have a look at the HDevelop example classify_pill_defects_deep_learning.hdev to see how these procedures can be used together.

Apply and Evaluate the Final Classifier

Your classifier is trained for your task and ready to be applied. But before deploying it in the real world, you should evaluate how well the classifier performs on the basis of your test data.

  1. To apply the classifier on a set containing an arbitrary number of images, use the operator

    • apply_dl_classifier.

    The runtime of this operator depends on the number of batches needed for the given image set.

    The results are returned in a handle. Do not forget to clear this handle and free the allocated memory at the end using clear_dl_classifier_result.

    To retrieve the predicted classes and confidences, use the operator

    • get_dl_classifier_result.

  2. Now it is time to evaluate these results. The performance of the classifier can be evaluated as during the training with evaluate_dl_classifier.

    To visualize and analyze the classifier quality, the confusion matrix is a useful tool (see below for an explanation). For this, you can use the procedure

    • gen_confusion_matrix.

    Additionally, after applying the classifier on a set of data, you can use the procedure

    • get_dl_classifier_image_results

    to display and return images according to certain criteria, e.g., wrongly classified ones. Then, you might want to use this input for the procedure

    • gen_dl_classifier_heatmap,

    to obtain a heatmap of the input image, with which you can analyze which regions of the image are relevant for the classification result.

Inference Phase

When your classifier is trained and you are satisfied with its performance, you can use it to classify new images. For this, you simply preprocess your images according to the network requirements (i.e., the same way as you did for your data set used for training the classifier) and apply the classifier using

  • apply_dl_classifier.

System Requirements

Deep learning with CNNs is implemented to run on NVIDIA GPUs and uses the libraries cuDNN and cuBLAS. For the specific requirements please refer to the HALCON Installation Guide.

To speed up the training process, we recommend using a sufficiently fast hard drive. A solid-state drive (SSD) is therefore preferable to a conventional hard disk drive (HDD).

Data for the Training

In HALCON, we use a technique called transfer learning (see also above). Hence, we provide pretrained networks, representing classifiers which have been trained on huge amounts of classified image data. These classifiers have been trained and tested to perform well on industrial image classification tasks. One of these classifiers, already trained for general classifications, is now retrained for your specific task to classify your data into your classes. For this, you still need a suitable data set of labeled images, generally in the order of hundreds to thousands per class. While in general the network should be more reliable when trained on a larger data set, the amount of data needed for training also depends on the complexity of the task.

For training, the data set is split into three subsets, which should be independent and identically distributed. In simple words, the subsets should not be connected to each other in any way, and each subset should contain the same distribution of images for every class. This splitting is conveniently done by the procedure split_dl_classifier_data_set. By far the largest subset is used for the retraining. At certain points during the training, the performance of the classifier is evaluated to check whether it is beneficial to continue the network optimization. For this validation, the second subset is used. Even though this second subset is disjoint from the first one, it has an influence on the network optimization. Therefore, the third subset is used to test the predictions to be expected when the model is deployed in the real world. For a representative network validation, the latter two subsets should contain statistically relevant data, which gives a lower bound on the amount of data needed.
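The idea of a per-class split can be illustrated outside of HALCON with a minimal Python sketch. It is not the implementation of split_dl_classifier_data_set. The function name split_data_set, the split fractions, and the file names are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict


def split_data_set(image_files, labels, train_frac=0.7, val_frac=0.15, seed=42):
    """Split (file, label) pairs into training, validation, and test subsets.

    The split is done separately for every class, so each subset keeps
    roughly the same class distribution. The fractions are assumptions.
    """
    by_class = defaultdict(list)
    for path, label in zip(image_files, labels):
        by_class[label].append(path)

    rng = random.Random(seed)
    subsets = {"train": [], "validation": [], "test": []}
    for label, paths in sorted(by_class.items()):
        rng.shuffle(paths)  # random assignment within the class
        n_train = round(len(paths) * train_frac)
        n_val = round(len(paths) * val_frac)
        subsets["train"] += [(p, label) for p in paths[:n_train]]
        subsets["validation"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        subsets["test"] += [(p, label) for p in paths[n_train + n_val:]]
    return subsets


# Hypothetical data set: 20 images for each of two classes.
files = [f"apple_{i:02d}.png" for i in range(20)] + [f"pear_{i:02d}.png" for i in range(20)]
labels = ["apple"] * 20 + ["pear"] * 20
subsets = split_data_set(files, labels)
print({name: len(items) for name, items in subsets.items()})
```

Splitting per class rather than over the whole list is one simple way to keep the class distribution similar in all three subsets.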

The network can only be applied to images of a given size, number of channels, gray value range, and type. These requirements depend on the network itself and can be queried with read_dl_classifier. Also, the images used during the training have to be labeled, which means that they are classified. Note also that for training the network, it is best to use representative images, i.e., images like the ones you want to classify later and not only 'perfect' images, as otherwise the classifier may have difficulties with non-perfect images.

Setting the Training Parameters

The neural network describes a function that maps the input data onto classes. Such a network consists of layers and related weights, the latter being the free parameters altered during the training. Learning is not a pure optimization problem: machine learning usually acts indirectly. This means we want an accurate prediction, but we do not directly optimize the mapping function predicting the classes. Instead, a loss function is introduced, which penalizes the deviation between the predicted and the true classes. This loss function is then optimized, in the hope that doing so will also improve our performance measure.

Hence, when training the network for a specific classification task, one strives to minimize the loss (an error function) of the mapping function. In practice, this optimization is done by calculating the gradient, updating the parameters (weights) accordingly, and iterating multiple times over the training data.

The whole data set is generally too large to fit into GPU RAM. To circumvent this problem, the training set is divided into relatively small subsets, which are ideally chosen randomly. Such a subset is called a batch (or minibatch). The batch size determines the number of samples taken into a batch and, as a consequence, processed simultaneously. Larger batches should provide more accurate estimates of the gradient, but they also require more memory.

For the optimization of the loss, the gradient is computed for each batch (via the stochastic gradient descent algorithm, SGD). Two hyperparameters are important for the update of the function mapping the input data onto classes: the 'learning_rate' λ, determining the weight of the gradient on the updated loss function arguments, and the 'momentum' μ within the interval [0, 1), specifying the influence of previous updates. More information can be found in the operator reference of train_dl_classifier_batch. In simple words, when we update the loss function arguments, we still remember the step we did for the last update. Now, we go a step in direction of the gradient with a length according to the learning rate; additionally, we repeat the step we did last time, but this time only μ times as long. A visualization is given in the figure below. A learning rate that is too large might result in divergence of the algorithm, while a very small learning rate will take unnecessarily many steps. Therefore, it is customary to start with a larger learning rate and to reduce it during training. With a momentum μ = 0, the momentum method has no influence and only the gradient determines the update vector.

Sketch of the actual step: the update vector v (v - solid lines), the gradient step: the learning rate λ times the gradient g (λg - dashed lines), and the momentum step: the momentum μ times the previous update vector v (μv - dotted lines). The superscript index denotes the iteration.
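The update described above can be sketched in a few lines of Python. This is a minimal illustration of the classical momentum update, assuming the form "new update = momentum times previous update minus learning rate times gradient". The exact rule used by HALCON is given in the operator reference of train_dl_classifier_batch. The function name sgd_momentum_step and the toy loss are hypothetical.

```python
def sgd_momentum_step(weights, gradient, previous_update, learning_rate, momentum):
    """Perform one update step (assumed classical momentum form).

    update  = momentum * previous_update - learning_rate * gradient
    weights = weights + update
    """
    update = [momentum * v - learning_rate * g
              for v, g in zip(previous_update, gradient)]
    new_weights = [w + v for w, v in zip(weights, update)]
    return new_weights, update


# Toy loss L(w) = 0.5 * (w0^2 + w1^2); its gradient is simply w itself.
weights = [1.0, -2.0]
update = [0.0, 0.0]
for _ in range(100):
    gradient = list(weights)
    weights, update = sgd_momentum_step(
        weights, gradient, update, learning_rate=0.1, momentum=0.9)
print(weights)  # both entries end up close to the minimum at 0
```

With momentum set to 0, the step reduces to a plain gradient step, which matches the statement that the momentum method then has no influence.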

Regularization, set via the hyperparameter 'weight_prior', can be used to reduce overfitting to the training data (see the part 'Risk of Underfitting and Overfitting' below). It has to be set prior to training. Choosing its value is a trade-off between the model's ability to generalize, overfitting, and underfitting. If 'weight_prior' is too small, the model might overfit; if it is too large, the model might lose its ability to fit the data because all weights are effectively zero.
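To make the trade-off more tangible, the following Python sketch adds a penalty on large weights to a given data loss. The quadratic (L2) form of the penalty and the factor 0.5 are assumptions made for illustration. The term actually used by HALCON is documented in the operator reference. The function name regularized_loss is hypothetical.

```python
def regularized_loss(data_loss, weights, weight_prior):
    """Return the data loss plus an assumed L2 penalty on the weights.

    penalty = 0.5 * weight_prior * sum(w^2), so large weights are penalized
    more strongly the larger weight_prior is.
    """
    penalty = 0.5 * weight_prior * sum(w * w for w in weights)
    return data_loss + penalty


weights = [0.5, -3.0, 2.0]
for weight_prior in (0.0, 0.0005, 0.5):
    print(weight_prior, regularized_loss(1.0, weights, weight_prior))
```

With a weight prior of 0 the penalty vanishes and only the data loss remains. With a large value the penalty dominates, so the optimization is driven mainly towards small weights.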

The number of epochs defines how many times the whole training set is used for the training. It determines how many times the algorithm loops over the training set. Note however that training batches need to be 'filled' and as a consequence it may happen that some images are unused in this epoch when the number of images left is not enough to 'fill' a batch.
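The relation between epochs, batches, and unused images can be sketched as follows. This is an illustrative Python loop, not HALCON code. The helper iterate_batches is hypothetical, and shuffling the samples anew in every epoch is an assumption.

```python
import random


def iterate_batches(samples, batch_size, rng):
    """Yield only completely filled batches in random order.

    Samples that do not fill a last batch are skipped for this epoch.
    """
    order = list(samples)
    rng.shuffle(order)
    num_full_batches = len(order) // batch_size
    for index in range(num_full_batches):
        start = index * batch_size
        yield order[start:start + batch_size]


samples = list(range(10))  # stand-in for 10 training images
rng = random.Random(0)
for epoch in range(2):
    batches = list(iterate_batches(samples, batch_size=4, rng=rng))
    used = sum(len(batch) for batch in batches)
    # A real training loop would call the training step once per batch here.
    print(f"epoch {epoch}: {len(batches)} batches, {used} of {len(samples)} samples used")
```

With 10 samples and a batch size of 4, each epoch consists of 2 batches and leaves 2 samples unused. Because the order is shuffled per epoch, different samples are left out each time.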

With the training data and all the hyperparameters, there are many different influences on the outcome of such complex algorithms. Adding training data generally also helps to improve the classifier performance. Note that whether gathering more data is a good solution also depends on how easily one can do so. Usually, a small additional fraction will not noticeably change the overall performance.

Interpreting the Classification Results

Evaluation During Training

When it comes to the evaluation of the classifier performance, it is important to note that this is not a pure optimization problem (see the part 'Setting the Training Parameters' above).

In order to observe the training progress, it is usually helpful to visualize the error, meaning the error over the samples of a batch. As the samples differ, some are easier to classify than others. Thus it may be that the network performs better or worse for the samples of a given batch than for the samples of another batch. So it is normal that the error is not changing smoothly over the iterations. But in total it should decrease. Adjusting the hyperparameters 'learning_rate' and 'momentum' can help to decrease the error again. The following figure shows a possible example.

Figure: Sketch of errors during training (annotation in the figure: 'decreased learning rate').

Risk of Underfitting and Overfitting

Underfitting occurs if the model is not able to describe the complexity of the task. It is directly reflected in the error on the training set which stays high.

Overfitting happens when the network starts to "memorize" training data instead of learning how to generalize. This is shown by an error on the training set staying low or even continuing to decrease while the error on the validation set increases. In such a case, regularization may help. Regularization is a technique to prevent neural networks from overfitting by adding an extra term to the loss function. It works by penalizing large weights, i.e., pushing the weights towards zero. Simply put, regularization favors simpler models that are less likely to fit to noise in the training data and generalize better. Note that a similar phenomenon occurs when the model capacity is too high with respect to the data.

Figure: Sketch of a possible overfitting, visible from the generalization gap.

Confusion Matrix, Precision, Recall, and F-score

A confusion matrix is a table layout that can be used to visualize the performance of a classifier on a set of known data, that is, data for which you already know the class of each image. The table shows into which class the classifier predicted the images (class of highest confidence value), and thus how many images of a given class have been predicted into which class. E.g., it shows how many images with ground truth class affiliation 'apple' have been classified as 'apple' and how many have been classified as 'peach' or 'orange'. For each class, we represent the instances with this ground truth label in a column and the instances predicted to belong to this class in a row. This table makes it easy to see how well the network performs for each class. For classification tasks with more than two classes, you can either include every class in the table or reduce it to a binary problem (positive: belongs to the selected class vs. negative: belongs to another class). For binary classification measures, the confusion matrix reduces to the following four entries: true positives (TP: predicted positive, labeled positive), true negatives (TN: predicted negative, labeled negative), false positives (FP: predicted positive, labeled negative), and false negatives (FN: predicted negative, labeled positive).

Figure: (1) confusion matrix for three classes, (2) binary confusion matrix with the entries TP, FP, FN, and TN.
(1) On the left, a confusion matrix for three objects can be seen. It appears as if the network 'confuses' apples and peaches more than all other combinations. On the right (2), this confusion matrix is reduced to a binary problem to better visualize the 'apple' class. We see that 68 images of an 'apple' have been classified as such (TP), 60 images not showing an 'apple' have been correctly classified as a 'peach' (30) or 'pear' (30) (TN), 0 images show a 'peach' or a 'pear' but have been classified as an 'apple' (FP), and 24 images of an 'apple' have wrongly been classified as 'peach' (21) or 'pear' (3) (FN).

Various values can be derived from this confusion matrix: The precision is the proportion of all correctly predicted positives to all predicted positives (true and false ones), i.e., precision = TP / (TP + FP). Thus, it is a measure of how many positive predictions really belong to the selected class.

The recall, also called the "true positive rate", is the proportion of all correctly predicted positives to all real positives, i.e., recall = TP / (TP + FN). Thus, it is a measure of how many samples belonging to the selected class were predicted correctly as positives.

A classifier with high recall but low precision finds most of the positives (thus members of the class), but at the cost of also classifying many negatives as members of the class. A classifier with high precision but low recall is just the opposite, classifying only few samples as positives, but most of these predictions are correct. An ideal classifier with high precision and high recall will classify many samples as positive with a high accuracy.

To represent this with a single number, we compute the F1-score, the harmonic mean of precision and recall, i.e., F1 = 2 * precision * recall / (precision + recall). Thus, it is a measure of the classifier's accuracy.

For the example from the confusion matrix shown above on the left we get for the class 'Apple' precision: 1.00 (= 68/(68+0+0)), recall: 0.74 (= 68/(68+21+3)), F1-score: 0.85 (=2*(1.00*0.74)/(1.00+0.74)).
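The same numbers can be reproduced with a short Python sketch that takes the binary confusion matrix entries of the 'apple' example as input. The function name precision_recall_f1 is hypothetical, and the sketch assumes that the denominators are not zero.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1-score from binary confusion entries.

    Assumes tp + fp > 0 and tp + fn > 0 (no division-by-zero handling).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Entries for the class 'apple' from the example: TP = 68, FP = 0, FN = 21 + 3.
precision, recall, f1 = precision_recall_f1(tp=68, fp=0, fn=24)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=1.00 recall=0.74 f1=0.85
```

Note that the true negatives are not needed for any of these three values.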

Glossary

In the following, we describe the most important terms used in the context of deep learning:

batch size - hyperparameter 'batch_size'

The data set is divided into smaller subsets of data, which are called batches. The batch size determines the number of images taken into a batch and thus processed simultaneously.

class

Classes are discrete categories (e.g., 'apple', 'peach', 'pear') that the network distinguishes. In HALCON, the class of an image is given by its appropriate label.

classifier

In the context of deep learning we refer to the term classifier as follows. The classifier takes an image as input and returns the inferred confidence values, expressing how likely the image belongs to every distinguished class. E.g., the three classes 'apple', 'peach', and 'pear' are distinguished. Now we give an image of an apple to the classifier. As a result, the confidences 'apple': 0.92, 'peach': 0.07, and 'pear': 0.01 could be returned.

confidence

Confidence is a number expressing the affinity of an image to a class. In HALCON the confidence is the probability, given in the range of [0,1].

confusion matrix

A confusion matrix is a table which compares the classifications (of highest confidence value) of the classifier with the ground truth class affiliations (labels). It is often used to visualize the performance of the classifier on a validation or test set, see, e.g., the corresponding figure.

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks are networks used in deep learning, characterized by the presence of at least one convolutional layer in the network. They are particularly successful for image classification.

data

We use the term data in the context of deep learning for items to be classified (e.g., images) and their appropriate classes (the labels).

data set: training, validation, and test set

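With data set we refer to the complete set of data (image and respective class) used for a training. The data set is split into three, if possible disjoint, subsets:

• The training set is used as direct input for the training.

• The validation set is used to evaluate the classifier during training.

• The test set is used for an independent evaluation after training.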

deep learning

The term "deep learning" was originally used to describe the training of neural networks with multiple hidden layers. Today it is rather used as a generic term for several different concepts in machine learning. In HALCON, we use the term deep learning for a family of methods using a neural network with multiple hidden layers.

epoch

In the context of deep learning, an epoch is a single training iteration over the entire training data, i.e., over all batches. Iterations over epochs should not be confused with the iterations over single batches (e.g., within an epoch).

errors

In the context of deep learning, we refer to error when the inferred class of an image does not match the real class (the ground truth label). Within HALCON, we use the term error in deep learning when we refer to the top-1 error.

hyperparameter

Like every machine learning model, CNNs contain many formulas with many parameters. During training the model learns from the data in the sense of optimizing the parameters. However, such models can have other, additional parameters, which are not directly learned during the regular training. These parameters have values set before starting the training. We refer to this last type of parameters as hyperparameters in order to distinguish them from the network parameters that are optimized during training. Or from another point of view, hyperparameters are solver-specific parameters.

Prominent examples are the initial learning rate or the batch size.

inference phase

The inference phase is the stage when a trained network is applied to predict (infer) input images. Unlike during the training phase, the network is not changed anymore in the inference phase.

label

Labels are arbitrary strings used to define the class of an image. In HALCON these labels are given by the image name (optionally followed by a combination of underscore and digits) or by the folder name, e.g., 'apple_01.png', 'pear.png', 'peach/01.png'.

layer and hidden layer

A layer is a building block in a neural network, thus performing specific tasks (e.g., convolution, pooling, etc., for further details we refer to the “Solution Guide on Classification”). It can be seen as a container, which receives weighted input, transforms it, and returns the output to the next layer. Input and output layers are connected to the data set, i.e., the images or the labels, respectively. All layers in between are called hidden layers.

learning rate - hyperparameter 'learning_rate'

The learning rate is the weighting, with which the gradient (see the entry for the stochastic gradient descent SGD) is considered when updating the arguments of the loss function. In simple words, when we want to optimize a function, the gradient tells us the direction in which we shall optimize and the learning rate determines how far along this direction we step.

Alternative names: λ, step size

loss

In image classification, the loss function measures the compatibility between the provided class (ground truth label) and the prediction (class with highest probability) from the classifier. Thus, the loss function penalizes the network for predictions dissimilar from the ground truth label. This loss function is the function we optimize during the training process to adapt the network to a specific classification task.

Alternative names: objective function, cost function, utility function

momentum - hyperparameter 'momentum'

The momentum is used for the optimization of the loss function arguments. When the loss function arguments are updated (after having calculated the gradient), a fraction μ of the previous update vector (of the past iteration step) is added. This has the effect of damping oscillations. We refer to the hyperparameter μ as momentum. When μ is set to 0, the momentum method has no influence. In simple words, when we update the loss function arguments, we still remember the step we did for the last update. Now we go a step in direction of the gradient with a length according to the learning rate and additionally we repeat the step we did last time, but this time only μ times as long.

overfitting

Overfitting happens when the network starts to 'memorize' training data instead of learning how to find general rules for the classification. This becomes visible when the model continues to minimize error on the training set but the error on the validation set increases. Since most neural networks have a huge amount of weights, these networks are particularly prone to overfitting.

regularization - hyperparameter 'weight_prior'

Regularization is a technique to prevent neural networks from overfitting by adding an extra term to the loss function. It works by penalizing large weights, i.e., pushing the weights towards zero. Simply put, regularization favors simpler models that are less likely to fit to noise in the training data and generalize better. In HALCON, regularization is controlled via the parameter 'weight_prior'.

Alternative names: regularization parameter, weight decay parameter, λ (note that in HALCON we use λ for the learning rate and within formulas the symbol α for the regularization parameter).

retraining

We define retraining as updating the weights of an already pretrained network, i.e., during retraining the network learns the specific task.

Alternative names: fine-tuning.

solver

The solver optimizes the network by updating the weights in a way to optimize (i.e., minimize) the loss.

stochastic gradient descent (SGD)

SGD is an iterative optimization algorithm for differentiable functions. In deep learning we use this algorithm to calculate the gradient to optimize (i.e., minimize) the loss function. A key feature of the SGD is to calculate the gradient only based on a single batch containing stochastically sampled data and not all data.

top-k error

The classifier infers for a given image class confidences of how likely the image belongs to every distinguished class. Thus, for an image we can sort the predicted classes according to the confidence value the classifier assigned. The top-k error tells the ratio of predictions where the ground truth class is not within the k predicted classes with highest probability. In the case of top-1 error, we check if the target label matches the prediction with the highest probability. In the case of top-3 error, we check if the target label matches one of the top 3 predictions (the 3 labels getting the highest probability for this image).

Alternative names: top-k score
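As an illustration of the top-k error, the following Python sketch computes the top-1 and top-2 error for three images. The confidence values of the second and third image as well as the function name top_k_error are made up for this example and do not stem from a real classifier.

```python
def top_k_error(confidences, ground_truth, k):
    """Return the ratio of images whose true class is not among the
    k classes with the highest confidence."""
    misses = 0
    for scores, truth in zip(confidences, ground_truth):
        # Sort the class names by descending confidence.
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth not in ranked[:k]:
            misses += 1
    return misses / len(ground_truth)


# One dictionary of class confidences per image (hypothetical values).
confidences = [
    {"apple": 0.92, "peach": 0.07, "pear": 0.01},
    {"apple": 0.50, "peach": 0.30, "pear": 0.20},
    {"apple": 0.10, "peach": 0.60, "pear": 0.30},
]
ground_truth = ["apple", "peach", "apple"]

print(top_k_error(confidences, ground_truth, k=1))  # 2 of 3 images are missed
print(top_k_error(confidences, ground_truth, k=2))  # 1 of 3 images is missed
```

The second image counts as an error for k=1 but not for k=2, because its ground truth class 'peach' has only the second highest confidence.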

transfer learning

Transfer learning refers to the technique where a network is built upon the knowledge of an already existing network. In concrete terms, this means taking an already (pre)trained network with its weights and adapting the output layer to the respective application to get your network. In HALCON, we also see the following retraining step as a part of transfer learning.

underfitting

Underfitting occurs when the model over-generalizes. In other words it is not able to describe the complexity of the task. This is directly reflected in the error on the training set, which does not decrease significantly.

weights

In general weights are the free parameters of the network, which are altered during the training due to the optimization of the loss. A layer with weights multiplies or adds them with its input values. In contrast to hyperparameters, weights are optimized and thus changed during the training.


List of Operators

apply_dl_classifier
Infer the class affiliations for a set of images using the deep-learning-based classifier.
clear_dl_classifier
Clear a deep-learning-based classifier.
clear_dl_classifier_result
Clear the handle containing the results of the deep-learning-based classification.
clear_dl_classifier_train_result
Clear the handle of a deep-learning-based classifier training result.
deserialize_dl_classifier
Deserialize a deep-learning-based classifier.
get_dl_classifier_param
Return the parameters of the deep-learning-based classifier.
get_dl_classifier_result
Retrieve classification results inferred by a deep-learning-based classifier.
get_dl_classifier_train_result
Return the results for the single training step of a deep-learning-based classifier.
read_dl_classifier
Read a deep-learning-based classifier from a file.
serialize_dl_classifier
Serialize a deep-learning-based classifier.
set_dl_classifier_param
Set the parameters of the deep-learning-based classifier.
train_dl_classifier_batch
Perform a training step of a deep-learning-based classifier on a batch of images.
write_dl_classifier
Write a deep-learning-based classifier to a file.
