Deep Learning

Introduction

The term deep learning (DL) refers to a family of machine learning methods. In HALCON, the following methods are implemented:

Anomaly Detection:
Assign to each pixel the likelihood that it shows an unknown feature. For further information please see the chapter Deep Learning / Anomaly Detection.
A possible example for anomaly detection: A score is assigned to every pixel of the input image, indicating how likely it shows an unknown feature, i.e., an anomaly.
Classification:
Classify an image into one class out of a given set of classes. For further information please see the chapter Deep Learning / Classification.
A possible example for classification: The image gets assigned to a class.
Object Detection:
Detect objects of the given classes and localize them within the image. For further information please see the chapter Deep Learning / Object Detection.
A possible example for object detection: Within the input image three instances are found and assigned to a class.
Semantic Segmentation:
Assign a class to each pixel of an image. For further information please see the chapter Deep Learning / Semantic Segmentation.
Top: A possible example for semantic segmentation: Every pixel of the input image is assigned to a class. Bottom: A possible example for edge extraction: A special case of semantic segmentation, where every pixel of the input image is assigned to one of the two classes 'edge' and 'background'.

All of the deep learning methods listed above use a network for the assignment task. In HALCON, they are implemented within the general DL model, see Deep Learning / Model. The model is trained by considering only the input and output, an approach also called end-to-end learning. In essence, given images and the information about what is visible in them, the training algorithm adjusts the model so that it learns to distinguish the different classes and, where applicable, also how to find the corresponding objects. For you, this has the convenient consequence that no manual feature specification is needed. Instead, you have to select and collect appropriate data.

System Requirements

For deep learning, additional prerequisites apply. Please see the requirements listed in the HALCON “Installation Guide”, paragraph “Requirements for Deep Learning and Deep-Learning-Based Methods”.

To speed up the training process, we recommend using a sufficiently fast hard drive. Thus, a solid-state drive (SSD) is preferable to a conventional hard disk drive (HDD).

General Workflow

As the DL methods mentioned above differ in what they do and in the data they require, you first need to know which method is most appropriate for your specific task. Once this is clear, you need to collect a suitable amount of data, meaning images and the information the method needs about them. After that, a common general workflow applies to all of these DL methods:

Prepare the Network and the Data

The network needs to be prepared for your task, and your data has to be adapted to the requirements of this specific network.

Train the Network and Evaluate the Training Progress

Once your network is set up and your data prepared, it is time to train the network for your specific task, e.g., as sketched below.
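As an illustration, a training could be started as in the following sketch. It assumes a prepared dataset dictionary DLDataset and model DLModelHandle and uses the procedures create_dl_train_param and train_dl_model; the number of epochs (10), the evaluation interval (1), and the random seed (42) are placeholder values:

* Collect the training parameters (placeholder values) ...
create_dl_train_param (DLModelHandle, 10, 1, 'true', 42, [], [], TrainParam)
* ... and train the model, starting from epoch 0.
train_dl_model (DLDataset, DLModelHandle, TrainParam, 0.0, TrainResults, TrainInfos, EvaluationInfos)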

Apply and Evaluate the Final Network

Your network is trained for your task and ready to be applied. But before deploying it in the real world, you should evaluate how well the network performs on the basis of your test dataset.

Inference Phase

When your network is trained and you are satisfied with its performance, you can use it for inference on new images. For this, the images need to be preprocessed according to the requirements of the network, thus in the same way as for training.
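A minimal inference sketch, assuming a trained model DLModelHandle and the preprocessing parameters DLPreprocessParam from the preparation step ('example_image' is a placeholder file name):

read_image (Image, 'example_image')
* Convert the image into a DL sample and preprocess it as for training.
gen_dl_samples_from_images (Image, DLSampleBatch)
preprocess_dl_samples (DLSampleBatch, DLPreprocessParam)
* Apply the model; the predictions are returned in DLResultBatch.
apply_dl_model (DLModelHandle, DLSampleBatch, [], DLResultBatch)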

Data

In the context of deep learning, the term 'data' refers to the images together with the information about what is in them. This information has to be provided in a way the network can understand. Not surprisingly, the different DL methods have their own requirements concerning what information has to be provided and how. Please see the corresponding chapters for the specific requirements.

The network further poses requirements on the images regarding the image dimensions, the gray value range, and the type. The specific values depend on the network itself and can be queried with get_dl_model_param. Additionally, depending on the method, there are also requirements regarding the accompanying information, e.g., concerning the bounding boxes. To fulfill all these requirements, the data may have to be preprocessed, which can be done most conveniently with the corresponding procedure preprocess_dl_samples.
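As a sketch, the image requirements of a model can be queried as follows ('pretrained_dl_classifier_compact.hdl' serves as a placeholder for a pretrained network):

read_dl_model ('pretrained_dl_classifier_compact.hdl', DLModelHandle)
* Query the image dimensions, number of channels, and gray value range
* the model expects its input images to have.
get_dl_model_param (DLModelHandle, 'image_width', ImageWidth)
get_dl_model_param (DLModelHandle, 'image_height', ImageHeight)
get_dl_model_param (DLModelHandle, 'image_num_channels', NumChannels)
get_dl_model_param (DLModelHandle, 'image_range_min', ImageRangeMin)
get_dl_model_param (DLModelHandle, 'image_range_max', ImageRangeMax)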

When you train your network, the network gets adapted to its task. But at some point you will want to evaluate what the network has learned, and at an even later point you will want to test it. Therefore, the dataset is split into three subsets, which should be independent and identically distributed. In simple words, the subsets should not be connected to each other in any way, and each subset should contain the same distribution of images for every class. This splitting is conveniently done by the procedure split_dl_dataset. By far the largest subset is used for the training; we refer to it as the training dataset. At certain points, the performance of the network is evaluated to check whether it is beneficial to continue the network optimization. For this validation, the second set of data is used: the validation dataset. Even though the validation dataset is disjoint from the training dataset, it still has an influence on the network optimization. Therefore, to test the predictions to expect when the model is deployed in the real world, the third dataset is used: the test dataset. For a representative network validation or evaluation, the validation and test datasets should contain statistically relevant data, which gives a lower bound on the amount of data needed.
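As a sketch, a dataset read into the dictionary DLDataset could be split as follows (the percentages are placeholder values):

* Assign 70% of the samples to the training dataset and 15% to the
* validation dataset; the remaining 15% form the test dataset.
split_dl_dataset (DLDataset, 70, 15, [])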

Note also that for training the network you should use representative images, i.e., images like the ones you want to process later, and not only 'perfect' images; otherwise the network may have difficulties with non-'perfect' images.

The Network and the Training Process

In the context of deep learning, the assignments are performed by sending the input image through a network. The output of the total network consists of a number of predictions. For a classification task, for example, these predictions are the confidences for each class, expressing how likely it is that the image shows an instance of this class.

The specific network will vary, especially from one method to another. Some methods, e.g., object detection, use a subnetwork to generate feature maps (see the explanations given below and in Deep Learning / Object Detection). Here, we explain a basic convolutional neural network (CNN). Such a network consists of a certain number of layers or filters, which are arranged and connected in a specific way. In general, any layer is a building block performing a specific task. It can be seen as a container which receives input, transforms it according to a function, and passes the output on to the next layer. Different functions are possible for different types of layers; several examples are given in the “Solution Guide on Classification”. Many layers or filters have weights, parameters which are also called filter weights. These are the parameters modified during the training of a network. The output of most layers are feature maps, whereby the number of feature maps (the depth of the layer output) and their size (width and height) depend on the specific layer.

Schema of an extract of a possible classification network. Below we show feature maps corresponding to the layers, zoomed to a uniform size.

To train a network for a specific task, a loss function is added. There are different loss functions depending on the task, but they all work according to the following principle: a loss function compares the prediction from the network with the given information about what it should find in the image (and, if applicable, also where), and penalizes deviations. The filter weights are then updated in such a way that the loss function is minimized. Thus, training the network for its specific task, one strives to minimize the loss (an error function) of the network, in the hope that doing so will also improve the performance measure. In practice, this optimization is done by calculating the gradient and updating the parameters of the different layers (the filter weights) accordingly. This is repeated by iterating multiple times over the training data.
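As an illustration (not specific to a particular method), a common loss function for classification is the categorical cross entropy. With C classes, ground truth labels y_i (1 for the correct class, 0 otherwise), and predicted confidences p_i, it reads

L(y, p) = - \sum_{i=1}^{C} y_i \log(p_i) ,

which is small when the confidence of the correct class is close to 1 and grows as this confidence approaches 0.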

There are additional parameters that influence the training but are not directly learned during the regular training. Their values are set before starting the training. We refer to this type of parameter as hyperparameters, in order to distinguish them from the network parameters that are optimized during training. See the section “Setting the Training Parameters: The Hyperparameters”.

To train all filter weights from scratch, a lot of resources are needed. Therefore, one can take advantage of the following observation: The first layers detect low-level features like edges and curves. The feature maps of the following layers are smaller, but they represent more complex features. For a large network, the low-level features are general enough that the weights of the corresponding layers will not change much among different tasks. This leads to a technique called transfer learning: One takes an already trained network and retrains it for a specific task, benefiting from filter weights for the lower layers that are already quite suitable. As a result, considerably fewer resources are needed. While in general the network should be more reliable when trained on a larger dataset, the amount of data needed for retraining also depends on the complexity of the task. A basic schema for the workflow of transfer learning is shown with the aid of classification in the figure below.
Basic schema of transfer learning with the aid of classification. (1) A pretrained network is read. (2) Training phase, the network gets trained with the training data. (3) The trained model with new capabilities. (4) Inference phase, the trained network infers on new images.
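A sketch of the first step of such a transfer learning workflow, reading a pretrained classification network and adapting it to the new classes ('pretrained_dl_classifier_compact.hdl' and the class names are placeholders):

* Read a pretrained network ...
read_dl_model ('pretrained_dl_classifier_compact.hdl', DLModelHandle)
* ... and adapt it to the classes of the new task.
set_dl_model_param (DLModelHandle, 'class_names', ['apple', 'lemon', 'orange'])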

Setting the Training Parameters: The Hyperparameters

The different DL methods are designed for different tasks and vary in the way they are built up. They all have in common that during the training of the model one faces a minimization problem: training the network or subnetwork, one strives to minimize an appropriate loss function, see the section “The Network and the Training Process”. For doing so, there is a set of further parameters which are set before starting the training and are not optimized during the training. We refer to these parameters as hyperparameters. For a DL model, you can set a change strategy, specifying when and how you want these hyperparameters changed during the training. In this section, we explain the idea of the different hyperparameters. Note that certain methods have additional hyperparameters; you find more information in their respective chapters.

As already mentioned, the loss compares the predictions from the network with the given information about the content of the image and penalizes deviations. Training the network means updating the filter weights in such a way that the loss penalizes less, i.e., the loss value is minimized. To do so, a certain amount of data is taken from the training dataset. For this subset, the gradient of the loss is calculated and the network is modified by updating its filter weights accordingly. This is then repeated with the next subset of data until the whole training dataset has been processed. These subsets of the training data are called batches, and the size of these subsets, the 'batch_size', determines the number of samples taken into a batch and, as a consequence, processed together.

A full iteration over the entire training data is called an epoch. It is beneficial to iterate several times over the training data. The number of such iterations is defined by 'epochs'. Thus, 'epochs' determines how many times the algorithm loops over the training set.
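To illustrate the relation between these quantities, a sketch with placeholder numbers:

* With 800 training samples and a batch size of 32, one epoch
* corresponds to 800 / 32 = 25 iterations, i.e., 25 gradient steps.
set_dl_model_param (DLModelHandle, 'batch_size', 32)
get_dl_model_param (DLModelHandle, 'batch_size', BatchSize)
NumTrainSamples := 800
NumIterationsPerEpoch := NumTrainSamples / BatchSize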

Some models (e.g., anomaly detection) train utilizing the whole dataset at once. For other models, the dataset is processed batch-wise, using the stochastic gradient descent (SGD) algorithm. This involves further parameters, which are explained in the following. After every calculation of the loss gradient, the filter weights are updated. For this update, there are two important hyperparameters: the 'learning_rate' λ, determining the weight of the gradient on the updated loss function arguments (the filter weights), and the 'momentum' μ within the interval [0, 1), specifying the influence of previous updates. More information can be found in the documentation of train_dl_model_batch. In simple words, when we update the loss function arguments, we still remember the step we took for the last update. Now, we take a step in the direction of the gradient, with a length depending on the learning rate λ; additionally, we repeat the step we took the last time, but this time only μ times as long. A visualization is given in the figure below. A learning rate that is too large might result in divergence of the algorithm, whereas a very small learning rate takes unnecessarily many steps. Therefore, it is customary to start with a larger learning rate and potentially reduce it during training. With a momentum μ = 0, the momentum method has no influence, so the update vector is determined by the gradient alone.
Visualization of an update step: the new update vector v_{k+1} is composed of the current gradient step λ·g_k and the previous update vector v_k scaled by the momentum, μ·v_k.
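Written out, the update rule sketched above reads as follows, where v_k denotes the update vector, w_k the filter weights, and g_k the loss gradient in iteration k (cf. the documentation of train_dl_model_batch):

v_{k+1} = \mu \, v_k - \lambda \, g_k
w_{k+1} = w_k + v_{k+1}

For \mu = 0, the first equation reduces to a plain gradient step, w_{k+1} = w_k - \lambda g_k.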