This chapter explains the general concept of the deep learning (DL) model in HALCON and the data handling.
Conceptually, a deep learning model in HALCON is an internal representation of a deep neural network. Each deep neural network has an architecture defining its function, i.e., the tasks it can be used for. There can be several possible network architectures for one functionality. Currently, networks for the following functionalities are implemented in HALCON as models:
3D Gripping Point Detection, see 3D Matching / 3D Gripping Point Detection.
Anomaly detection and Global Context Anomaly Detection, see Deep Learning / Anomaly Detection and Global Context Anomaly Detection.
Classification, see Deep Learning / Classification.
Deep OCR, see OCR / Deep OCR.
Multi-Label Classification, see Deep Learning / Multi-Label Classification.
Object detection and instance segmentation, see Deep Learning / Object Detection and Instance Segmentation.
Semantic segmentation and edge extraction, see Deep Learning / Semantic Segmentation and Edge Extraction.
Each functionality is identified by its unique model type. For the implemented methods you can find further information about the specific workflow, data requirements, and validation measures in the corresponding chapters. Information on deep learning in general is given in the chapter Deep Learning.
In this chapter you find information on which data a DL model needs and returns, as well as how this data is transferred.
Deep learning applications involve different types of data that have to be distinguished. Roughly speaking, these are: the raw images with possible annotations, data preprocessed in a way suitable for the model, and output data.
Before the different types of data and the entries of the specific dictionaries are explained, we will have a look at how the data is connected. The symbols and colors refer to the schematic overviews given below.
In brief, the data structure for training or evaluation starts with the raw images and their ground truth annotations (gray frames). From the read data the following dictionaries are created:
A dictionary DLDataset (red), which serves as a database and refers to a specific dictionary (yellow) for every input image.
The dictionary DLSample (orange) contains the data for a sample in the way the network can process it. A batch of DLSample dictionaries is handed to the model as the tuple DLSampleBatch.
For evaluation, a tuple DLResultBatch of dictionaries DLResult (dark blue) is returned, one for every sample. They are needed to obtain the evaluation results EvaluationResult.
For training, the training results (e.g., loss values) are returned in the dictionary DLTrainResult (light blue).
The most important steps in which a dictionary is created or modified:
reading the raw data (symbol: paper with arrow)
preprocessing the data (symbol: cogs)
training (symbol: transparent brain in an arc)
evaluation of the model (symbol: graph)
evaluation of a sample (symbol: magnifying glass)
For inference no annotations are needed.
Thus, the data structure starts with the raw images (gray frames).
The dictionary DLSample
(orange) contains the data for a sample
in the way the network can process it.
The results for a sample are returned in a dictionary DLResult
(dark blue).
The most important steps in which a dictionary is created or modified:
reading the raw data (symbol: paper with arrow)
preprocessing the data (symbol: cogs)
inference (symbol: brain in a circle)
evaluation of a sample (symbol: magnifying glass)
In order for the model to process the data, the data needs to follow certain conventions about what is needed and how it is given to the model. As visible from the figures above, in HALCON the data is transferred using dictionaries.
In the following, we explain the involved dictionaries, how they can be created, and their entries. We group them according to the main step of a deep learning application in which they are created and whether they serve as input or output data. The following abbreviations mark for which methods an entry applies:
'Any': any method
'3D-GPD': 3D Gripping Point Detection
'AD': anomaly detection
'CL': classification
'MLC': multi-label classification
'OCR-D': Deep OCR detection component
'OCR-R': Deep OCR recognition component
'GC-AD': Global Context Anomaly Detection
'OD': object detection
In case an entry is only applicable for a certain 'instance_type', the specification 'r1' ('rectangle1') or 'r2' ('rectangle2') is added. For entries only applicable for instance segmentation, the specification 'is' is added.
'SE': semantic segmentation
The entries only applicable for certain methods are described more extensively in the corresponding chapter.
The dataset consists of images and corresponding information. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section “Images” below.
The information about the images and the dataset is represented in a dictionary DLDataset, which serves as a database. More precisely, it stores the general information about the dataset and the dictionaries of the individual samples collected under the key samples. When the actual image data is needed, a dictionary DLSample is created (or read if it already exists) for each image required. The relation of these dictionaries is illustrated in the figure below.
DLDataset
The dictionary DLDataset
serves as a database.
It stores general information about the dataset and collects the
dictionaries of the individual samples.
Thereby, iconic data is not included in DLDataset; it only contains the paths to the respective images.
The dictionary DLDataset
is used by the training and evaluation
procedures.
It is not necessary for the model itself, but we highly recommend creating it.
Its necessary entries are described below.
This dictionary is either created directly when labeling your
data using the MVTec Deep Learning Tool or it is created by one of the
following method-specific procedures:
read_dl_dataset_3d_gripping_point_detection
(3D Gripping Point Detection)
read_dl_dataset_anomaly
(anomaly detection,
Global Context Anomaly Detection)
read_dl_dataset_classification
(classification)
read_dl_dataset_ocr_detection
(Deep OCR - detection component)
read_dl_dataset_ocr_recognition
(Deep OCR - recognition component)
read_dl_dataset_from_coco
(object detection with
'instance_type'
= 'rectangle1'
)
read_dl_dataset_segmentation
(semantic segmentation).
Please see the respective procedure documentation for the requirements on
the data in order to use these procedures.
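For illustration, a classification dataset could be read and split as follows (a minimal HDevelop sketch; the image directory is a placeholder, and the label source 'last_folder' assumes that the images are stored in one folder per class):
* Read a labeled classification dataset into the dictionary DLDataset.
read_dl_dataset_classification ('path/to/images', 'last_folder', DLDataset)
* Assign every sample to the 'train', 'validation', or 'test' split subset.
split_dl_dataset (DLDataset, 70, 15, [])
* Inspect some general entries of DLDataset.
get_dict_tuple (DLDataset, 'class_names', ClassNames)
get_dict_tuple (DLDataset, 'samples', DatasetSamples)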
In case you create DLDataset
in another way, it has to
contain at least the entries not marked with a number in the description
below.
During the preprocessing of your dataset, the respective procedures add the further entries of the dictionary DLDataset.
Depending on the model type, this dictionary can have the following entries:
image_dir : Any. Common base path to all images. (format: string)
dlsample_dir : Any [1]. Common base path of all sample files (if present). (format: string)
class_names : Any except OCR-R. Names of all classes that are to be distinguished. (format: tuple of strings)
class_ids : Any except OCR-R. IDs of all classes that are to be distinguished (range: 0-65534). (format: tuple of integers)
preprocess_param : Any [1]. All parameter values used during preprocessing. (format: dictionary)
samples : Any. Collection of sample descriptions. (format: tuple of dictionaries)
normals_dir : 3D-GPD. Optional. Common base path of all normals images. (format: string)
xyz_dir : 3D-GPD. Common base path of all XYZ-images. (format: string)
anomaly_dir : AD, GC-AD. Common base path of all anomaly regions (regions indicating anomalies in the image). (format: string)
class_weights : CL, SE [1]. Weights of the different classes. (format: tuple of reals)
segmentation_dir : SE, 3D-GPD. Common base path of all segmentation images. (format: string)
This dictionary is directly created when labeling your data using the MVTec Deep Learning Tool. It is also created by the procedures mentioned above for reading in your data. The entries marked with [1] are added by the preprocessing procedures.
samples
The DLDataset
key samples
gets a tuple of
dictionaries as value, one for each sample in the dataset.
These dictionaries contain the information concerning an individual
sample of the dataset.
Depending on the model type, this dictionary can have the following
entries:
image_file_name : Any. File name of the image and its path relative to image_dir. (format: string)
image_id : Any. Unique image ID (encoding format: UINT8). (format: integer)
split : Any [2]. Specifies the assigned split subset ('train', 'validation', 'test'). (format: string)
dlsample_file_name : Any [3]. File name of the corresponding DLSample dictionary and its path relative to dlsample_dir. (format: string)
normals_file_name : 3D-GPD. Optional. File name of the normals image and its path relative to normals_dir. (format: string)
segmentation_file_name : 3D-GPD, SE. File name of the ground truth segmentation image and its path relative to segmentation_dir. (format: string)
xyz_file_name : 3D-GPD. File name of the XYZ-image and its path relative to xyz_dir. (format: string)
anomaly_file_name : AD, GC-AD. Optional. Path to region files with ground truth annotations (relative to anomaly_dir). (format: string)
anomaly_label : AD, GC-AD. Ground truth anomaly label on image level (in the form of class_names). (format: string)
image_label_id : CL. Ground truth label for the image (in the form of class_ids). (format: tuple of integers)
image_label_ids : MLC. Ground truth labels for the image (in the form of class_ids). (format: tuple of integers)
image_id_origin : OCR-R. ID of the original image the sample was extracted from. (format: integer)
word : OCR-D, OCR-R. Ground truth word. (format: string)
bbox_label_id : OD, OCR-D. Ground truth labels for the bounding boxes (in the form of class_ids). (format: tuple of integers)
bbox_row1 : OD:r1 [4]. Ground truth bounding boxes: upper left corner, row coordinate. (format: tuple of reals)
bbox_col1 : OD:r1 [4]. Ground truth bounding boxes: upper left corner, column coordinate. (format: tuple of reals)
bbox_row2 : OD:r1 [4]. Ground truth bounding boxes: lower right corner, row coordinate. (format: tuple of reals)
bbox_col2 : OD:r1 [4]. Ground truth bounding boxes: lower right corner, column coordinate. (format: tuple of reals)
coco_raw_annotations : OD:r1. Optional. Contains, for every bbox_label_id within this image, a dictionary with all raw COCO annotation information. (format: tuple of dictionaries)
bbox_row : OCR-D, OCR-R, OD:r2 [4]. Ground truth bounding boxes: center point, row coordinate. (format: tuple of reals)
bbox_col : OCR-D, OCR-R, OD:r2 [4]. Ground truth bounding boxes: center point, column coordinate. (format: tuple of reals)
bbox_phi : OCR-D, OCR-R, OD:r2 [4]. Ground truth bounding boxes: angle phi. (format: tuple of reals)
bbox_length1 : OCR-D, OCR-R, OD:r2 [4]. Ground truth bounding boxes: half length of edge 1. (format: tuple of reals)
bbox_length2 : OCR-D, OCR-R, OD:r2 [4]. Ground truth bounding boxes: half length of edge 2. (format: tuple of reals)
mask : OD:is. Ground truth mask marking the instance regions. (format: tuple of regions)
These dictionaries are part of DLDataset and thus they are created concurrently. Exceptions are the entries marked in the table above:
[2]: The procedure split_dl_dataset adds split.
[3]: The procedure preprocess_dl_samples adds dlsample_file_name.
[4]: Used coordinates: pixel centered, subpixel accurate coordinates.
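As an illustration of this structure, an individual sample dictionary can be accessed from DLDataset like this (a minimal HDevelop sketch; the index 0 simply picks the first sample, and the 'split' entry is only present after split_dl_dataset has been called):
* Get the tuple of sample dictionaries stored under the key 'samples'.
get_dict_tuple (DLDataset, 'samples', DatasetSamples)
* Access the dictionary of the first sample and read some of its entries.
SampleDict := DatasetSamples[0]
get_dict_tuple (SampleDict, 'image_file_name', ImageFileName)
get_dict_tuple (SampleDict, 'split', Split)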
DLSample
The dictionary DLSample serves as input for the model. For a batch, they are handed over as the entries of the tuple DLSampleBatch to apply_dl_model or train_dl_model_batch.
They are created out of DLDataset for every sample by the procedure gen_dl_samples followed by preprocess_dl_samples. Note, preprocess_dl_samples will update the corresponding DLSample dictionary.
If preprocessing is done using the standard procedure preprocess_dl_dataset, the preprocessed samples are stored on the file system. Afterwards they need to be retrieved with the procedure read_dl_samples.
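A sketch of this standard path could look as follows (assuming a model has already been read into DLModelHandle and using create_dl_preprocess_param_from_model with default settings; the output directory and the sample indices are placeholders):
* Derive the preprocessing parameters from the model (assumption:
* DLModelHandle has been read beforehand with read_dl_model).
create_dl_preprocess_param_from_model (DLModelHandle, 'none', 'full_domain', [], [], [], DLPreprocessParam)
* Preprocess the whole dataset; the DLSample dictionaries are written
* to the file system and DLDataset is updated accordingly.
create_dict (GenParam)
preprocess_dl_dataset (DLDataset, 'preprocessed_data', DLPreprocessParam, GenParam, DLDatasetFileName)
* Later, read back a batch of stored samples, e.g., the first three.
read_dl_samples (DLDataset, [0, 1, 2], DLSampleBatch)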
DLSample contains the preprocessed image and, in case of training and evaluation, all ground truth annotations. Depending on the model type, it can have the following entries:
anomaly_ground_truth : AD, GC-AD. Anomaly image or region, read from anomaly_file_name. (format: image or region)
anomaly_label : AD, GC-AD. Ground truth anomaly label on image level (in the form of class_names). (format: string)
anomaly_label_id : AD, GC-AD. Ground truth anomaly label ID on image level (in the form of class_ids). (format: integer)
bbox_label_id : OD. Ground truth labels for the image part within the bounding box (in the form of class_ids). (format: tuple of integers)
bbox_row1 : OD:r1 [4]. Ground truth bounding boxes: upper left corner, row coordinate. (format: tuple of reals)
bbox_col1 : OD:r1 [4]. Ground truth bounding boxes: upper left corner, column coordinate. (format: tuple of reals)
bbox_row2 : OD:r1 [4]. Ground truth bounding boxes: lower right corner, row coordinate. (format: tuple of reals)
bbox_col2 : OD:r1 [4]. Ground truth bounding boxes: lower right corner, column coordinate. (format: tuple of reals)
bbox_row : OCR-D, OD:r2 [4]. Ground truth bounding boxes: center point, row coordinate. (format: tuple of reals)
bbox_col : OCR-D, OD:r2 [4]. Ground truth bounding boxes: center point, column coordinate. (format: tuple of reals)
bbox_phi : OCR-D, OD:r2 [4]. Ground truth bounding boxes: angle phi. (format: tuple of reals)
bbox_length1 : OCR-D, OD:r2 [4]. Ground truth bounding boxes: half length of edge 1. (format: tuple of reals)
bbox_length2 : OCR-D, OD:r2 [4]. Ground truth bounding boxes: half length of edge 2. (format: tuple of reals)
image : Any. Input image. (format: image)
image_label_id : CL. Ground truth label for the image (in the form of class_ids). (format: integer)
image_label_ids : MLC. Ground truth labels for the image (in the form of class_ids). (format: tuple of integers)
mask : OD:is. Ground truth mask marking the instance regions. (format: tuple of regions)
normals : 3D-GPD. 2D mappings (3-channel image). (format: image)
segmentation_image : SE, 3D-GPD. Image with the ground truth segmentations, read from segmentation_file_name. (format: image)
weight_image : SE [5]. Image with the pixel weights. (format: image)
target_orientation : OCR-D. Orientation target image for the word orientation. (format: image)
target_text : OCR-D. Text target image for the character detection. (format: image)
target_link : OCR-D. Link target image for the connection of detected character centers to a connected word. (format: image)
target_weight_orientation : OCR-D. Weight with respect to target_orientation. (format: image)
target_weight_link : OCR-D. Weight with respect to target_link. (format: image)
target_weight_text : OCR-D. Weight with respect to target_text. (format: image)
word : OCR-D, OCR-R. Ground truth word. (format: string)
x : 3D-GPD. X-image (values need to increase from left to right). (format: image)
y : 3D-GPD. Y-image (values need to increase from top to bottom). (format: image)
z : 3D-GPD. Z-image (values need to increase from points close to the sensor to far points; this is for example the case if the data is given in the camera coordinate system). (format: image)
These dictionaries are created by the procedure
gen_dl_samples
followed by preprocess_dl_samples
.
An exception is the entry marked in the table above, [5]: created
by the procedure gen_dl_segmentation_weights
.
[4]: Used coordinates: Pixel centered, subpixel accurate coordinates.
The inference input data consists of a single DLSample dictionary or a tuple of such. In contrast to training and evaluation, only the following keys are used:
image : Any. Input image. (format: image)
normals : 3D-GPD. 2D mappings (3-channel image). (format: image)
x : 3D-GPD. X-image (values need to increase from left to right). (format: image)
y : 3D-GPD. Y-image (values need to increase from top to bottom). (format: image)
z : 3D-GPD. Z-image (values need to increase from points close to the sensor to far points; this is for example the case if the data is given in the camera coordinate system). (format: image)
Concerning the image requirements, find more information in the subsection “Images” below.
For the inference, such a dictionary containing only the image data can be created using the procedure gen_dl_samples_from_images or gen_dl_samples_3d_gripping_point_detection (only for 3D Gripping Point Detection). These dictionaries can be passed one at a time or within a tuple DLSampleBatch.
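A minimal HDevelop sketch of this step (the image name is a placeholder, and DLPreprocessParam is assumed to have been created as described in the section "Images" below):
* Read a raw image and wrap it into a DLSample dictionary.
read_image (Image, 'example_image')
gen_dl_samples_from_images (Image, DLSampleBatch)
* Preprocess the sample(s) so that they meet the model requirements.
preprocess_dl_samples (DLSampleBatch, DLPreprocessParam)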
The training output data is given in the dictionary DLTrainResult. Its entries depend on the model and thus on the operator used (for further information see the documentation of the corresponding operator):
The operator train_dl_model_batch returns
total_loss
possible further losses included in your model
The operator train_dl_model_anomaly_dataset returns
final_error
final_epoch
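For example, a single training iteration might look like this (a sketch; DLModelHandle and a preprocessed DLSampleBatch are assumed to exist, and the learning rate is only an example value):
* Set a hyperparameter before training (example value).
set_dl_model_param (DLModelHandle, 'learning_rate', 0.001)
* Perform one training step on a batch of samples.
train_dl_model_batch (DLModelHandle, DLSampleBatch, DLTrainResult)
* Read the loss from the returned DLTrainResult dictionary.
get_dict_tuple (DLTrainResult, 'total_loss', TotalLoss)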
As output from the operator apply_dl_model, the model will return a dictionary DLResult for each sample. An illustration is given in the figure below. The evaluation is based on these results and the annotations. Evaluation results are stored in the dictionary EvaluationResult.
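As a sketch for a classification model (assuming DLModelHandle and a preprocessed DLSampleBatch), applying the model and reading entries of the returned DLResult could look like this:
* Apply the model; with Outputs = [] the default outputs are returned.
apply_dl_model (DLModelHandle, DLSampleBatch, [], DLResultBatch)
* Take the result dictionary of the first sample of the batch.
DLResult := DLResultBatch[0]
* For a classification model, read the inferred classes and confidences.
get_dict_tuple (DLResult, 'classification_class_names', ClassNames)
get_dict_tuple (DLResult, 'classification_confidences', Confidences)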
Depending on the model type, the dictionary DLResult can have the following entries:
gripping_confidence : 3D-GPD. Image containing raw, uncalibrated confidence values for every point in the scene. (format: image)
gripping_map : 3D-GPD. Binary image indicating for each pixel of the scene whether the model predicted a gripping point (pixel value = 1.0) or not (0.0). (format: image)
anomaly_image : AD, GC-AD. Single-channel image whose gray values are scores, indicating how likely the corresponding pixel in the input image belongs to an anomaly. (format: image)
anomaly_image_combined : GC-AD. Single-channel image whose gray values are scores, indicating how likely the corresponding pixel in the input image belongs to an anomaly. Calculated by combining the 'local' and 'global' subnetworks of the model. (format: image)
anomaly_image_global : GC-AD. Single-channel image whose gray values are scores, indicating how likely the corresponding pixel in the input image belongs to an anomaly. Calculated by the 'global' subnetwork of the model. (format: image)
anomaly_image_local : GC-AD. Single-channel image whose gray values are scores, indicating how likely the corresponding pixel in the input image belongs to an anomaly. Calculated by the 'local' subnetwork of the model. (format: image)
anomaly_score : AD, GC-AD. Anomaly score on image level calculated from anomaly_image. (format: real)
anomaly_score_local : GC-AD. Anomaly score on image level calculated from anomaly_image_local. (format: real)
anomaly_score_global : GC-AD. Anomaly score on image level calculated from anomaly_image_global. (format: real)
classification_class_ids : CL. Inferred class IDs for the image, sorted by confidence values. (format: tuple of integers)
classification_class_names : CL. Inferred class names for the image, sorted by confidence values. (format: tuple of strings)
classification_confidences : CL. Confidence values of the image inference for each class. (format: tuple of reals)
class_ids : MLC. Inferred class IDs for the image, sorted by confidence values. (format: tuple of integers)
class_names : MLC. Inferred class names for the image, sorted by confidence values. (format: tuple of strings)
confidences : MLC. Confidence values of the image inference for each class. (format: tuple of reals)
selected_class_ids : MLC. Class IDs for the image selected by the confidence threshold (min_confidence). (format: tuple of integers)
selected_class_names : MLC. Class names for the image selected by the confidence threshold (min_confidence). (format: tuple of strings)
selected_confidences : MLC. Confidence values of the image selected by the confidence threshold for each class. (format: tuple of reals)
char_candidates : OCR-R. Candidates for each character of the word and their confidences. (format: tuple of dictionaries)
word : OCR-R. Recognized word. (format: string)
score_maps : OCR-D. Scores given as an image with four channels: (format: image)
Character score: score for the character detection.
Link score: score for the connection of detected character centers to a connected word.
Orientation 1: sine component of the predicted word orientation.
Orientation 2: cosine component of the predicted word orientation.
words : OCR-D. Dictionary containing the following entries; the entries are tuples with a value for every found word. (format: dictionary with tuples of reals and strings)
row: localized word, center point, row coordinate.
col: localized word, center point, column coordinate.
phi: localized word, angle phi.
length1: localized word, half length of edge 1.
length2: localized word, half length of edge 2.
line_index: line index of the localized word if 'detection_sort_by_line' is set to 'true'.
word_boxes_on_image : OCR-D. Dictionary with the word localization in the coordinate system of the preprocessed image placed in image. The entries row, col, phi, length1, and length2 are tuples with a value for every found word (as for words above, without line_index). (format: dictionary with tuples of reals)
word_boxes_on_score_maps : OCR-D. Dictionary with the word localization in the coordinate system of the score images placed in score_maps. The entries are the same as for word_boxes_on_image above. (format: dictionary with tuples of reals)
bbox_class_id : OD. Inferred class for the bounding box (in the form of class_ids). (format: tuple of integers)
bbox_class_name : OD. Name of the inferred class for the bounding box. (format: tuple of strings)
bbox_confidence : OD. Confidence value of the inference for the bounding box. (format: tuple of reals)
bbox_row1 : OD:r1 [6]. Inferred bounding boxes: upper left corner, row coordinate. (format: tuple of reals)
bbox_col1 : OD:r1 [6]. Inferred bounding boxes: upper left corner, column coordinate. (format: tuple of reals)
bbox_row2 : OD:r1 [6]. Inferred bounding boxes: lower right corner, row coordinate. (format: tuple of reals)
bbox_col2 : OD:r1 [6]. Inferred bounding boxes: lower right corner, column coordinate. (format: tuple of reals)
bbox_row : OD:r2 [6]. Inferred bounding boxes: center point, row coordinate. (format: tuple of reals)
bbox_col : OD:r2 [6]. Inferred bounding boxes: center point, column coordinate. (format: tuple of reals)
bbox_phi : OD:r2 [6]. Inferred bounding boxes: angle phi. (format: tuple of reals)
bbox_length1 : OD:r2 [6]. Inferred bounding boxes: half length of edge 1. (format: tuple of reals)
bbox_length2 : OD:r2 [6]. Inferred bounding boxes: half length of edge 2. (format: tuple of reals)
mask : OD:is. Inferred mask marking the instance regions. (format: tuple of regions)
mask_probs : OD:is. Image with the confidence values of the inferred mask. (format: image)
segmentation_image : SE. Image with the segmentation result. (format: image)
segmentation_confidence : SE. Image with the confidence values of the segmentation result. (format: image)
[6]: Used coordinates: Pixel centered, subpixel accurate coordinates.
For a further explanation to the output values we refer to the chapters of the respective method, e.g., Deep Learning / Semantic Segmentation and Edge Extraction.
Regardless of the application, the network poses requirements on the images. The specific values depend on the network itself and can be queried using get_dl_model_param. In order to fulfill these requirements, you may have to preprocess your images. Standard preprocessing of the entire dataset and therewith also the images is implemented in preprocess_dl_samples. In case of custom preprocessing, this procedure offers guidance on the implementation.
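For example, the image requirements can be queried and, for many model types, adapted as follows (a sketch using the standard image-related model parameters; DLModelHandle is assumed to have been read beforehand):
* Query the image requirements of the model.
get_dl_model_param (DLModelHandle, 'image_width', ImageWidth)
get_dl_model_param (DLModelHandle, 'image_height', ImageHeight)
get_dl_model_param (DLModelHandle, 'image_num_channels', ImageNumChannels)
* For many model types the dimensions can also be adapted, e.g.:
set_dl_model_param (DLModelHandle, 'image_dimensions', [ImageWidth, ImageHeight, ImageNumChannels])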
add_dl_pruning_batch
apply_dl_model
clear_dl_model
create_dl_pruning
deserialize_dl_model
gen_dl_model_heatmap
gen_dl_pruned_model
get_dl_model_param
get_dl_pruning_param
read_dl_model
serialize_dl_model
set_dl_model_param
set_dl_pruning_param
train_dl_model_batch
write_dl_model