create_dl_layer_box_targets — Create a layer for generating box targets.
create_dl_layer_box_targets( : : DLLayerBoxProposal, DLLayerGTBox, DLLayerGTMask, LayerNames, InputMode, OutputModes, NumClasses, GenParamName, GenParamValue : DLLayerBoxTargetsClsTarget, DLLayerBoxTargetsClsWeight, DLLayerBoxTargetsBoxTarget, DLLayerBoxTargetsBoxWeight, DLLayerBoxTargetsNumFgInstances, DLLayerBoxTargetsAssignedIdxs, DLLayerBoxTargetsMaskWeight)
The operator create_dl_layer_box_targets creates layers for generating
box targets to be used in a box classification or box regression loss
and returns the corresponding layer handles, see below.
This layer expects several feeding input layers:
DLLayerBoxProposal: Containing the boxes for which the
targets should be computed.
DLLayerGTBox: Containing the ground truth boxes for all
images within this batch.
DLLayerGTMask (optional): Containing the ground truth
masks for all images within this batch.
This input is necessary if the model also predicts instance masks (cf.
OutputModes 'mask_weight'). Otherwise, if instance
masks are not of interest, it can be set to an empty tuple.
Depending on OutputModes, different output layers are derived from
DLLayerBoxProposal, and for each of them a name must be given in
LayerNames. Note that when creating a model using create_dl_model,
each layer of the created network must have a unique name.
The length of LayerNames has to be the length of
OutputModes times the length of DLLayerBoxProposal.
Layers that apply to all levels and are therefore not created for every
level individually (see the respective entries in the description of
OutputModes) are excluded from this multiplication and counted only once.
LayerNames should be given in the order corresponding to the
output layers, thus DLLayerBoxTargetsClsTarget,
DLLayerBoxTargetsClsWeight,
DLLayerBoxTargetsBoxTarget,
DLLayerBoxTargetsBoxWeight,
DLLayerBoxTargetsNumFgInstances,
DLLayerBoxTargetsAssignedIdxs,
DLLayerBoxTargetsMaskWeight.
Example: for two levels (2,3) and OutputModes =
['cls_target', 'cls_weight', 'num_fg_instances']:
['cls_t_l2', 'cls_t_l3', 'cls_w_l2', 'cls_w_l3', 'num_fg_instances'].
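For illustration, a minimal sketch of the call for this two-level example;
all layer handles (e.g., DLLayerAnchorsL2, DLLayerAnchorsL3, DLLayerGTBoxes)
and NumClasses are assumed placeholders:
* Hypothetical sketch: two anchor levels and three output modes.
* LayerNames contains one name per mode and level, except for
* 'num_fg_instances', which is created only once for all levels.
Modes := ['cls_target', 'cls_weight', 'num_fg_instances']
Names := ['cls_t_l2', 'cls_t_l3', 'cls_w_l2', 'cls_w_l3', 'num_fg_instances']
create_dl_layer_box_targets ([DLLayerAnchorsL2, DLLayerAnchorsL3], \
                             DLLayerGTBoxes, [], Names, 'anchors', Modes, \
                             NumClasses, [], [], DLLayerClsTargets, \
                             DLLayerClsWeights, _, _, \
                             DLLayerNumFgInstances, _, _)
* DLLayerClsTargets and DLLayerClsWeights each contain one handle per level,
* while DLLayerNumFgInstances is a single handle shared by both levels.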
The parameter InputMode determines the type of inputs expected in
DLLayerBoxProposal.
The following values are possible:
'anchors': The input boxes in DLLayerBoxProposal shall be anchors,
e.g., from an anchor layer as created by create_dl_layer_anchors.
Anchors from multiple feature maps might be given in DLLayerBoxProposal.
'box_proposals': The input boxes in DLLayerBoxProposal shall be box
proposals, e.g., from a box proposals layer as created by
create_dl_layer_box_proposals.
Depending on OutputModes the following loss targets are computed:
'cls_target': The target class for each of the input boxes. The rules for
assigning the input boxes to the ground truth boxes or to the background
are shown in the figure. Depending on InputMode:
'anchors': The class targets are given one-hot encoded, suitable for a
focal loss layer (create_dl_layer_loss_focal).
'box_proposals': The class targets are given as a class index, suitable
for a softmax layer followed by a cross entropy loss layer
(create_dl_layer_softmax, create_dl_layer_loss_cross_entropy).
'cls_weight': The class loss weight for each of the input boxes. Class
weights have the same shape as class targets, so that both can be used
together as feeding layers for the class loss.
The class weights are set depending on the class targets (see
'cls_target' above): for foreground and background boxes the weight is
set to 1.0, while for ignore boxes the weight is set to 0.0, so that
these boxes are not considered in the loss calculation.
If InputMode is 'box_proposals', the weights of all boxes with zero area
are set to 0.
'box_target': For all boxes that are assigned to the foreground (see
'cls_target' above), the box delta targets are calculated as coordinate
differences to the assigned ground truth boxes, such that they can be used
as feeding inputs to a following loss layer, e.g., a Huber loss layer
(create_dl_layer_loss_huber). For background or ignore boxes the targets
are set to 0. The box delta targets depend on the 'instance_type' (an
illustrative parameterization is given after this list):
'rectangle1': The box delta targets consist of the offsets of the box
center and the differences of the box dimensions between the input box
and the assigned ground truth box.
'rectangle2': In addition to the center and dimension components, an angle
delta is computed. The components are scaled with the inside weights given
by 'inside_center_weight', 'inside_dimension_weight', and
'inside_angle_weight'. The angle delta is corrected into the appropriate
interval, which depends on whether the direction of the object within the
box is considered. This behavior is determined by the parameter
'ignore_direction', see get_dl_model_param and below.
If 'ignore_direction' is 'false', the boxes have orientations in the range
[-π, π), else in the range [-π/2, π/2).
'box_weight': For all boxes that are assigned to the foreground (see
'cls_target' above), the weights are set to 'center_weight' for the center
coordinates, to 'dimension_weight' for the box dimensions (height and
width or l1 and l2, depending on 'instance_type'), and to 'angle_weight'
for the angle ('rectangle2' only); for all other boxes the weights are set
to 0.0.
'num_fg_instances': This output contains a scalar with the number of
foreground boxes (see 'cls_target' above) of the whole input batch. It can
be used, e.g., as a normalization value within a consecutive focal loss
layer (create_dl_layer_loss_focal).
Note that the same output value is given for all items within the batch.
Note also that even for multiple anchor levels there is only one output
layer DLLayerBoxTargetsNumFgInstances.
'assigned_idxs': This output contains the index of the assigned ground
truth box for all foreground boxes (see 'cls_target' above). For all other
boxes the output value is set to -1. This mode is only available for
InputMode 'box_proposals'. The output can be used to calculate mask
targets using a ROI pooling layer (create_dl_layer_roi_pooling) on the
ground truth masks.
'mask_weight': This output contains the weights for a consecutive mask
prediction loss (see, e.g., create_dl_layer_loss_distance). Each channel
is of dimensions 'mask_width' times 'mask_height'. In each channel where
the corresponding assigned index (see 'assigned_idxs' above) is greater
than or equal to 0, all values are set to 1.0, else to 0.0. The mask
weight is also set to 0.0 if a ground truth box instance does not contain
a ground truth mask. This makes it possible to train on datasets in which
not every box is annotated with an instance mask.
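For illustration of the 'box_target' parameterization for axis-aligned
boxes, the scheme introduced in the Faster R-CNN paper cited in the
references computes the deltas as follows (the layer's exact formulas may
differ in detail):
t_row = (r_gt - r_a) / h_a
t_col = (c_gt - c_a) / w_a
t_h   = log(h_gt / h_a)
t_w   = log(w_gt / w_a)
Here, (r_a, c_a, h_a, w_a) denote the center row, center column, height,
and width of the input box and (r_gt, c_gt, h_gt, w_gt) those of the
assigned ground truth box. For 'rectangle2' boxes, an additional angle
delta is computed and each component is multiplied by its inside weight.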
Duplicate entries in OutputModes are ignored. If an empty tuple is given,
all available output modes are enabled.
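For illustration, a minimal sketch of a call in 'box_proposals' mode
requesting all output modes; the input layer handles (DLLayerBoxProposals,
DLLayerGTBoxes, DLLayerGTMasks) and NumClasses are assumed placeholders:
* Hypothetical sketch: second detection stage with box proposals, ground
* truth boxes, and ground truth masks as inputs. An empty OutputModes tuple
* enables all output modes, so seven layer names are required.
create_dl_layer_box_targets (DLLayerBoxProposals, DLLayerGTBoxes, \
                             DLLayerGTMasks, ['cls_t', 'cls_w', 'box_t', \
                             'box_w', 'num_fg', 'assigned_idxs', 'mask_w'], \
                             'box_proposals', [], NumClasses, \
                             ['mask_height', 'mask_width'], [14, 14], \
                             DLLayerClsTarget, DLLayerClsWeight, \
                             DLLayerBoxTarget, DLLayerBoxWeight, \
                             DLLayerNumFgInstances, DLLayerAssignedIdxs, \
                             DLLayerMaskWeight)
* DLLayerAssignedIdxs can then be used, e.g., with a ROI pooling layer on
* the ground truth masks to obtain mask targets.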
NumClasses shall be set to the number of classes contained in the
dataset (excluding background), or to 1, if the class targets for
output mode 'cls_target' should be computed class-agnostically.
For example, this is the case in a region proposal network, which forms
the first stage of the Faster R-CNN architecture (see references below).
In the latter case, all ground truth boxes are interpreted as belonging to
a single category 'object'.
The following generic parameters GenParamName and the corresponding
values GenParamValue are supported:
'angle_weight': Outside weight multiplier for the box angle (phi) used in
the output 'box_weight'.
Restriction: Only applicable for 'instance_type' 'rectangle2'.
Default: 1.0.
'box_cls_specific': Determines whether the 'box_target' and 'box_weight'
outputs are class specific ('true') or not ('false'). If so, the targets
and weights are only set within the depth index that corresponds to the
target class.
Restriction: Only applicable for InputMode 'box_proposals' and if the
output mode 'box_target' is used.
Default: 'false'.
'center_weight': Outside weight multiplier for the box center coordinates
used in the output 'box_weight'.
Default: 1.0.
'dimension_weight': Outside weight multiplier for the box dimensions
(height and width for 'instance_type' 'rectangle1', the lengths l1 and l2
for 'instance_type' 'rectangle2') used in the output 'box_weight'.
Default: 1.0.
'fg_neg_thresh': Foreground negative threshold. Anchors whose IoU with
every ground truth box is smaller than this threshold are assigned to the
background. If you still want such an anchor to be assigned to a
foreground class, you can use 'set_weak_boxes_to_bg' (see below). See the
detailed explanations in the scheme above.
Default: 0.4.
'fg_pos_thresh': Foreground positive threshold. Anchors with an IoU larger
than or equal to this threshold with any ground truth box are assigned to
the foreground. See the detailed explanations in the scheme above.
Default: 0.5.
'ignore_direction': Determines whether boxes of type 'rectangle2' respect
the direction of the object within the box:
'true': The orientation of 'rectangle2' boxes is in the range [-π/2, π/2).
'false': The orientation of 'rectangle2' boxes is in the range [-π, π).
Restriction: Only applicable for 'instance_type' 'rectangle2'.
Default: 'false'.
'inside_angle_weight': Inside weight multiplier for the box angle (phi)
used in the output 'box_target'.
Restriction: Only applicable for 'instance_type' 'rectangle2'.
Default: 1.0.
'inside_center_weight': Inside weight multiplier for the box center
coordinates used in the output 'box_target'.
Default: 1.0.
'inside_dimension_weight': Inside weight multiplier for the box dimensions
(height and width for 'instance_type' 'rectangle1', the lengths (l1, l2)
for 'instance_type' 'rectangle2') used in the output 'box_target'.
Default: 1.0.
'instance_type': Instance type of the boxes. Possible values:
'rectangle1': axis-aligned rectangles.
'rectangle2': oriented rectangles.
Default: 'rectangle1'.
'is_inference_output': Determines whether apply_dl_model will include the
output of this layer in the dictionary DLResultBatch even without
specifying this layer in Outputs ('true') or not ('false').
Default: 'false'.
'mask_cls_specific': Determines whether the 'mask_weight' output is class
specific. If set to 'true', only the weights within the depth index of the
target class are set to 1.
Restriction: Only applicable if the output mode 'mask_weight' is used.
Default: 'false'.
'mask_height': Output height of the mask weight layer for output mode
'mask_weight'.
Default: 1.
'mask_width': Output width of the mask weight layer for output mode
'mask_weight'.
Default: 1.
'max_num_samples': Maximum number of randomly selected targets per batch
item whose weights are set to a value larger than 0.
Restriction: Only for InputMode 'box_proposals'.
Default: 256.
'ratio_num_fg': Target ratio of foreground versus background boxes for the
random box sampling. The maximum number of foreground proposals with
'cls_weight' set to 1 is 'max_num_samples' times 'ratio_num_fg'. The
remaining samples, up to 'max_num_samples' in total, are background
proposals if enough are available.
Restriction: Only for InputMode 'box_proposals'.
Default: 0.25.
'set_weak_boxes_to_bg': Determines whether predicted boxes need to reach
an IoU larger than 'fg_neg_thresh' in order to be assigned to a ground
truth box at all, or whether weak boxes are automatically assigned to the
background (see the scheme above):
'true': Anchors with an IoU below 'fg_neg_thresh' are automatically
assigned to the background.
'false': At least the predicted box with the highest IoU is assigned to
the foreground and thus used as a positive example, independent of its
IoU value.
Default: 'false'.
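For illustration, a minimal sketch of passing several generic parameters
at creation time, here for oriented boxes whose direction is taken into
account; layer handles and NumClasses are again assumed placeholders:
* Hypothetical sketch: oriented boxes ('rectangle2') with direction taken
* into account and a stronger outside weight on the angle component.
GenParamName := ['instance_type', 'ignore_direction', 'angle_weight', \
                 'fg_pos_thresh', 'fg_neg_thresh']
GenParamValue := ['rectangle2', 'false', 2.0, 0.5, 0.4]
create_dl_layer_box_targets (DLLayerAnchors, DLLayerGTBoxes, [], \
                             ['cls_t', 'cls_w', 'box_t', 'box_w', 'num_fg'], \
                             'anchors', ['cls_target', 'cls_weight', \
                             'box_target', 'box_weight', \
                             'num_fg_instances'], NumClasses, \
                             GenParamName, GenParamValue, \
                             DLLayerClsTarget, DLLayerClsWeight, \
                             DLLayerBoxTarget, DLLayerBoxWeight, \
                             DLLayerNumFgInstances, _, _)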
Certain parameters of layers created using this operator
create_dl_layer_box_targets can be set and retrieved using further
operators. The following tables give an overview of which parameters can
be set using set_dl_model_layer_param and which ones can be retrieved
using get_dl_model_layer_param or get_dl_layer_param.
Note that the operators set_dl_model_layer_param and
get_dl_model_layer_param require a model created by create_dl_model.
| Layer Internal Parameters | set | get |
|---|---|---|
| 'input_layer' (DLLayerBoxProposal, DLLayerGTBox, DLLayerGTMask) |  | x |
| 'name' (LayerNames) | x | x |
| 'output_layer' (DLLayerBoxTargetsClsTarget, DLLayerBoxTargetsClsWeight) |  | x |
| 'shape' |  | x |
| 'type' |  | x |

| Generic Layer Parameters | set | get |
|---|---|---|
| 'angle_weight' | x | x |
| 'box_cls_specific' |  | x |
| 'center_weight' | x | x |
| 'dimension_weight' |  | x |
| 'fg_neg_thresh' | x | x |
| 'fg_pos_thresh' | x | x |
| 'ignore_direction' |  | x |
| 'input_mode' (InputMode) |  | x |
| 'inside_angle_weight' |  | x |
| 'inside_center_weight' |  | x |
| 'inside_dimension_weight' |  | x |
| 'is_inference_output' | x | x |
| 'instance_type' |  | x |
| 'mask_cls_specific' | x | x |
| 'mask_height' | x | x |
| 'mask_width' | x | x |
| 'max_num_samples' | x | x |
| 'num_classes' (NumClasses) | x | x |
| 'num_trainable_params' |  | x |
| 'ratio_num_fg' | x | x |
| 'set_weak_boxes_to_bg' | x | x |
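For illustration, a minimal usage sketch, assuming the layer was created
as part of a model via create_dl_model and that 'cls_target' was one of
the names given in LayerNames:
* Retrieve and adjust layer parameters after model creation (hypothetical
* layer name 'cls_target').
get_dl_model_layer_param (DLModelHandle, 'cls_target', 'fg_pos_thresh', \
                          FgPosThresh)
set_dl_model_layer_param (DLModelHandle, 'cls_target', 'fg_neg_thresh', 0.3)
* Parameters that the tables above mark as retrievable only, e.g.,
* 'instance_type', can be read but not changed after layer creation.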
DLLayerBoxProposal (input_control) dl_layer(-array) → (handle)
Feeding layers with box proposals or anchors for which targets should be computed.
DLLayerGTBox (input_control) dl_layer → (handle)
Feeding layer with ground truth boxes.
DLLayerGTMask (input_control) dl_layer → (handle)
Feeding layer with ground truth masks (optional).
LayerNames (input_control) string(-array) → (string)
Names of the output layers.
InputMode (input_control) string → (string)
Mode of the input boxes.
Default: 'box_proposals'
List of values: 'anchors', 'box_proposals'
OutputModes (input_control) string-array → (string)
Modes that should be computed as outputs.
List of values: 'assigned_idxs', 'box_target', 'box_weight', 'cls_target', 'cls_weight', 'mask_weight', 'num_fg_instances'
NumClasses (input_control) number → (integer)
Number of classes.
Restriction: NumClasses > 0
GenParamName (input_control) attribute.name(-array) → (string)
Generic input parameter names.
Default: []
List of values: 'angle_weight', 'box_cls_specific', 'center_weight', 'dimension_weight', 'fg_neg_thresh', 'fg_pos_thresh', 'ignore_direction', 'inside_angle_weight', 'inside_center_weight', 'inside_dimension_weight', 'instance_type', 'is_inference_output', 'mask_cls_specific', 'mask_height', 'mask_width', 'max_num_samples', 'ratio_num_fg', 'set_weak_boxes_to_bg'
GenParamValue (input_control) attribute.value(-array) → (string / integer / real)
Generic input parameter values.
Default: []
Suggested values: 'rectangle1', 'rectangle2', 'true', 'false', 0.4, 0.5, 256, 0.25, 1.0, 7, 14
DLLayerBoxTargetsClsTarget (output_control) dl_layer(-array) → (handle)
Class target layer.
DLLayerBoxTargetsClsWeight (output_control) dl_layer(-array) → (handle)
Class weight layer.
DLLayerBoxTargetsBoxTarget (output_control) dl_layer(-array) → (handle)
Box target layer.
DLLayerBoxTargetsBoxWeight (output_control) dl_layer(-array) → (handle)
Box weight layer.
DLLayerBoxTargetsNumFgInstances (output_control) dl_layer → (handle)
NumFgInstances layer.
DLLayerBoxTargetsAssignedIdxs (output_control) dl_layer → (handle)
Assigned indices layer.
DLLayerBoxTargetsMaskWeight (output_control) dl_layer → (handle)
Mask weight layer.
* Minimal example for the usage of layers
* - create_dl_layer_box_proposals
* - create_dl_layer_box_targets
* for creating and training a model to perform object detection.
*
dev_update_off ()
NumClasses := 1
AnchorAspectRatios := 1.0
AnchorNumSubscales := 1
* Define the input image layer.
create_dl_layer_input ('image', [224,224,3], [], [], DLLayerInputImage)
* Define the input ground truth box layers.
create_dl_layer_input ('bbox_row1', [1, 1, 10], ['allow_smaller_tuple'], \
['true'], DLLayerInputRow1)
create_dl_layer_input ('bbox_row2', [1, 1, 10], ['allow_smaller_tuple'], \
['true'], DLLayerInputRow2)
create_dl_layer_input ('bbox_col1', [1, 1, 10], ['allow_smaller_tuple'], \
['true'], DLLayerInputCol1)
create_dl_layer_input ('bbox_col2', [1, 1, 10], ['allow_smaller_tuple'], \
['true'], DLLayerInputCol2)
create_dl_layer_input ('bbox_label_id', [1, 1, 10], \
['allow_smaller_tuple'], ['true'], \
DLLayerInputLabelID)
create_dl_layer_class_id_conversion (DLLayerInputLabelID, \
'class_id_conversion', \
'from_class_id', \
[], [], DLLayerClassIdConversion)
* Concatenate all box coordinates.
create_dl_layer_concat ([DLLayerInputRow1, DLLayerInputCol1, \
DLLayerInputRow2, DLLayerInputCol2, \
DLLayerClassIdConversion], 'gt_boxes', 'height', \
[], [], DLLayerGTBoxes)
*
* Perform some operations on the input image to extract features.
* -> this serves as our backbone CNN here.
create_dl_layer_convolution (DLLayerInputImage, 'conv1', 3, 1, 2, 8, 1, \
'half_kernel_size', 'relu', [], [], \
DLLayerConvolution)
create_dl_layer_convolution (DLLayerConvolution, 'conv2', 3, 1, 2, 8, 1, \
'half_kernel_size', 'relu', [], [], \
DLLayerConvolution)
create_dl_layer_pooling (DLLayerConvolution, 'pool', 2, 2, 'none', \
'maximum', [], [], DLLayerPooling)
*
* Create the anchor boxes -> adapt the scale to fit the object size.
create_dl_layer_anchors (DLLayerPooling, DLLayerInputImage, 'anchor', \
AnchorAspectRatios, AnchorNumSubscales, [], \
['scale'], [8], DLLayerAnchors)
*
* Create predictions for the classification and regression of anchors.
* We set the bias such that background is a lot more likely than foreground.
PriorProb := 0.05
BiasInit := -log((1.0 - PriorProb) / PriorProb)
create_dl_layer_convolution (DLLayerPooling, 'cls_logits', 3, 1, 1, \
NumClasses, 1, 'half_kernel_size', 'none', \
['bias_filler_const_val'], \
[BiasInit], DLLayerClsLogits)
create_dl_layer_convolution (DLLayerPooling, 'box_delta_predictions', 5, 1, \
1, 4*|AnchorAspectRatios|*|AnchorNumSubscales|, \
1, 'half_kernel_size', 'none', [], [], \
DLLayerBoxDeltaPredictions)
*
* Generate the class and box regression targets for the anchors
* according to the ground truth boxes.
* -> we use inside-weights here, they also need to be set in the
* corresponding box proposals layer later.
Targets := ['cls_target', 'cls_weight', 'box_target', 'box_weight', \
'num_fg_instances']
create_dl_layer_box_targets (DLLayerAnchors, DLLayerGTBoxes, [], Targets, \
'anchors', Targets, NumClasses, \
['inside_center_weight', \
'inside_dimension_weight'], [10.0, 5.0], \
DLLayerClassTarget, DLLayerClassWeight, \
DLLayerBoxTarget, DLLayerBoxWeight, \
DLLayerNumFgInstances, _, _)
*
* We use a focal loss for the classification predictions.
create_dl_layer_loss_focal (DLLayerClsLogits, DLLayerClassTarget, \
DLLayerClassWeight, DLLayerNumFgInstances, \
'loss_cls', 1.0, 2.0, 0.25, \
'sigmoid_focal_binary', [], [], DLLayerLossCls)
* We use an L1-loss for the box deltas.
create_dl_layer_loss_huber (DLLayerBoxDeltaPredictions, DLLayerBoxTarget, \
DLLayerBoxWeight, [], 'loss_box', 1.0, 0.0, \
[], [], DLLayerLossBox)
*
* Apply sigmoid to class-predictions and compute box outputs.
* --> alternatively, we could directly apply the prediction and set the
* focal loss mode to 'focal_binary' instead of 'sigmoid_focal_binary'.
create_dl_layer_activation (DLLayerClsLogits, 'cls_probs', 'sigmoid', \
[], [], DLLayerClsProbs)
create_dl_layer_box_proposals (DLLayerClsProbs, DLLayerBoxDeltaPredictions, \
DLLayerAnchors, DLLayerInputImage, \
'anchors', ['inside_center_weight', \
'inside_dimension_weight'], [10.0, 5.0], \
DLLayerBoxProposals)
*
* Create the model.
OutputLayers := [DLLayerLossCls, DLLayerLossBox, DLLayerBoxProposals]
create_dl_model (OutputLayers, DLModelHandle)
*
* Prepare the model for using it as a detection model.
set_dl_model_param (DLModelHandle, 'type', 'detection')
ClassIDs := [2]
set_dl_model_param (DLModelHandle, 'class_ids', ClassIDs)
set_dl_model_param (DLModelHandle, 'max_overlap', 0.1)
*
* Create a sample.
create_dict (DLSample)
gen_image_const (Image, 'real', 224, 224)
gen_circle (Circle, [50., 100.], [50., 150.], [20., 20.])
overpaint_region (Image, Circle, [255], 'fill')
compose3 (Image, Image, Image, Image)
set_dict_object (Image, DLSample, 'image')
smallest_rectangle1 (Circle, Row1, Col1, Row2, Col2)
set_dict_tuple (DLSample, 'bbox_row1', Row1)
set_dict_tuple (DLSample, 'bbox_row2', Row2)
set_dict_tuple (DLSample, 'bbox_col1', Col1)
set_dict_tuple (DLSample, 'bbox_col2', Col2)
set_dict_tuple (DLSample, 'bbox_label_id', [2,2])
*
* Train the model for some iterations (heavy overfitting).
set_dl_model_param (DLModelHandle, 'learning_rate', 0.0001)
Iteration := 0
TotalLoss := 1e6
LossCls := 1e6
LossBox := 1e6
dev_inspect_ctrl ([Iteration, TotalLoss, LossCls, LossBox])
while (TotalLoss > 0.2 and Iteration < 3000)
train_dl_model_batch (DLModelHandle, DLSample, DLResult)
get_dict_tuple (DLResult, 'loss_cls', LossCls)
get_dict_tuple (DLResult, 'loss_box', LossBox)
get_dict_tuple (DLResult, 'total_loss', TotalLoss)
Iteration := Iteration + 1
endwhile
dev_close_inspect_ctrl ([Iteration, TotalLoss, LossCls, LossBox])
*
* Apply the detection model.
apply_dl_model (DLModelHandle, DLSample, [], DLResult)
*
* Display ground truth and result.
create_dict (DLDatasetInfo)
set_dict_tuple (DLDatasetInfo, 'class_ids', ClassIDs)
set_dict_tuple (DLDatasetInfo, 'class_names', ['circle'])
create_dict (WindowHandleDict)
dev_display_dl_data (DLSample, DLResult, DLDatasetInfo, \
['image', 'bbox_ground_truth', 'bbox_result'], \
[], WindowHandleDict)
stop ()
dev_close_window_dict (WindowHandleDict)
Possible predecessors: create_dl_layer_convolution, create_dl_layer_anchors,
create_dl_layer_box_proposals
Possible successors: create_dl_layer_box_proposals, create_dl_layer_loss_focal,
create_dl_layer_loss_huber
Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 39, Number 6, pp. 1137--1149, 2017, doi: 10.1109/TPAMI.2016.2577031.
Module: Deep Learning Professional