create_dl_layer_loss_ctc
— Create a CTC loss layer.
create_dl_layer_loss_ctc( : : DLLayerInput, DLLayerInputLengths, DLLayerTarget, DLLayerTargetLengths, LayerName, GenParamName, GenParamValue : DLLayerLossCTC)
The operator create_dl_layer_loss_ctc creates a Connectionist Temporal Classification (CTC) loss layer whose handle is returned in DLLayerLossCTC.
See the reference cited below for information about the CTC loss.
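For orientation, the loss defined in that reference can be written as below. Here $y^t_k$ is the softmax probability of class $k$ at time step $t$, $\mathbf{l}$ is the target label sequence, and $\mathcal{B}$ is the mapping that first merges repeated labels and then removes blanks. This page does not state whether the layer additionally normalizes the value (for example, by averaging over the batch). Treat this as the textbook definition, not necessarily the exact quantity reported as 'ctc_loss'.

```latex
\mathcal{L}_{\mathrm{CTC}} = -\ln p(\mathbf{l} \mid \mathbf{x}),
\qquad
p(\mathbf{l} \mid \mathbf{x}) = \sum_{\pi \in \mathcal{B}^{-1}(\mathbf{l})} \; \prod_{t=1}^{T} y^{t}_{\pi_t}
```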
With this loss layer it is possible to train sequence-to-sequence (Seq2Seq) models. For example, it can be used to train a model that reads text in an image. To do so, the layer compares the network prediction DLLayerInput, with sequence lengths DLLayerInputLengths, to the given target sequences DLLayerTarget, with sequence lengths DLLayerTargetLengths.
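To make the comparison concrete, the following is a minimal pure-Python sketch of the CTC negative log-likelihood for a single batch item, following the forward algorithm of Graves et al. It assumes the conventions of this page: raw (non-softmax) scores of shape T x C, blank class 0, and only the first input_length frames and target_length labels are used. The function names are hypothetical, and this is not HALCON's GPU implementation.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over the C scores of one frame."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def logsumexp(values):
    """log(sum(exp(v))) that also handles -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_loss(logits, input_length, target, target_length, blank=0):
    """Return -log p(target | logits) for one item.

    logits: T x C raw scores (softmax is applied here, as in the layer).
    target: label IDs in 1..C-1; entries beyond target_length are ignored.
    """
    log_probs = [log_softmax(logits[t]) for t in range(input_length)]
    labels = target[:target_length]
    # Extended sequence with blanks around every label: [b, l1, b, l2, ..., b]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    n = len(ext)
    # alpha[s]: log-probability of all partial paths ending in ext[s]
    alpha = [-math.inf] * n
    alpha[0] = log_probs[0][ext[0]]
    if n > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, input_length):
        new = [-math.inf] * n
        for s in range(n):
            candidates = [alpha[s]]
            if s >= 1:
                candidates.append(alpha[s - 1])
            # A blank may only be skipped between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new[s] = logsumexp(candidates) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end in the last label or in the trailing blank
    tail = [alpha[n - 1]] + ([alpha[n - 2]] if n > 1 else [])
    return -logsumexp(tail)


# Usage: uniform scores over C=3 classes for T=6 frames, target [1, 2, 1]
uniform = [[0.0, 0.0, 0.0] for _ in range(6)]
print(ctc_loss(uniform, 6, [1, 2, 1], 3))
```

Working in log space avoids underflow for long sequences. The blank-skip rule is what forces a blank between two equal consecutive labels. An infeasible target (too many labels for the available frames) therefore yields an infinite loss.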
The following variables are important to understand the input shapes:
T: Maximum input sequence length (i.e., width of DLLayerInput)
S: Maximum output sequence length (i.e., width of DLLayerTarget)
C: Number of classes including 0 as the blank class ID (i.e., depth of DLLayerInput)
This layer expects multiple layers as input:
DLLayerInput: Specifies the network prediction. Shape: [T,1,C]
DLLayerInputLengths: Specifies the input sequence length of each item in the batch. Shape: [1,1,1]
DLLayerTarget: Specifies the target sequences. Shape: [S,1,1]
DLLayerTargetLengths: Specifies the target sequence length of each item in the batch. Shape: [1,1,1]
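The sketch below checks one batch item against these shapes before training. The helper names are hypothetical and are not HALCON operators. The feasibility check is a general CTC property rather than something stated on this page. To emit its labels, CTC needs one frame per label plus one blank frame between each pair of equal consecutive labels.

```python
def min_frames_needed(labels):
    """Frames CTC needs to emit `labels`: one per label, plus one blank
    between every pair of equal consecutive labels."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def check_ctc_sample(T, S, C, input_length, target, target_length):
    """Return a list of problems for one batch item (empty if none).

    Hypothetical helper: target is the padded [S] row of DLLayerTarget,
    input_length and target_length are the [1,1,1] length values.
    """
    problems = []
    if not 1 <= input_length <= T:
        problems.append(f"input length {input_length} not in [1, {T}]")
    if len(target) != S:
        problems.append(f"target has {len(target)} entries, expected S={S}")
    if not 0 <= target_length <= S:
        problems.append(f"target length {target_length} not in [0, {S}]")
    labels = target[:target_length]
    bad = [x for x in labels if not 1 <= x < C]
    if bad:
        problems.append(f"labels {bad} outside 1..{C - 1} (0 is the blank)")
    if min_frames_needed(labels) > input_length:
        problems.append("input sequence too short for this target")
    return problems


# Usage with the values of the example program on this page
print(check_ctc_sample(T=6, S=3, C=3, input_length=6,
                       target=[1, 2, 1], target_length=3))
```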
The parameter LayerName sets an individual layer name. Note that if a model is created using create_dl_model, each layer of the created network must have a unique name.
The CTC loss is typically applied in a CNN as follows. The input sequence is expected to be encoded in some CNN layer with the output shape [width: T, height: 1, depth: C].
Typically, the end of a large fully convolutional classifier is pooled in height down to 1 with an average pooling layer. It is important that the last layer is wide enough to hold enough information: its width T must leave room for every label of the longest target sequence, plus a blank between repeated labels.
To obtain the sequence prediction in the output depth, a 1x1 convolutional layer with the number of kernels set to C is added after the pooling.
In this use case, the CTC loss obtains this convolutional layer as its input layer DLLayerInput. The width of the input layer determines the maximum output sequence length of the model.
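The shape flow described above can be sketched in plain Python as follows. A feature map indexed as feature[w][h][d] is averaged over its height. A 1x1 convolution with C kernels then applies the same linear map over depth to every column, which yields the [T,1,C] scores expected by DLLayerInput. This is only an illustration with hypothetical helper names, not how HALCON implements its pooling or convolution layers.

```python
def average_pool_height(feature):
    """feature[w][h][d] -> pooled[w][d]: mean over the height axis."""
    pooled = []
    for column in feature:
        height = len(column)
        depth = len(column[0])
        pooled.append([sum(column[h][d] for h in range(height)) / height
                       for d in range(depth)])
    return pooled


def conv1x1(pooled, weights, bias):
    """1x1 convolution with C kernels.

    weights: C x D, bias: C. The same weights are applied to every
    column, so the output has shape T x C with T = feature map width.
    """
    return [[sum(w * x for w, x in zip(kernel, column)) + b
             for kernel, b in zip(weights, bias)]
            for column in pooled]


# Usage: width 4 (= T), height 2, depth 2, mapped to C = 3 classes
feature = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]
scores = conv1x1(average_pool_height(feature),
                 weights=[[1, 1], [0, 0], [2, -1]], bias=[0, 1, 0])
print(len(scores), len(scores[0]))
```

Because the 1x1 convolution never mixes columns, the width of the pooled feature map carries through unchanged. That is why the width of DLLayerInput fixes the maximum sequence length.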
The CTC loss can be applied to a batch of input items with differing input and target sequence lengths, where T and S are the maximum lengths. The individual length of each item in the batch must be specified in DLLayerInputLengths and DLLayerTargetLengths.
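A batch with differing target lengths can be prepared by padding every target row to width S and recording the true lengths separately. The sketch below assumes padding with 0 and assumes that entries beyond an item's length are ignored, as in common CTC implementations. This page does not state which padding value the layer expects, and pack_targets is a hypothetical helper.

```python
def pack_targets(sequences, S, pad_value=0):
    """Pad label sequences to width S and collect their lengths.

    Assumption: entries beyond each item's length are ignored by the loss,
    so pad_value only fills the unused tail of each row.
    """
    targets, lengths = [], []
    for seq in sequences:
        if len(seq) > S:
            raise ValueError(f"sequence of length {len(seq)} exceeds S={S}")
        targets.append(list(seq) + [pad_value] * (S - len(seq)))
        lengths.append(len(seq))
    return targets, lengths


# Usage: three items with target lengths 3, 1 and 0 for S = 3
print(pack_targets([[1, 2, 1], [2], []], S=3))
```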
A model containing this layer cannot be trained on a CPU.
A model containing this layer cannot be trained with a 'batch_size_multiplier' != 1.0.
The input layer DLLayerInput must not be a softmax layer, because the softmax calculation is done internally in this layer. For inference, an extra softmax layer should be connected to DLLayerInput (see create_dl_layer_softmax).
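The following plain-Python sketch shows why a softmax layer in front of this loss is harmful. Applying softmax to values that are already probabilities flattens the distribution, so the loss would see much less confident predictions than the network actually produces. The argmax is unchanged, which is why the separate inference softmax affects only the probabilities and not the predicted class per frame.

```python
import math


def softmax(row):
    """Numerically stable softmax over one frame of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.0]
once = softmax(logits)            # what the CTC layer computes internally
twice = softmax(softmax(logits))  # what it would see behind a softmax layer
print(once)
print(twice)
```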
The following generic parameters GenParamName and the corresponding values GenParamValue are supported:
'is_inference_output': Determines whether apply_dl_model will include the output of this layer in the dictionary DLResultBatch even without specifying this layer in Outputs ('true') or not ('false').
Default: 'false'
Certain parameters of layers created using the operator create_dl_layer_loss_ctc can be set and retrieved using further operators. The following tables give an overview of which parameters can be set using set_dl_model_layer_param and which ones can be retrieved using get_dl_model_layer_param or get_dl_layer_param. Note that the operators set_dl_model_layer_param and get_dl_model_layer_param require a model created by create_dl_model.
Layer Parameters | set | get |
---|---|---|
'input_layer' (DLLayerInput, DLLayerInputLengths, DLLayerTarget, and/or DLLayerTargetLengths) | | |
'name' (LayerName) | | |
'output_layer' (DLLayerLossCTC) | | |
'shape' | | |
'type' | | |

Generic Layer Parameters | set | get |
---|---|---|
'is_inference_output' | | |
'num_trainable_params' | | |
DLLayerInput (input_control) dl_layer → (handle)
Input layer with network predictions.
DLLayerInputLengths (input_control) dl_layer → (handle)
Input layer which specifies the input sequence length of each item in the batch.
DLLayerTarget (input_control) dl_layer → (handle)
Input layer which specifies the target sequences. If the input dimensions of the CNN are changed, the width of this layer is automatically resized to the width of the DLLayerInput layer.
DLLayerTargetLengths (input_control) dl_layer → (handle)
Input layer which specifies the target sequence length of each item in the batch.
LayerName (input_control) string → (string)
Name of the output layer.
GenParamName (input_control) attribute.name(-array) → (string)
Generic input parameter names.
Default value: []
List of values: 'is_inference_output'
GenParamValue (input_control) attribute.value(-array) → (string / integer / real)
Generic input parameter values.
Default value: []
Suggested values: 'true', 'false'
DLLayerLossCTC (output_control) dl_layer → (handle)
CTC loss layer.
* Create a simple Seq2Seq model which overfits to a single output sequence.
* Input sequence length
T := 6
* Number of classes including blank (blank is always class_id: 0)
C := 3
* Batch Size
N := 1
* Maximum length of target sequences
S := 3
* Model creation
create_dl_layer_input ('input', [T,1,1], [], [], Input)
create_dl_layer_dense (Input, 'dense', T*C, [], [], DLLayerDense)
create_dl_layer_reshape (DLLayerDense, 'dense_reshape', [T,1,C], [], [],\
                         ConvFinal)
* Training part
* Specify the shapes without batch-size
* (batch-size will be specified in the model).
create_dl_layer_input ('ctc_input_lengths', [1,1,1], [], [],\
                       DLLayerInputLengths)
create_dl_layer_input ('ctc_target', [S,1,1], [], [], DLLayerTarget)
create_dl_layer_input ('ctc_target_lengths', [1,1,1], [], [],\
                       DLLayerTargetLengths)
* Create the loss layer
create_dl_layer_loss_ctc (ConvFinal, DLLayerInputLengths, DLLayerTarget,\
                          DLLayerTargetLengths, 'ctc_loss', [], [],\
                          DLLayerLossCTC)
* Get all names so that users can set values
get_dl_layer_param (ConvFinal, 'name', CTCInputName)
get_dl_layer_param (DLLayerInputLengths, 'name', CTCInputLengthsName)
get_dl_layer_param (DLLayerTarget, 'name', CTCTargetName)
get_dl_layer_param (DLLayerTargetLengths, 'name', CTCTargetLengthsName)
* Inference part
create_dl_layer_softmax (ConvFinal, 'softmax', [], [], DLLayerSoftMax)
create_dl_layer_depth_max (DLLayerSoftMax, 'prediction', 'argmax', [], [],\
                           DLLayerDepthMaxArg, _)
* Setting a seed because the weights of the network are randomly initialized
set_system ('seed_rand', 35)
create_dl_model ([DLLayerLossCTC,DLLayerDepthMaxArg], DLModel)
set_dl_model_param (DLModel, 'batch_size', N)
set_dl_model_param (DLModel, 'runtime', 'gpu')
set_dl_model_param (DLModel, 'learning_rate', 1)
* Create input sample for training
InputSequence := [0,1,2,3,4,5]
TargetSequence := [1,2,1]
create_dict (InputSample)
set_dict_tuple (InputSample, 'input', InputSequence)
set_dict_tuple (InputSample, 'ctc_input_lengths', |InputSequence|)
set_dict_tuple (InputSample, 'ctc_target', TargetSequence)
set_dict_tuple (InputSample, 'ctc_target_lengths', |TargetSequence|)
Eps := 0.01
PredictedSequence := []
dev_inspect_ctrl ([InputSequence, TargetSequence, CTCLoss, PredictedValues,\
                  PredictedSequence])
MaxIterations := 15
for I := 0 to MaxIterations by 1
    apply_dl_model (DLModel, InputSample, ['prediction','softmax'], \
                    DLResultBatch)
    get_dict_object (Softmax, DLResultBatch, 'softmax')
    get_dict_object (Prediction, DLResultBatch, 'prediction')
    PredictedValues := []
    for t := 0 to T-1 by 1
        get_grayval (Prediction, 0, t, PredictionValue)
        PredictedValues := [PredictedValues, PredictionValue]
    endfor
    train_dl_model_batch (DLModel, InputSample, DLTrainResult)
    get_dict_tuple (DLTrainResult, 'ctc_loss', CTCLoss)
    if (CTCLoss < Eps)
        break
    endif
    stop()
endfor
* Rudimentary implementation of best path (greedy) decoding
PredictedSequence := []
LastV := -1
for I := 0 to |PredictedValues|-1 by 1
    V := PredictedValues[I]
    if (V == 0)
        LastV := -1
        continue
    endif
    if (|PredictedSequence| > 0 and V == LastV)
        continue
    endif
    PredictedSequence := [PredictedSequence, V]
    LastV := PredictedSequence[|PredictedSequence|-1]
endfor
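For readers who want to reproduce the final decoding step outside HDevelop, here is a minimal Python sketch with hypothetical function names. It mirrors the example: take the argmax class per frame (as create_dl_layer_depth_max with 'argmax' does), then merge consecutive repeats and drop blanks (class 0). As in the loop above, a blank between two equal labels keeps both. This greedy "best path" decoding is fast but is not guaranteed to find the most probable label sequence. Prefix beam search is a common, more accurate alternative.

```python
def argmax_per_frame(probs):
    """Index of the most probable class for every frame (T x C input)."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def best_path_decode(frame_labels, blank=0):
    """Collapse per-frame labels: merge consecutive repeats, drop blanks.

    A blank between two equal labels resets the repeat check, so both
    labels are kept, matching the HDevelop loop above.
    """
    decoded = []
    previous = None
    for label in frame_labels:
        if label != blank and label != previous:
            decoded.append(label)
        previous = label
    return decoded


# Usage: per-frame predictions for T = 8 frames
print(best_path_decode([1, 1, 0, 2, 2, 0, 0, 1]))
```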
Alex Graves et al.: "Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks." Proceedings of the 23rd International Conference on Machine Learning, 2006.
Deep Learning Training