create_dl_layer_loss_ctcT_create_dl_layer_loss_ctcCreateDlLayerLossCtcCreateDlLayerLossCtccreate_dl_layer_loss_ctc (Operator)

Name

create_dl_layer_loss_ctcT_create_dl_layer_loss_ctcCreateDlLayerLossCtcCreateDlLayerLossCtccreate_dl_layer_loss_ctc — Create a CTC loss layer.

Signature

create_dl_layer_loss_ctc( : : DLLayerInput, DLLayerInputLengths, DLLayerTarget, DLLayerTargetLengths, LayerName, GenParamName, GenParamValue : DLLayerLossCTC)

Description

The operator create_dl_layer_loss_ctccreate_dl_layer_loss_ctcCreateDlLayerLossCtcCreateDlLayerLossCtcCreateDlLayerLossCtccreate_dl_layer_loss_ctc creates a Connectionist Temporal Classification (CTC) loss layer whose handle is returned in DLLayerLossCTCDLLayerLossCTCDLLayerLossCTCDLLayerLossCTCDLLayerLossCTCdllayer_loss_ctc. See the reference cited below for information about the CTC loss.

With this loss layer it is possible to train sequence to sequence models (Seq2Seq). E.g., it can be used to train a model that is able to read text in an image. In order to do so, the sequences are compared, thus the determined network prediction DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input with sequence length DLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsdllayer_input_lengths to the given DLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetdllayer_target with sequence length DLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsdllayer_target_lengths.

The following variables are important to understand the input shapes:

T: Maximum input sequence length (i.e., width of DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input)
S: Maximum output sequence length (i.e., width of DLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetdllayer_target)
C: Number of classes including 0 as the blank class ID (i.e., depth of DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input)

This layer expects multiple layers as input:

DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input: Specifies the network prediction.

Shape: [T,1,C]
DLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsdllayer_input_lengths: Specifyies the input sequence length of each item in the batch.

Shape: [1,1,1]
DLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetdllayer_target: Specifies the target sequences.

Shape: [S,1,1]
DLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsdllayer_target_lengths: Input layer which specifies the target sequence length of each item in the batch.

Shape: [1,1,1]

The parameter LayerNameLayerNameLayerNameLayerNamelayerNamelayer_name sets an individual layer name. Note that if creating a model using create_dl_modelcreate_dl_modelCreateDlModelCreateDlModelCreateDlModelcreate_dl_model each layer of the created network must have a unique name.

The CTC loss is typically applied in a CNN as follows. The input sequence is expected to be encoded in some CNN layer with the output shape [width: T, height: 1, depth: C]. Typically the end of a large fully convolutional classifier is pooled in height down to 1 with an average pooling layer. It is important that the last layer is wide enough to hold enough information. In order to obtain the sequence prediction in the output depth a 1x1 convolutional layer is added after the pooling with the number of kernels set to C. In this use case the CTC loss obtains this convolutional layer as input layer DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input. The width of the input layer determines the maximum output sequence of the model.

The CTC loss can be applied to a batch of input items with differing input and target sequence lengths. T and S are the maximum lengths. In DLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsdllayer_input_lengths and DLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsdllayer_target_lengths the individual length of each item in a batch needs to be specified.

Restrictions

A model containing this layer cannot be trained on a CPU.
A model containing this layer cannot be trained with a 'batch_size_multiplier'"batch_size_multiplier""batch_size_multiplier""batch_size_multiplier""batch_size_multiplier""batch_size_multiplier" != 1.0.
The input layer DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input must not be a softmax layer. The softmax calculation is done internally in this layer. For inference, there should be an extra softmax layer connected to the DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input (see create_dl_layer_softmaxcreate_dl_layer_softmaxCreateDlLayerSoftmaxCreateDlLayerSoftmaxCreateDlLayerSoftmaxcreate_dl_layer_softmax).

The following generic parameters GenParamNameGenParamNameGenParamNameGenParamNamegenParamNamegen_param_name and the corresponding values GenParamValueGenParamValueGenParamValueGenParamValuegenParamValuegen_param_value are supported:

'is_inference_output'"is_inference_output""is_inference_output""is_inference_output""is_inference_output""is_inference_output":

Determines whether apply_dl_modelapply_dl_modelApplyDlModelApplyDlModelApplyDlModelapply_dl_model will include the output of this layer in the dictionary DLResultBatchDLResultBatchDLResultBatchDLResultBatchDLResultBatchdlresult_batch even without specifying this layer in OutputsOutputsOutputsOutputsoutputsoutputs ('true'"true""true""true""true""true") or not ('false'"false""false""false""false""false").

Default: 'false'"false""false""false""false""false"

Certain parameters of layers created using this operator create_dl_layer_loss_ctccreate_dl_layer_loss_ctcCreateDlLayerLossCtcCreateDlLayerLossCtcCreateDlLayerLossCtccreate_dl_layer_loss_ctc can be set and retrieved using further operators. The following tables give an overview, which parameters can be set using set_dl_model_layer_paramset_dl_model_layer_paramSetDlModelLayerParamSetDlModelLayerParamSetDlModelLayerParamset_dl_model_layer_param and which ones can be retrieved using get_dl_model_layer_paramget_dl_model_layer_paramGetDlModelLayerParamGetDlModelLayerParamGetDlModelLayerParamget_dl_model_layer_param or get_dl_layer_paramget_dl_layer_paramGetDlLayerParamGetDlLayerParamGetDlLayerParamget_dl_layer_param. Note, the operators set_dl_model_layer_paramset_dl_model_layer_paramSetDlModelLayerParamSetDlModelLayerParamSetDlModelLayerParamset_dl_model_layer_param and get_dl_model_layer_paramget_dl_model_layer_paramGetDlModelLayerParamGetDlModelLayerParamGetDlModelLayerParamget_dl_model_layer_param require a model created by create_dl_modelcreate_dl_modelCreateDlModelCreateDlModelCreateDlModelcreate_dl_model.

Layer Parameters	set	get
'input_layer'"input_layer""input_layer""input_layer""input_layer""input_layer" (`DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input`, `DLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsdllayer_input_lengths`, `DLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetdllayer_target`, and/or `DLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsdllayer_target_lengths`)
'name'"name""name""name""name""name" (`LayerNameLayerNameLayerNameLayerNamelayerNamelayer_name`)
'output_layer'"output_layer""output_layer""output_layer""output_layer""output_layer" (`DLLayerLossCTCDLLayerLossCTCDLLayerLossCTCDLLayerLossCTCDLLayerLossCTCdllayer_loss_ctc`)
'shape'"shape""shape""shape""shape""shape"
'type'"type""type""type""type""type"

Generic Layer Parameters	set	get
'is_inference_output'"is_inference_output""is_inference_output""is_inference_output""is_inference_output""is_inference_output"
'num_trainable_params'"num_trainable_params""num_trainable_params""num_trainable_params""num_trainable_params""num_trainable_params"

Execution Information

Multithreading type: reentrant (runs in parallel with non-exclusive operators).
Multithreading scope: global (may be called from any thread).
Processed without parallelization.

Parameters

DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input (input_control) dl_layer → (handle)

Input layer with network predictions.

DLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsDLLayerInputLengthsdllayer_input_lengths (input_control) dl_layer → (handle)

Input layer which specifies the input sequence length of each item in the batch.

DLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetDLLayerTargetdllayer_target (input_control) dl_layer → (handle)

Input layer which specifies the target sequences. If the input dimensions of the CNN are changed the width of this layer is automatically resized to the same width as the DLLayerInputDLLayerInputDLLayerInputDLLayerInputDLLayerInputdllayer_input layer.

DLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsDLLayerTargetLengthsdllayer_target_lengths (input_control) dl_layer → (handle)

Input layer which specifies the target sequence length of each item in the batch.

LayerNameLayerNameLayerNameLayerNamelayerNamelayer_name (input_control) string → (string)

Name of the output layer.

GenParamNameGenParamNameGenParamNameGenParamNamegenParamNamegen_param_name (input_control) attribute.name(-array) → (string)

Generic input parameter names.

Default value: []

List of values: 'is_inference_output'"is_inference_output""is_inference_output""is_inference_output""is_inference_output""is_inference_output"

GenParamValueGenParamValueGenParamValueGenParamValuegenParamValuegen_param_value (input_control) attribute.value(-array) → (string / integer / real)

Generic input parameter values.

Default value: []

Suggested values: 'true'"true""true""true""true""true", 'false'"false""false""false""false""false"

DLLayerLossCTCDLLayerLossCTCDLLayerLossCTCDLLayerLossCTCDLLayerLossCTCdllayer_loss_ctc (output_control) dl_layer → (handle)

CTC loss layer.

Example (HDevelop)

* Create a simple Seq2Seq model which overfits to a single output sequence.

* Input sequence length
T := 6
* Number of classes including blank (blank is always class_id: 0)
C := 3
* Batch Size
N := 1
* Maximum length of target sequences
S := 3

* Model creation
create_dl_layer_input ('input', [T,1,1], [], [], Input)
create_dl_layer_dense (Input, 'dense', T*C, [], [], DLLayerDense)
create_dl_layer_reshape (DLLayerDense, 'dense_reshape', [T,1,C], [], [],\
                         ConvFinal)

* Training part

* Specify the shapes without batch-size
* (batch-size will be specified in the model).
create_dl_layer_input ('ctc_input_lengths', [1,1,1], [], [],\
                       DLLayerInputLengths)
create_dl_layer_input ('ctc_target', [S,1,1], [], [], DLLayerTarget)
create_dl_layer_input ('ctc_target_lengths', [1,1,1], [], [],\
                       DLLayerTargetLengths)
* Create the loss layer
create_dl_layer_loss_ctc (ConvFinal, DLLayerInputLengths, DLLayerTarget,\
                          DLLayerTargetLengths, 'ctc_loss', [], [],\
                          DLLayerLossCTC)

* Get all names so that users can set values
get_dl_layer_param (ConvFinal, 'name', CTCInputName)
get_dl_layer_param (DLLayerInputLengths, 'name', CTCInputLengthsName)
get_dl_layer_param (DLLayerTarget, 'name', CTCTargetName)
get_dl_layer_param (DLLayerTargetLengths, 'name', CTCTargetLengthsName)

* Inference part
create_dl_layer_softmax (ConvFinal, 'softmax', [], [], DLLayerSoftMax)
create_dl_layer_depth_max (DLLayerSoftMax, 'prediction', 'argmax', [], [],\
                           DLLayerDepthMaxArg, _)

* Setting a seed because the weights of the network are randomly initialized
set_system ('seed_rand', 35)

create_dl_model ([DLLayerLossCTC,DLLayerDepthMaxArg], DLModel)

set_dl_model_param (DLModel, 'batch_size', N)
set_dl_model_param (DLModel, 'runtime', 'gpu')
set_dl_model_param (DLModel, 'learning_rate', 1)

* Create input sample for training
InputSequence := [0,1,2,3,4,5]
TargetSequence := [1,2,1]
create_dict (InputSample)
set_dict_tuple (InputSample, 'input', InputSequence)
set_dict_tuple (InputSample, 'ctc_input_lengths', |InputSequence|)
set_dict_tuple (InputSample, 'ctc_target', TargetSequence)
set_dict_tuple (InputSample, 'ctc_target_lengths', |TargetSequence|)
Eps := 0.01

PredictedSequence := []
dev_inspect_ctrl ([InputSequence, TargetSequence, CTCLoss, PredictedValues,\
                  PredictedSequence])
MaxIterations:= 15
for I := 0 to MaxIterations by 1
  apply_dl_model (DLModel, InputSample, ['prediction','softmax'], \
                  DLResultBatch)
  get_dict_object (Softmax, DLResultBatch, 'softmax')
  get_dict_object (Prediction, DLResultBatch, 'prediction')
  PredictedValues := []
  for t := 0 to T-1 by 1
      get_grayval (Prediction, 0, t, PredictionValue)
      PredictedValues := [PredictedValues, PredictionValue]
  endfor
  train_dl_model_batch (DLModel, InputSample, DLTrainResult)

  get_dict_tuple (DLTrainResult, 'ctc_loss', CTCLoss)
  if (CTCLoss < Eps)
      break
  endif
  stop()
endfor

* Rudimentary implementation of fastest path prediction
PredictedSequence := []
LastV := -1
for I := 0 to |PredictedValues|-1 by 1
  V := PredictedValues[I]
  if (V == 0)
      LastV := -1
      continue
  endif
  if (|PredictedSequence| > 0 and V == LastV)
      continue
  endif
  PredictedSequence := [PredictedSequence, V]
  LastV :=  PredictedSequence[|PredictedSequence|-1]
endfor

References

Graves Alex et al., "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks." Proceedings of the 23rd international conference on Machine learning. 2006.

Module

Deep Learning Training

Operators