train_dl_model_batch, T_train_dl_model_batch, TrainDlModelBatch (Operator)


train_dl_model_batch, T_train_dl_model_batch, TrainDlModelBatch: Train a deep learning model.


train_dl_model_batch( : : DLModelHandle, DLSampleBatch : DLTrainResult)

Herror T_train_dl_model_batch(const Htuple DLModelHandle, const Htuple DLSampleBatch, Htuple* DLTrainResult)

void TrainDlModelBatch(const HTuple& DLModelHandle, const HTuple& DLSampleBatch, HTuple* DLTrainResult)

HDict HDlModel::TrainDlModelBatch(const HDict& DLSampleBatch) const

static void HOperatorSet.TrainDlModelBatch(HTuple DLModelHandle, HTuple DLSampleBatch, out HTuple DLTrainResult)

HDict HDlModel.TrainDlModelBatch(HDict DLSampleBatch)


The operator train_dl_model_batch performs a training step of the deep learning model contained in DLModelHandle. The current loss values are returned in the dictionary DLTrainResult.

Here, a training step means a single update of the weights, based on the batch of images given in DLSampleBatch. The optimization algorithm used is explained in the subsection “Further Information on the Algorithms” below. For more information on how to train a network, please see the subchapter “The Network and the Training Process” in Deep Learning.

To successfully train the model, its applicable hyperparameters need to be set and the training data must be handed over according to the model requirements. For information on the hyperparameters, see the chapter of the corresponding model and the general chapter Deep Learning.

The training data consists of images and corresponding information. This operator expects one batch of training data, handed over in the tuple of dictionaries DLSampleBatch. A DLSample dictionary is created out of DLDataset for every image sample, e.g., by the procedure gen_dl_samples. See the chapter Deep Learning / Model for further information on the dictionaries used and their keys.

The number of images in a DLSampleBatch tuple needs to be a multiple of the 'batch_size', where the parameter 'batch_size' is limited by the amount of available GPU memory. In order to process more images in one training step, the model parameter 'batch_size_multiplier' can be set to a value greater than 1. The number of DLSample dictionaries passed to the training operator needs to be equal to 'batch_size' times 'batch_size_multiplier'. Note that a training step calculated for a batch with a 'batch_size_multiplier' greater than 1 is an approximation of a training step calculated for the same batch with a 'batch_size_multiplier' equal to 1 and an accordingly greater 'batch_size'. As an example, the loss calculated with a 'batch_size' of 4 and a 'batch_size_multiplier' of 2 is usually not equal to the loss calculated with a 'batch_size' of 8 and a 'batch_size_multiplier' of 1, although the same number of DLSample dictionaries is used for training in both cases. However, the approximation generally delivers comparably good results, so it can be used if you wish to train with a larger number of images than your GPU allows. In some rare cases, the approximation with a 'batch_size' of 1 and an accordingly large 'batch_size_multiplier' does not show the expected performance. Setting the 'batch_size' to a value greater than 1 can help to solve this issue.
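The sample count rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, plain integers stand in for DLSample dictionaries, and the chunking merely visualizes the relationship between the two parameters. How the operator processes the samples internally is not specified here.

```python
def expected_sample_count(batch_size: int, batch_size_multiplier: int = 1) -> int:
    """Number of DLSample dictionaries one training step expects."""
    if batch_size < 1 or batch_size_multiplier < 1:
        raise ValueError("both values must be positive integers")
    return batch_size * batch_size_multiplier


def split_into_chunks(samples, batch_size):
    """Split the samples of one training step into chunks of batch_size.

    Hypothetical helper: it only illustrates that the sample count
    must be a multiple of batch_size.
    """
    if len(samples) % batch_size != 0:
        raise ValueError("sample count must be a multiple of batch_size")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


# Stand-ins for 8 DLSample dictionaries.
samples = list(range(8))

# Both settings from the text consume the same number of samples per step.
assert len(samples) == expected_sample_count(batch_size=4, batch_size_multiplier=2)
assert len(samples) == expected_sample_count(batch_size=8, batch_size_multiplier=1)

print([len(chunk) for chunk in split_into_chunks(samples, batch_size=4)])
```

A check of this kind before calling the training operator can catch a mismatched tuple length early.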

In the output dictionary DLTrainResult you get the current value of the total loss as the value for the key total_loss, as well as the values for all other losses included in your model.

Further Information on the Algorithms

During training, a nonlinear optimization algorithm is applied with the goal of minimizing the value of the total loss function. The latter is determined based on the prediction of the neural network for the current batch of images. The algorithm used for optimization is stochastic gradient descent (SGD). It updates the layers' weights of the previous iteration $t-1$, $w^{t-1}$, to the new values $w^{t}$ at iteration $t$ as follows:

$$v^{t} = \mu\, v^{t-1} - \lambda\, \nabla_{w} L(f_{w^{t-1}}(x), y), \qquad w^{t} = w^{t-1} + v^{t}$$

Here, $\lambda$ is the learning rate, $\mu$ the momentum, $L$ the total loss, and $\nabla_{w} L$ the gradient of the total loss with respect to the weights. The variable $v$ is used to include the influence of the momentum $\mu$.
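As a rough illustration of such an update, the following Python sketch applies the common SGD-with-momentum rule: the new velocity is the momentum times the old velocity minus the learning rate times the gradient, and the new weights are the old weights plus the new velocity. The function name, the list-based weights, and the toy quadratic loss are assumptions made for this example, not part of the operator.

```python
def sgd_momentum_step(weights, velocity, grads, learning_rate, momentum):
    """One SGD update with momentum on flat lists of floats.

    Assumed rule: v_new = momentum * v - learning_rate * grad
                  w_new = w + v_new
    """
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy total loss L(w) = 0.5 * sum(w_k^2), whose gradient is simply w.
w = [1.0, -2.0]
v = [0.0, 0.0]
for step in range(3):
    w, v = sgd_momentum_step(w, v, grads=w, learning_rate=0.1, momentum=0.9)
    print(step, w)
```

With a momentum of 0, the rule reduces to plain gradient descent. A larger momentum carries more of the previous update into the current one.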

The different models may have several losses implemented, which are summed up. To this sum the regularization term is added, which generally penalizes large weights, and together they form the total loss.

The different types of losses are:

Huber Loss (model of 'type'='detection'):

The 'Huber Loss' is also known as 'Smooth L1 Loss'. The total 'Huber Loss' is the sum of the contributions from all bounding box variables of all found instances in the batch. For a single bounding box variable, this contribution is defined as follows:

$$L_{\text{huber}}(x) = \begin{cases} \dfrac{0.5\, x^{2}}{\beta} & \text{if } |x| \le \beta \\ |x| - 0.5\, \beta & \text{otherwise} \end{cases}$$

Thereby, $x$ denotes a bounding box variable and $\beta$ a parameter fixed to a value of 0.11.
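A minimal Python sketch of the per-variable contribution, assuming the common Smooth L1 form: quadratic for magnitudes up to the fixed parameter and linear beyond it. The function name and the default argument name `beta` are chosen for this example only.

```python
def smooth_l1(x: float, beta: float = 0.11) -> float:
    """Smooth L1 (Huber) contribution of one bounding box variable.

    Assumed form: 0.5 * x^2 / beta  if |x| <= beta
                  |x| - 0.5 * beta  otherwise
    """
    abs_x = abs(x)
    if abs_x <= beta:
        return 0.5 * x * x / beta
    return abs_x - 0.5 * beta


# The total loss would sum this contribution over all bounding box variables.
residuals = [0.05, -0.2, 1.0]
print(sum(smooth_l1(r) for r in residuals))
```

Both branches give the same value at a magnitude of `beta`, so the loss is continuous there. Large deviations grow only linearly, which limits the influence of outliers.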

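Focal Loss (model of 'type'='detection'):

The total 'Focal Loss' is the sum of the contributions from all found instances in the batch. For a single sample, this contribution is defined as follows:

$$L_{\text{focal}}(p, y) = -\sum_{i=1}^{n} \alpha_{t,i}\, (1 - p_{t,i})^{\gamma}\, \log(p_{t,i})$$

where $\gamma$ is a parameter fixed to a value of 2. $\alpha_{i}$ stands for the 'class_weight' of the $i$-th class and $p_{t,i}$, $\alpha_{t,i}$ are defined as

$$p_{t,i} = \begin{cases} p_{i} & \text{if } y_{i} = 1 \\ 1 - p_{i} & \text{otherwise} \end{cases} \qquad \alpha_{t,i} = \begin{cases} \alpha_{i} & \text{if } y_{i} = 1 \\ 1 - \alpha_{i} & \text{otherwise} \end{cases}$$

Here, $p$ is a tuple of the model's estimated probabilities for each of the $n$-many classes, and $y$ is a one-hot encoded target vector that encodes the class of the annotation.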
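The following Python sketch shows one common formulation of the focal loss for a single sample. It rests on assumptions that this page does not confirm: each class contributes a term weighted by its class weight when it is the target class and by one minus that weight otherwise, and the probability is flipped the same way. The function name and argument names are hypothetical.

```python
import math


def focal_loss(probs, target, class_weights, gamma=2.0):
    """Focal loss contribution of one sample (assumed formulation).

    probs:         estimated probability per class
    target:        one-hot encoded target vector (0 or 1 per class)
    class_weights: weight per class
    """
    total = 0.0
    for p, y, alpha in zip(probs, target, class_weights):
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total


# A confident, correct prediction contributes far less than an uncertain one.
print(focal_loss([0.95, 0.05], [1, 0], [0.25, 0.25]))
print(focal_loss([0.50, 0.50], [1, 0], [0.25, 0.25]))
```

The factor raised to the power `gamma` shrinks the contribution of well-classified examples, so training concentrates on the hard ones.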

Multinomial Logistic Loss (model of 'type'='classification', 'segmentation'):

The 'Multinomial Logistic Loss' is also known as 'Cross Entropy Loss'. It is defined as follows:

$$L(f_{w}(x), y) = -\frac{1}{N} \sum_{n=1}^{N} y_{n}^{\top} \log\left(f_{w}(x_{n})\right)$$

Here, $f_{w}(x)$ is the predicted result, which depends on the network weights $w$ and the input batch $x$. $y_{n}$ is a one-hot encoded target vector that encodes the label of the $n$-th image $x_{n}$ of the batch containing $N$-many images, and $\log(f_{w}(x_{n}))$ shall be understood to be a vector such that $\log$ is applied on each component of $f_{w}(x_{n})$.
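A small Python sketch of this loss, assuming the usual cross entropy form: for each image, the negative dot product of the one-hot target with the component-wise logarithm of the predicted probabilities, averaged over the batch. The function name and the nested-list inputs are choices made for this example.

```python
import math


def multinomial_logistic_loss(predictions, targets):
    """Mean cross entropy over a batch.

    predictions: one list of class probabilities per image
    targets:     one one-hot encoded target vector per image
    """
    if not predictions or len(predictions) != len(targets):
        raise ValueError("need equally many predictions and targets")
    total = 0.0
    for probs, target in zip(predictions, targets):
        # The logarithm is applied to each component of the prediction;
        # the clamp avoids log(0).
        total += -sum(y * math.log(max(p, 1e-12))
                      for p, y in zip(probs, target))
    return total / len(predictions)


batch_predictions = [[0.5, 0.5], [0.25, 0.75]]
batch_targets = [[1, 0], [0, 1]]
print(multinomial_logistic_loss(batch_predictions, batch_targets))
```

Because the target is one-hot, only the predicted probability of the correct class contributes for each image.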

The regularization term $E_{w}$ is a weighted $\ell_{2}$-norm involving all weights except for biases. Its influence can be controlled through $\alpha$. The latter is the hyperparameter 'weight_prior', which can be set with set_dl_model_param.

$$E_{w} = \frac{\alpha}{2} \sum_{k} |w_{k}|^{2}$$

Here, the index $k$ runs over all weights of the network, except for the biases, which are not regularized. The regularization term generally penalizes large weights, thus pushing the weights towards zero, which effectively reduces the complexity of the model.
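A brief Python sketch of such a term, assuming the common form of half the prior times the sum of squared weights. The function names are hypothetical, and the caller is expected to pass only non-bias weights.

```python
def l2_regularization(weights, weight_prior):
    """Regularization term: 0.5 * weight_prior * sum of squared weights.

    Biases are expected to be excluded by the caller.
    """
    return 0.5 * weight_prior * sum(w * w for w in weights)


def l2_regularization_gradient(weights, weight_prior):
    """Gradient of the term above with respect to each weight."""
    return [weight_prior * w for w in weights]


weights = [1.0, -2.0]
print(l2_regularization(weights, weight_prior=0.1))
print(l2_regularization_gradient(weights, weight_prior=0.1))
```

The gradient is proportional to each weight, so a descent step shrinks every regularized weight towards zero by an amount that grows with the prior.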


To run this operator, cuDNN and cuBLAS are required when 'runtime' is set to 'gpu', see get_dl_model_param. For further details, please refer to the “Installation Guide”, paragraph “Requirements for Deep Learning”.

Execution Information


DLModelHandle (input_control)  dl_model HDlModel, HTuple, Htuple (handle) (IntPtr) (HHandle)

Deep learning model handle.

DLSampleBatch (input_control)  dict HDict, HTuple, Htuple (handle) (IntPtr) (HHandle)

Tuple of dictionaries with input images and corresponding information.

DLTrainResult (output_control)  dict HDict, HTuple, Htuple (handle) (IntPtr) (HHandle)

Dictionary with the train result data.


If the parameters are valid, the operator train_dl_model_batch returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.

Possible Predecessors

read_dl_model, set_dl_model_param, get_dl_model_param

Possible Successors


See also



Deep Learning Training