
`set_regularization_params_class_mlp` — Set the regularization parameters of a multilayer perceptron.

**set_regularization_params_class_mlp**( : : *MLPHandle*, *GenParamName*, *GenParamValue* : )

`set_regularization_params_class_mlp` sets the regularization
parameters of the multilayer perceptron (MLP) passed in
*MLPHandle*. The regularization parameter to be set is specified
with *GenParamName*, and its value is specified with
*GenParamValue*.

*GenParamName* can assume the following values:

*'num_outer_iterations'*:
This parameter determines whether the regularization parameters
should be determined automatically (*GenParamValue* >= 1) or
manually (*GenParamValue* = 0, default), as described below in the
sections “Technical Background” and “Automatic Determination of the
Regularization Parameters”. As described in detail in the section
“Automatic Determination of the Regularization Parameters”,
*'num_outer_iterations'* should not be set too large (in the range
of 1 to 5) to enable manual checking of the convergence of the
automatic determination of the regularization parameters.

*'num_inner_iterations'*:
This parameter potentially enables somewhat faster convergence of
the automatic determination of the regularization parameters, as
described below in the section “Automatic Determination of the
Regularization Parameters”. It should typically be left at its
default value of 1.

*'weight_prior'*:
On the one hand, this selects the regularization model to be used,
as described below in the section “Technical Background”. On the
other hand, if manual determination of the regularization
parameters has been selected (i.e., *'num_outer_iterations'* = 0),
the regularization parameters are set with *GenParamValue*, whereas
if automatic determination of the regularization parameters has
been selected (i.e., *'num_outer_iterations'* >= 1),
*GenParamValue* sets the initial values of the regularization
parameters, as described below in the section “Automatic
Determination of the Regularization Parameters”. Manual
determination of the regularization parameters (see the section
“Regularization Parameters” below) is only realistic if a single
regularization parameter is used. In all other cases, the
regularization parameters should be determined automatically.

*'noise_prior'*:
This parameter allows a noise prior to be specified for MLPs that
have been configured for regression, as described below in the
section “Application Areas”. If manual determination of the
regularization parameters has been selected, the noise prior is set
with *GenParamValue*, whereas the initial value of the noise prior
is set if automatic determination of the regularization parameters
has been selected. Typically, this parameter is only useful if the
regularization parameters are determined automatically.
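
For example, a minimal call sequence for manual regularization with
a single weight prior might look as follows (the parameter values
are illustrative assumptions, not recommendations):

```
* Manual regularization: leave 'num_outer_iterations' at its
* default of 0 and set a single weight prior for all MLP weights.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', 1.0)
* Train with the fixed regularization parameter.
train_class_mlp (MLPHandle, 200, 1, 0.01, Error, ErrorLog)
```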

Please note that the automatic determination of the regularization
parameters requires a very large amount of memory and runtime, as
described in detail in the section “Complexity” below. Therefore,
`NumHidden` should not be selected too large when the MLP is
created with `create_class_mlp`. For example, normal OCR
applications seldom require `NumHidden` to be larger than
30-60.

**Application Areas**

As described at `create_class_mlp`, it may be desirable to
regularize the MLP to enforce a smoother transition of the
confidences between the different classes and to prevent overfitting
of the MLP to the training data. To achieve this, a penalty for
large MLP weights (which are the main reason for very sharp
transitions between classes) can be added to the training of the MLP
in `train_class_mlp` by setting *GenParamName* to
*'weight_prior'*.

If the MLP has been configured for regression (i.e., if
`OutputFunction` was set to *'linear'* in
`create_class_mlp`), an inverse variance of the expected noise
in the data can be specified by setting *GenParamName* to
*'noise_prior'*.

As described in more detail below, the regularization parameters of
the MLP may be determined automatically (at the expense of
significantly increased training times) by setting *GenParamName*
to *'num_outer_iterations'* (and, optionally,
*'num_inner_iterations'*).
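
As an illustration of the regression case, the following sketch
assumes an MLP created with `OutputFunction` set to *'linear'*; the
concrete prior values are illustrative assumptions:

```
* Sketch: regularization for an MLP configured for regression.
create_class_mlp (NumIn, NumHidden, NumOut, 'linear', \
                  'normalization', NumIn, 42, MLPHandle)
* Inverse variance of the expected noise in the training data.
set_regularization_params_class_mlp (MLPHandle, 'noise_prior', 10.0)
* Penalty on large MLP weights.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', 0.1)
```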

**Technical Background**

There are three different kinds of penalty terms that can be set
with *'weight_prior'*. Note that in the following the parameters
$w^{(1)}_{ij}$ and $w^{(2)}_{jk}$ refer to the weights of the
different layers of the MLP, as described in `create_class_mlp`.

If a single value $\alpha$ is specified, all MLP weights are
penalized equally by adding the following term to the optimization
in `train_class_mlp`:

$$\frac{\alpha}{2} \sum_{k=1}^{W} w_k^2 ,$$

where $w_k$ runs over all $W$ weights of the MLP.

Alternatively, four values $(\alpha_1, \alpha_2, \alpha_3, \alpha_4)$
can be specified. These four parameters enable the individual
regularization of the four groups of weights: the first-layer
weights $w^{(1)}_{ij}$ (excluding the bias weights), the first-layer
bias weights, the second-layer weights $w^{(2)}_{jk}$ (excluding the
bias weights), and the second-layer bias weights:

$$\sum_{g=1}^{4} \frac{\alpha_g}{2} \sum_{w \in G_g} w^2 ,$$

where $G_g$ denotes the $g$-th group of weights.

Finally, $N_I + 3$ values can be specified (with $N_I$ the number of
input variables of the MLP, see the section “Complexity” below).
These parameters enable the individual regularization of the weights
attached to each input variable and the regularization of the
remaining three groups of weights (first-layer bias weights,
second-layer weights, and second-layer bias weights).

The parameters $\alpha_g$ can be regarded as the inverse variance of
a Gaussian prior distribution on the MLP weights, i.e., they express
an expectation about the size of the MLP weights. The larger the
$\alpha_g$ are chosen, the smaller the MLP weights will be.
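
The three models correspond to passing a tuple of length 1, 4, or
$N_I + 3$ as *GenParamValue*. A sketch (the prior values are
illustrative assumptions):

```
* One value: all weights share the same prior.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', 0.05)
* Four values: one prior per group of weights.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                     [0.05,0.05,0.05,0.05])
* NumIn + 3 values: one prior per input variable plus the three
* remaining groups (assumes no preprocessing, i.e., N_I = NumIn).
tuple_gen_const (NumIn + 3, 0.05, Priors)
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', Priors)
```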

**Regularization Parameters**

The larger the regularization parameter(s) *'weight_prior'*
are chosen, the smoother the transition of the confidences between
the different classes will be. The required values for the
regularization parameter(s) depend on the MLP, especially the number
of hidden units, the training data, and the scale of the training
data (if no normalization is used). Typically, a higher value for
the regularization parameter(s) is necessary if the MLP has more
hidden units and if the training data consists of more points. For
typical applications, the regularization parameters are determined
by verifying the MLP performance on a test data set that is
independent of the training data set. If an independent test data
set is unavailable, cross validation can be used. Cross validation
works by splitting the data set into separate parts (for example,
80% of the data set for training and 20% for testing), training
the MLP with the training data set (the 80% of the data in the
above example), and testing the MLP performance on the test set (the
20% of the data in the above example). The procedure can be
repeated for the other possible splits of the data (in the
80%-20% example, there are five possible splits). This
procedure can, for example, start with relatively large values of
the weight regularization parameters (which will typically result in
misclassifications on the test data set). The weight regularization
parameters can then be decreased until an acceptable performance on
the test data sets is reached.
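
A minimal HDevelop sketch of this search over a single train/test
split follows; the candidate values, the splitting, and the error
evaluation are assumptions left as placeholders:

```
* Decrease 'weight_prior' until the error on the held-out test
* set is acceptable.
Candidates := [100.0,10.0,1.0,0.1,0.01]
for K := 0 to |Candidates| - 1 by 1
    create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                      'normalization', NumIn, 42, MLPHandle)
    * Add the training part of the split (e.g., 80% of the data).
    * [...]
    set_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                         Candidates[K])
    train_class_mlp (MLPHandle, 200, 1, 0.01, Error, ErrorLog)
    * Classify the test part of the split (e.g., 20% of the data)
    * with classify_class_mlp and accumulate the error rate.
    * [...]
    clear_class_mlp (MLPHandle)
endfor
```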

**Automatic Determination of the Regularization Parameters**

The regularization parameters, i.e., the weight priors and the noise
prior, can also be determined automatically by
`train_class_mlp` using the so-called evidence procedure (for
details about the evidence procedure, please refer to the articles
in the section “References” below). This training mode can be
selected by setting *GenParamName* to *'num_outer_iterations'* and
*GenParamValue* to a value >= 1.

The evidence procedure is an iterative algorithm that performs the
following two steps for a number of outer iterations: first, the
network is trained using the current values of the regularization
parameters; next, the regularization parameters are re-estimated
using the weights of the optimized MLP. In the first iteration, the
weight priors and the noise prior specified with
*'weight_prior'* and *'noise_prior'* are used. Thus,
for the automatic determination of the regularization parameters,
the values specified by the user serve as the starting parameters
for the evidence procedure. The starting parameters for the weight
priors should not be set too large because this might
over-regularize the training and may result in badly determined
regularization parameters. The initial values for the weight priors
should typically be in the range 0.01-0.1.

The number of outer iterations can be set by setting
*GenParamName* to *'num_outer_iterations'*.

The number of outer iterations should be set high enough to ensure
the convergence of the regularization parameters. In contrast to
the training of the MLP's weights, a numerical convergence criterion
is typically very difficult to specify and some human judgement is
typically required to decide whether the regularization parameters
have converged sufficiently. Therefore, it might not be possible to
set the number of outer iterations a priori to ensure convergence of
the regularization parameters. In these cases, the outer loop over
the steps of the evidence procedure can be implemented manually by
setting *'num_outer_iterations'* to 1 and calling
`train_class_mlp` repeatedly. This has the advantage that the
weight priors and noise prior can be queried after each iteration
and can be checked manually for convergence. In this approach, the
performance of the MLP can even be checked after each iteration on
an independent test set to check the generalization performance of
the classifier.

If the number of outer iterations has been determined
(approximately) for a class of applications, it may be possible to
reduce the run time of the training (if MLPs should be trained in
the future with similar data sets) by setting *GenParamName*
to *'num_inner_iterations'* (and *GenParamValue* to a value greater
than 1).

The automatically determined weight priors and noise prior can be
queried after the training using
`get_regularization_params_class_mlp` by setting
`GenParamName` to *'weight_prior'* or
*'noise_prior'*, respectively.

In addition to the weight prior and noise prior, the evidence
procedure determines an estimate of the number of parameters of the
MLP that can be determined well using the training data. This
result can be queried using
`get_regularization_params_class_mlp` by setting
`GenParamName` to *'num_well_determined_params'*.
Alternatively, the fraction of well-determined parameters can be
queried by setting `GenParamName` to
*'fraction_well_determined_params'*. If the number of
well-determined parameters is significantly smaller than $W$
(where $W$ is the number of weights in
the MLP, as described in the section “Complexity” below) or the
fraction of well-determined parameters is significantly smaller than
1, consider reducing the number of hidden units or, if the number of
hidden units cannot be decreased without increasing the error rate
of the MLP significantly, consider performing a preprocessing that
reduces the number of input variables to the net, e.g., canonical
variates or principal components.

Please note that the number of well-determined parameters can only
be determined after the weight priors and noise prior have been
determined. This is the reason why the evidence procedure ends with
the determination of the regularization parameters and not with the
training of the MLP weights. Hence, after the evidence procedure
the MLP will not have been trained with the latest regularization
parameters. This should make no difference if they have converged.
If you want the training to end with an optimization of the weights
using the latest values of the regularization parameters, you can
set *'num_outer_iterations'* to 0 and can call
`train_class_mlp` again. If you do so, please note, however,
that the number of well-determined parameters may change and,
therefore, the value returned by
`get_regularization_params_class_mlp` is technically
inconsistent.
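
A sketch of this closing step (assuming the evidence procedure has
already been run on *MLPHandle*):

```
* Switch back to manual mode so that the final call optimizes the
* weights with the latest regularization parameters.
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 0)
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
* Caveat (see above): 'num_well_determined_params' queried after
* this call is technically inconsistent.
```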

**Saved Parameters**

Note that the parameters *'num_outer_iterations'* and
*'num_inner_iterations'* only affect the training of the MLP.
Therefore, they are not saved when the MLP is stored using
`write_class_mlp` or `serialize_class_mlp`. Thus, they
must be set anew if the MLP is loaded again using
`read_class_mlp` or `deserialize_class_mlp` and if
training using the automatic determination of the regularization
parameters should be continued. All other parameters described
above (*'weight_prior'*, *'noise_prior'*,
*'num_well_determined_params'*, and
*'fraction_well_determined_params'*) are saved.
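
For example, to continue training with automatic determination of
the regularization parameters after reloading a classifier, the
training-only parameters must be set anew (the file name is
illustrative):

```
* 'num_outer_iterations' and 'num_inner_iterations' are not
* persisted; restore them after reading the classifier.
read_class_mlp ('classifier.mlp', MLPHandle)
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 1)
* 'weight_prior' and 'noise_prior' are saved with the classifier
* and need not be set again.
```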

- Multithreading type: reentrant (runs in parallel with non-exclusive operators).
- Multithreading scope: global (may be called from any thread).
- Processed without parallelization.

This operator modifies the state of the following input parameter:

- *MLPHandle*: The value of this parameter may not be shared across
  multiple threads without external synchronization.

**Parameters**

*MLPHandle*: MLP handle.

*GenParamName*: Name of the regularization parameter to set.

Default value: *'weight_prior'*

List of values: *'noise_prior'*, *'num_inner_iterations'*, *'num_outer_iterations'*, *'weight_prior'*

*GenParamValue*: Value of the regularization parameter.

Default value: 1.0

Suggested values: 0.01, 0.1, 1.0, 10.0, 100.0, 0, 1, 2, 3, 5, 10, 15, 20

**Example (HDevelop)**

```
* This example shows how to determine the regularization parameters
* automatically without examining the convergence of the
* regularization parameters.
* Create the MLP.
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'normalization', NumIn, 42, MLPHandle)
* Generate and add the training data.
for J := 0 to NumData-1 by 1
    * Generate training features and classes.
    * Data = [...]
    * Class = [...]
    add_sample_class_mlp (MLPHandle, Data, Class)
endfor
* Set up the automatic determination of the regularization
* parameters.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                     [0.01,0.01,0.01,0.01])
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 10)
* Train the MLP.
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
* Read out the estimate of the number of well-determined
* parameters.
get_regularization_params_class_mlp (MLPHandle, \
                                     'fraction_well_determined_params', \
                                     FractionParams)
* If FractionParams differs substantially from 1, consider reducing
* NumHidden appropriately and consider performing a preprocessing
* that reduces the number of input variables to the net, e.g.,
* canonical variates or principal components.
write_class_mlp (MLPHandle, 'classifier.mlp')
clear_class_mlp (MLPHandle)
```

```
* This example shows how to determine the regularization parameters
* automatically while examining the convergence of the
* regularization parameters.
* Create the MLP.
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'normalization', NumIn, 42, MLPHandle)
* Generate and add the training data.
for J := 0 to NumData-1 by 1
    * Generate training features and classes.
    * Data = [...]
    * Class = [...]
    add_sample_class_mlp (MLPHandle, Data, Class)
endfor
* Set up the automatic determination of the regularization
* parameters.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                     [0.01,0.01,0.01,0.01])
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 1)
for OuterIt := 1 to 10 by 1
    * Train the MLP.
    train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
    * Read out the regularization parameters.
    get_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                         WeightPrior)
    * Inspect the regularization parameters manually for
    * convergence and exit the loop manually if they have
    * converged.
    * [...]
endfor
* Read out the estimate of the number of well-determined
* parameters.
get_regularization_params_class_mlp (MLPHandle, \
                                     'fraction_well_determined_params', \
                                     FractionParams)
* If FractionParams differs substantially from 1, consider reducing
* NumHidden appropriately and consider performing a preprocessing
* that reduces the number of input variables to the net, e.g.,
* canonical variates or principal components.
write_class_mlp (MLPHandle, 'classifier.mlp')
clear_class_mlp (MLPHandle)
```

**Complexity**

Let $N_I$ denote the number of input units of the MLP
(i.e., `NumIn` or `NumComp`, depending on the value of
`Preprocessing`, as described at `create_class_mlp`),
$N_H$ the number of hidden units, and $N_O$
the number of output units. Then, the number of weights of the MLP
is $W = (N_I + 1) N_H + (N_H + 1) N_O$. Let $S$ denote the number
of training samples. Let $I$ denote the number of iterations set
with `MaxIterations` in `train_class_mlp`. Let
$I_o$ and $I_i$ denote the number of outer and
inner iterations, respectively.

The run time of the training without regularization or with
regularization with fixed regularization parameters is of complexity
$O(I \cdot S \cdot W)$. In contrast, the run time of the training
with automatic determination of the regularization parameters is of
complexity $O(I_o \cdot (I_i \cdot I \cdot S \cdot W + W^3))$, since
each outer iteration additionally requires the computation and
eigendecomposition of the Hessian of the error function (a
$W \times W$ matrix).

The training without regularization or with regularization with
fixed regularization parameters requires memory on the order of
$S \cdot N_I + W$ values (the training samples and the weights).
The training with automatic determination of the regularization
parameters additionally requires memory on the order of $W^2$
values for the Hessian of the error function. Under special
circumstances, another block of memory of the same order is
required.
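
As an illustrative calculation (the numbers are hypothetical,
chosen in the OCR range mentioned above): for $N_I = 81$,
$N_H = 40$, and $N_O = 26$, the MLP has
$W = 82 \cdot 40 + 41 \cdot 26 = 4346$ weights, so a $W \times W$
Hessian alone occupies on the order of
$8 \cdot W^2 \approx 1.5 \cdot 10^8$ bytes, i.e., roughly 150 MB.
This illustrates why `NumHidden` should not be chosen too large
when the regularization parameters are determined automatically.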

**Result**

If the parameters are valid, the operator
`set_regularization_params_class_mlp` returns the value 2 (H_MSG_TRUE).
If necessary, an exception is raised.

**See also**

`get_regularization_params_class_mlp`,
`train_class_mlp`

**References**

David J. C. MacKay: “Bayesian Interpolation”; Neural Computation
4(3):415-447; 1992.

David J. C. MacKay: “A Practical Bayesian Framework for
Backpropagation Networks”; Neural Computation 4(3):448-472;
1992.

David J. C. MacKay: “The Evidence Framework Applied to
Classification Networks”; Neural Computation 4(5):720-736;
1992.

David J. C. MacKay: “Comparison of Approximate Methods for Handling
Hyperparameters”; Neural Computation 11(5):1035-1068; 1999.

**Module**

Foundation
