set_regularization_params_class_mlp — Set the regularization parameters of a multilayer perceptron.

set_regularization_params_class_mlp( : : MLPHandle, GenParamName, GenParamValue : )

set_regularization_params_class_mlp sets the regularization parameters of the multilayer perceptron (MLP) passed in MLPHandle. The regularization parameter to be set is specified with GenParamName. Its value is specified with GenParamValue.

GenParamName can assume the following values:
'num_outer_iterations': This parameter determines whether the regularization parameters should be determined automatically (GenParamValue >= 1) or manually (GenParamValue = 0, default), as described below in the sections “Technical Background” and “Automatic Determination of the Regularization Parameters”. As described in detail in the section “Automatic Determination of the Regularization Parameters”, 'num_outer_iterations' should not be set too large (values in the range of 1 to 5 are typical) to enable manual checking of the convergence of the automatic determination of the regularization parameters.
'num_inner_iterations': This parameter potentially enables somewhat faster convergence of the automatic determination of the regularization parameters, as described below in the section “Automatic Determination of the Regularization Parameters”. It should typically be left at its default value of 1.
'weight_prior': On the one hand, this parameter selects the regularization model to be used, as described below in the section “Technical Background”. On the other hand, if manual determination of the regularization parameters has been selected (i.e., 'num_outer_iterations' = 0), the regularization parameters themselves are set with GenParamValue, whereas if automatic determination has been selected (i.e., 'num_outer_iterations' >= 1), the initial values of the regularization parameters are set, as described below in the section “Automatic Determination of the Regularization Parameters”. Manual determination of the regularization parameters (see the section “Regularization Parameters” below) is only realistic if a single regularization parameter is used. In all other cases, the regularization parameters should be determined automatically.
'noise_prior': This parameter allows a noise prior to be specified for MLPs that have been configured for regression, as described below in the section “Application Areas”. If manual determination of the regularization parameters has been selected, the noise prior itself is set with GenParamValue, whereas if automatic determination has been selected, the initial value of the noise prior is set. Typically, this parameter is only useful if the regularization parameters are determined automatically.
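The following minimal sketch shows how each of these parameters is set on an existing MLPHandle. All values are illustrative, not recommendations; 'noise_prior' additionally assumes that the MLP was created with OutputFunction 'linear' (regression).

* Illustrative values only.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', 0.05)
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 3)
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_inner_iterations', 1)
* Only for MLPs created with OutputFunction 'linear':
set_regularization_params_class_mlp (MLPHandle, 'noise_prior', 1.0)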
Please note that the automatic determination of the regularization parameters requires a very large amount of memory and run time, as described in detail in the section “Complexity” below. Therefore, NumHidden should not be chosen too large when the MLP is created with create_class_mlp. For example, normal OCR applications seldom require NumHidden to be larger than 30-60.
Application Areas
As described at create_class_mlp, it may be desirable to regularize the MLP to enforce a smoother transition of the confidences between the different classes and to prevent overfitting of the MLP to the training data. To achieve this, a penalty for large MLP weights (which are the main reason for very sharp transitions between classes) can be added to the training of the MLP in train_class_mlp by setting GenParamName to 'weight_prior' and GenParamValue to a value > 0.
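For example, a fixed weight prior can be set before training (the value 1.0 is illustrative; suitable values are application dependent, see the section “Regularization Parameters” below):

set_regularization_params_class_mlp (MLPHandle, 'weight_prior', 1.0)
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)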
If the MLP has been configured for regression (i.e., if OutputFunction was set to 'linear' in create_class_mlp), an inverse variance of the expected noise in the data can be specified by setting GenParamName to 'noise_prior' and GenParamValue to a value > 0. Setting the noise prior only has an effect if a weight prior has been specified. In this case, it can be used to weight the data error term (the output error of the MLP) against the weight error term.
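A minimal sketch for a regression MLP might look as follows (all values are illustrative):

* Create an MLP configured for regression.
create_class_mlp (NumIn, NumHidden, NumOut, 'linear', \
                  'normalization', NumIn, 42, MLPHandle)
* The noise prior only has an effect if a weight prior is set.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', 0.01)
set_regularization_params_class_mlp (MLPHandle, 'noise_prior', 1.0)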
As described in more detail below, the regularization parameters of the MLP may be determined automatically (at the expense of significantly increased training times) by setting GenParamName to 'num_outer_iterations' and GenParamValue to a value > 0.
Technical Background
There are three different kinds of penalty terms that can be set with 'weight_prior'. Note that in the following the regularization parameters are denoted by α and the weights of the different layers of the MLP, as described at create_class_mlp, are denoted by w_k.

If a single value α is specified, all MLP weights are penalized equally by adding the following term to the optimization in train_class_mlp:

    (α/2) · Σ_k w_k²

Alternatively, four values [α₁, α₂, α₃, α₄] can be specified. These four parameters enable the individual regularization of the four groups of weights: the weights of the hidden layer except the bias weights (α₁), the bias weights of the hidden layer (α₂), the weights of the output layer except the bias weights (α₃), and the bias weights of the output layer (α₄).

Finally, one value per input variable plus three further values (i.e., NumIn+3 values in total) can be specified. These parameters enable the individual regularization of the weights connected to each input variable (α_i, i = 1, ..., NumIn) and the regularization of the remaining three groups of weights (the bias weights of the hidden layer, the weights of the output layer except the bias weights, and the bias weights of the output layer). This kind of regularization is only useful in conjunction with the automatic determination of the regularization parameters described below. If the automatic determination of the regularization parameters returns a very large value of α_i (compared to the smallest of the values α_1, ..., α_NumIn), the corresponding input variable has little relevance for the MLP output. If this is the case, it should be tested whether the input variable can be omitted from the input of the MLP without negatively affecting the MLP's performance. The advantage of omitting irrelevant input variables is an increased speed of the MLP for classification.

The parameters α can be regarded as the inverse variance of a Gaussian prior distribution on the MLP weights, i.e., they express an expectation about the size of the MLP weights. The larger the α are chosen, the smaller the MLP weights will be.
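The following sketch uses this regularization model together with the automatic determination of the regularization parameters to assess the relevance of the input variables (assuming no preprocessing that changes the number of input variables; the initial value 0.01 and the number of outer iterations are illustrative):

* One prior per input variable plus three priors for the remaining
* weight groups (NumIn+3 values in total).
Priors := gen_tuple_const(NumIn + 3, 0.01)
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', Priors)
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 5)
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
* Very large values among the first NumIn priors (relative to the
* smallest of them) indicate input variables with little relevance.
get_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                     WeightPrior)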
Regularization Parameters
The larger the regularization parameter(s) 'weight_prior' are chosen, the smoother the transition of the confidences between the different classes will be. The required values for the regularization parameter(s) depend on the MLP, especially the number of hidden units, on the training data, and on the scale of the training data (if no normalization is used). Typically, a higher value for the regularization parameter(s) is necessary if the MLP has more hidden units and if the training data consists of more points. For typical applications, the regularization parameters are determined by verifying the MLP performance on a test data set that is independent of the training data set. If an independent test data set is unavailable, cross validation can be used. Cross validation works by splitting the data set into separate parts (for example, 80% of the data set for training and 20% for testing), training the MLP with the training part (the 80% of the data in the above example), and testing the MLP performance on the test part (the 20% of the data in the above example). The procedure can be repeated for the other possible splits of the data (in the 80%-20% example, there are five possible splits). This procedure can, for example, start with relatively large values of the weight regularization parameters (which will typically result in misclassifications on the test data set). The weight regularization parameters can then be decreased until an acceptable performance on the test data sets is reached.
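A sketch of one such split might look as follows (it assumes the features of all samples are stored consecutively in the tuple DataAll and the class labels in ClassAll; the 80%-20% split, the candidate weight prior WeightPrior, and all other values are illustrative):

* Train on the first 80% of the samples with the candidate prior.
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'normalization', NumIn, 42, MLPHandle)
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                     WeightPrior)
NumTrain := int(0.8 * NumData)
for J := 0 to NumTrain - 1 by 1
    Data := DataAll[J * NumIn:(J + 1) * NumIn - 1]
    add_sample_class_mlp (MLPHandle, Data, ClassAll[J])
endfor
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
* Count the misclassifications on the remaining 20%.
NumErrors := 0
for J := NumTrain to NumData - 1 by 1
    Data := DataAll[J * NumIn:(J + 1) * NumIn - 1]
    classify_class_mlp (MLPHandle, Data, 1, Class, Confidence)
    if (Class != ClassAll[J])
        NumErrors := NumErrors + 1
    endif
endfor
clear_class_mlp (MLPHandle)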
Automatic Determination of the Regularization Parameters
The regularization parameters, i.e., the weight priors and the noise prior, can also be determined automatically by train_class_mlp using the so-called evidence procedure (for details about the evidence procedure, please refer to the articles in the section “References” below). This training mode can be selected by setting GenParamName to 'num_outer_iterations' and GenParamValue to a value > 0. Note that this typically results in training times that are one to three orders of magnitude larger than simply training the MLP with fixed regularization parameters.
The evidence procedure is an iterative algorithm that performs the following two steps for a number of outer iterations: first, the network is trained using the current values of the regularization parameters; next, the regularization parameters are re-estimated from the weights of the optimized MLP. In the first iteration, the weight priors and the noise prior specified with 'weight_prior' and 'noise_prior' are used. Thus, for the automatic determination of the regularization parameters, the values specified by the user serve as the starting parameters for the evidence procedure. The starting values for the weight priors should not be set too large because this might over-regularize the training and may result in badly determined regularization parameters. The initial values for the weight priors should typically be in the range 0.01-0.1.
The number of outer iterations can be set by setting GenParamName to 'num_outer_iterations' and GenParamValue to a value > 0. If GenParamValue is set to 0 (the default value), the evidence procedure is not executed and the MLP is simply trained using the user-specified regularization parameters.
The number of outer iterations should be set high enough to ensure the convergence of the regularization parameters. In contrast to the training of the MLP's weights, a numerical convergence criterion is very difficult to specify, and some human judgment is typically required to decide whether the regularization parameters have converged sufficiently. Therefore, it might not be possible to set the number of outer iterations a priori to ensure convergence of the regularization parameters. In these cases, the outer loop over the steps of the evidence procedure can be implemented manually by setting 'num_outer_iterations' to 1 and calling train_class_mlp repeatedly. This has the advantage that the weight priors and the noise prior can be queried after each iteration and checked manually for convergence. With this approach, the performance of the MLP can even be evaluated after each iteration on an independent test set to check the generalization performance of the classifier.
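A sketch of such a manually implemented outer loop with a numerical stopping heuristic might look as follows (the 5% relative-change threshold and the limit of 20 iterations are illustrative assumptions; as noted above, human judgment is typically still required):

set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 1)
WeightPriorPrev := []
for OuterIt := 1 to 20 by 1
    train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
    get_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                         WeightPrior)
    * Stop once the priors change by less than 5% (illustrative).
    if (|WeightPriorPrev| > 0)
        MaxRelChange := max(abs(WeightPrior - WeightPriorPrev) / WeightPriorPrev)
        if (MaxRelChange < 0.05)
            break
        endif
    endif
    WeightPriorPrev := WeightPrior
endfor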
If the number of outer iterations has been determined (approximately) for a class of applications, it may be possible to reduce the run time of the training (if MLPs are to be trained in the future on similar data sets) by setting GenParamName to 'num_inner_iterations' and GenParamValue to a value > 1 (the default value is 1) and by reducing the number of outer iterations. The number of outer iterations can typically not be reduced by the same factor by which the number of inner iterations is increased. Using this approach, the run time of the training can be optimized. However, this approach is only useful if many MLPs are trained with similar data sets. If this is not the case, 'num_inner_iterations' should be left at its default value of 1.
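For example (the values 3 and 5 are illustrative assumptions for an application class where the required number of iterations is approximately known):

set_regularization_params_class_mlp (MLPHandle, \
                                     'num_inner_iterations', 3)
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 5)
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)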
The automatically determined weight priors and noise prior can be queried after the training using get_regularization_params_class_mlp by setting GenParamName to 'weight_prior' or 'noise_prior', respectively.
In addition to the weight priors and the noise prior, the evidence procedure determines an estimate of the number of parameters of the MLP that can be determined well using the training data. This result can be queried using get_regularization_params_class_mlp by setting GenParamName to 'num_well_determined_params'. Alternatively, the fraction of well-determined parameters can be queried by setting GenParamName to 'fraction_well_determined_params'. If the number of well-determined parameters is significantly smaller than the total number of weights W of the MLP (as described in the section “Complexity” below) or the fraction of well-determined parameters is significantly smaller than 1, consider reducing the number of hidden units or, if the number of hidden units cannot be decreased without significantly increasing the error rate of the MLP, consider performing a preprocessing that reduces the number of input variables to the net, e.g., canonical variates or principal components.
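The following sketch compares this estimate with the total number of weights W, computed from the parameters of the MLP (the formula for W is given in the section “Complexity” below):

* Compute the total number of weights W of the MLP.
get_params_class_mlp (MLPHandle, NumInput, NumHidden, NumOutput, \
                      OutputFunction, Preprocessing, NumComponents)
NI := NumInput
if (Preprocessing == 'principal_components' or Preprocessing == 'canonical_variates')
    NI := NumComponents
endif
W := (NI + 1) * NumHidden + (NumHidden + 1) * NumOutput
get_regularization_params_class_mlp (MLPHandle, \
                                     'num_well_determined_params', \
                                     NumWellDet)
* If NumWellDet is much smaller than W, consider reducing NumHidden.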
Please note that the number of well-determined parameters can only be determined after the weight priors and the noise prior have been determined. This is the reason why the evidence procedure ends with the determination of the regularization parameters and not with the training of the MLP weights. Hence, after the evidence procedure the MLP will not have been trained with the latest regularization parameters. This should make no difference if they have converged. If you want the training to end with an optimization of the weights using the latest values of the regularization parameters, you can set 'num_outer_iterations' to 0 and call train_class_mlp again. If you do so, please note, however, that the number of well-determined parameters may change and, therefore, the value returned by get_regularization_params_class_mlp is technically inconsistent.
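If this final optimization is desired, the corresponding calls might look as follows:

set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 0)
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)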
Saved Parameters
Note that the parameters 'num_outer_iterations' and 'num_inner_iterations' only affect the training of the MLP. Therefore, they are not saved when the MLP is stored using write_class_mlp or serialize_class_mlp. Thus, they must be set anew if the MLP is loaded again using read_class_mlp or deserialize_class_mlp and training with the automatic determination of the regularization parameters is to be continued. All other parameters described above ('weight_prior', 'noise_prior', 'num_well_determined_params', and 'fraction_well_determined_params') are saved.
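For example, after loading a stored classifier, the training-only parameters must be set again before the automatic determination can be continued (the file name is taken from the examples below; the number of outer iterations is illustrative):

read_class_mlp ('classifier.mlp', MLPHandle)
* 'weight_prior' and 'noise_prior' were restored from the file;
* the iteration parameters were not and must be set anew.
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 5)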
Parameters

This operator modifies the state of the following input parameter:

During execution of this operator, access to the value of this parameter must be synchronized if it is used across multiple threads.
MLPHandle (input_control, state is modified)  class_mlp → (handle)
MLP handle.

GenParamName (input_control)  string → (string)
Name of the regularization parameter to set.
Default value: 'weight_prior'
List of values: 'noise_prior', 'num_inner_iterations', 'num_outer_iterations', 'weight_prior'

GenParamValue (input_control)  number(-array) → (real / integer)
Value of the regularization parameter.
Default value: 1.0
Suggested values: 0.01, 0.1, 1.0, 10.0, 100.0, 0, 1, 2, 3, 5, 10, 15, 20
Example (HDevelop)

* This example shows how to determine the regularization parameters
* automatically without examining the convergence of the
* regularization parameters.
* Create the MLP.
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'normalization', NumIn, 42, MLPHandle)
* Generate and add the training data.
for J := 0 to NumData-1 by 1
    * Generate training features and classes.
    * Data = [...]
    * Class = [...]
    add_sample_class_mlp (MLPHandle, Data, Class)
endfor
* Set up the automatic determination of the regularization
* parameters.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                     [0.01,0.01,0.01,0.01])
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 10)
* Train the MLP.
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
* Read out the estimate of the number of well-determined
* parameters.
get_regularization_params_class_mlp (MLPHandle, \
                                     'fraction_well_determined_params', \
                                     FractionParams)
* If FractionParams differs substantially from 1, consider reducing
* NumHidden appropriately and consider performing a preprocessing that
* reduces the number of input variables to the net, e.g., canonical
* variates or principal components.
write_class_mlp (MLPHandle, 'classifier.mlp')

* This example shows how to determine the regularization parameters
* automatically while examining the convergence of the
* regularization parameters.
* Create the MLP.
create_class_mlp (NumIn, NumHidden, NumOut, 'softmax', \
                  'normalization', NumIn, 42, MLPHandle)
* Generate and add the training data.
for J := 0 to NumData-1 by 1
    * Generate training features and classes.
    * Data = [...]
    * Class = [...]
    add_sample_class_mlp (MLPHandle, Data, Class)
endfor
* Set up the automatic determination of the regularization
* parameters.
set_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                     [0.01,0.01,0.01,0.01])
set_regularization_params_class_mlp (MLPHandle, \
                                     'num_outer_iterations', 1)
for OuterIt := 1 to 10 by 1
    * Train the MLP.
    train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
    * Read out the regularization parameters.
    get_regularization_params_class_mlp (MLPHandle, 'weight_prior', \
                                         WeightPrior)
    * Inspect the regularization parameters manually for
    * convergence and exit the loop manually if they have
    * converged.
    * [...]
endfor
* Read out the estimate of the number of well-determined
* parameters.
get_regularization_params_class_mlp (MLPHandle, \
                                     'fraction_well_determined_params', \
                                     FractionParams)
* If FractionParams differs substantially from 1, consider reducing
* NumHidden appropriately and consider performing a preprocessing that
* reduces the number of input variables to the net, e.g., canonical
* variates or principal components.
write_class_mlp (MLPHandle, 'classifier.mlp')
Complexity

Let N_I denote the number of input units of the MLP (i.e., NumInput or NumComponents, depending on the value of Preprocessing, as described at create_class_mlp), N_H the number of hidden units, and N_O the number of output units. Then, the number of weights of the MLP is W = (N_I + 1) · N_H + (N_H + 1) · N_O. Let S denote the number of training samples. Let I_t denote the number of iterations set with MaxIterations in train_class_mlp. Let I_o and I_i denote the number of outer and inner iterations, respectively.
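For example, with the illustrative values N_I = 81, N_H = 40, and N_O = 26, the MLP has W = 82 · 40 + 41 · 26 = 4346 weights.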
The run time of the training without regularization or with regularization with fixed regularization parameters is of complexity O(I_t · S · W). In contrast, the run time of the training with automatic determination of the regularization parameters is of complexity O(I_o · (I_i · I_t · S · W + S · W² + W³)), since each outer iteration additionally requires the computation of the Hessian matrix of the error function and of its eigenvalues.
The training without regularization or with regularization with fixed regularization parameters requires at least enough memory to store the training samples, i.e., O(S · (N_I + N_O)) values. The training with automatic determination of the regularization parameters additionally requires at least O(W²) values of memory for the Hessian matrix of the error function. Under special circumstances, additional temporary memory of the same order is required.
Result

If the parameters are valid, the operator set_regularization_params_class_mlp returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.
See also

get_regularization_params_class_mlp, train_class_mlp
References

David J. C. MacKay: “Bayesian Interpolation”; Neural Computation 4(3):415-447; 1992.
David J. C. MacKay: “A Practical Bayesian Framework for
Backpropagation Networks”; Neural Computation 4(3):448-472;
1992.
David J. C. MacKay: “The Evidence Framework Applied to
Classification Networks”; Neural Computation 4(5):720-736;
1992.
David J. C. MacKay: “Comparison of Approximate Methods for Handling
Hyperparameters”; Neural Computation 11(5):1035-1068; 1999.
Module

Foundation