create_class_svm — Create a support vector machine for pattern classification.
create_class_svm creates a support vector machine that can be used for pattern classification. The dimension of the patterns to be classified is specified in NumFeatures, the number of different classes in NumClasses.
For a binary classification problem in which the classes are linearly separable, the SVM algorithm selects data vectors from the training set that are used to construct the optimal separating hyperplane between the classes. This hyperplane is optimal in the sense that the margin between the convex hulls of the different classes is maximized. The training patterns that are located at the margin define the hyperplane and are called support vectors (SV).
Classification of a feature vector z is performed with the following decision function:

f(z) = sign( Σ_{i=1..n_sv} α_i · y_i · ⟨x_i, z⟩ + b )

Here, the x_i are the support vectors, y_i ∈ {-1, +1} encodes their class membership, and the α_i are the weight coefficients. The distance of the hyperplane to the origin is b. The α_i and b are determined during training with train_class_svm. Note that only a subset of the original training set (n_sv: number of support vectors) is necessary for the definition of the decision boundary, and therefore data vectors that are not support vectors are discarded. The classification speed depends on the evaluation of the dot product between support vectors and the feature vector to be classified, and hence on the length of the feature vector and the number of support vectors.
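The decision function above can be sketched in plain Python (for illustration only, not HALCON code; the support vectors, labels, weights, and bias below are hypothetical values standing in for what training would determine):

```python
def dot(x, z):
    # Plain dot product <x, z>, i.e., the linear kernel.
    return sum(xi * zi for xi, zi in zip(x, z))

def svm_decide(z, support_vectors, labels, alphas, b, kernel=dot):
    # f(z) = sign( sum_i alpha_i * y_i * k(x_i, z) + b )
    s = sum(a * y * kernel(x, z)
            for x, y, a in zip(support_vectors, labels, alphas))
    return 1 if s + b >= 0 else -1

# Hypothetical 1-D model: two support vectors at -1 and +1,
# so the separating hyperplane lies at the origin.
svs = [[-1.0], [1.0]]
ys = [-1, 1]
alphas = [0.5, 0.5]
b = 0.0

print(svm_decide([2.0], svs, ys, alphas, b))   # prints 1
print(svm_decide([-2.0], svs, ys, alphas, b))  # prints -1
```

Only the support vectors enter the sum, which is why discarding all other training vectors does not change the decision boundary.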
For classification problems in which the classes are not linearly separable, the algorithm is extended in two ways. First, during training a certain amount of errors (overlaps) is compensated with the use of slack variables. This means that the weight coefficients α_i are upper bounded by a regularization constant. To enable an intuitive control of the amount of training errors, the Nu-SVM version of the training algorithm is used. Here, the parameter Nu is an asymptotic upper bound on the fraction of training errors and an asymptotic lower bound on the fraction of support vectors. As a rule of thumb, the parameter Nu should be set to the prior expectation of the application's specific error ratio, e.g., 0.01 (corresponding to a maximum training error of 1%). Please note that too large a value for Nu might lead to an infeasible training problem, i.e., the SVM cannot be trained correctly (see train_class_svm for more details). Since this can only be determined during training, an exception can only be raised there. In this case, a new SVM with a smaller Nu must be created.
Second, because the above SVM exclusively calculates dot products between the feature vectors, it is possible to incorporate a kernel function into the training and testing algorithm. This means that the dot products are replaced by a kernel function, which implicitly performs the dot product in a higher-dimensional feature space. Given the appropriate kernel transformation, a classification task that is not linearly separable in the original space becomes linearly separable in the higher-dimensional feature space.
Different kernel functions can be selected with the parameter KernelType. For KernelType = 'linear', the dot product, as specified in the formula above, is calculated. This kernel should be used solely for linearly or nearly linearly separable classification tasks. The parameter KernelParam is ignored in this case.
The radial basis function (RBF) kernel 'rbf' is the best choice for a kernel function, because it achieves good results for many classification tasks. It is defined as

k(x, z) = exp(-γ · ||x - z||²)

Here, the parameter KernelParam is used to select γ. The intuitive meaning of γ is the amount of influence of a support vector upon its surroundings. A big value of γ (small influence on the surroundings) means that each training vector becomes a support vector. The training algorithm learns the training data “by heart”, but lacks any generalization ability (over-fitting). Additionally, the training/classification times grow significantly. A too small value of γ (big influence on the surroundings) leads to few support vectors defining the separating hyperplane (under-fitting). One typical strategy is to select a γ–Nu pair and consecutively increase the values as long as the recognition rate increases.
With KernelType = 'polynomial_homogeneous' or 'polynomial_inhomogeneous', polynomial kernels can be selected. They are defined in the following way:

k(x, z) = ⟨x, z⟩^d          ('polynomial_homogeneous')
k(x, z) = (⟨x, z⟩ + 1)^d    ('polynomial_inhomogeneous')

The degree d of the polynomial kernel must be set with KernelParam. Please note that a polynomial of too high a degree (d > 10) might result in numerical problems.
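For illustration, the three kernel families can be written down directly in Python (a sketch, not HALCON code; the function names are made up):

```python
import math

def linear_kernel(x, z):
    # Plain dot product, as used for KernelType = 'linear'.
    return sum(xi * zi for xi, zi in zip(x, z))

def rbf_kernel(x, z, gamma):
    # k(x, z) = exp(-gamma * ||x - z||^2); gamma is KernelParam.
    sq = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq)

def poly_homogeneous(x, z, d):
    # k(x, z) = <x, z>^d; the degree d is KernelParam.
    return linear_kernel(x, z) ** d

def poly_inhomogeneous(x, z, d):
    # k(x, z) = (<x, z> + 1)^d
    return (linear_kernel(x, z) + 1.0) ** d

x, z = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, z))          # prints 2.0
print(rbf_kernel(x, x, 0.02))       # prints 1.0 (identical vectors)
print(poly_homogeneous(x, z, 2))    # prints 4.0
print(poly_inhomogeneous(x, z, 2))  # prints 9.0
```

Note that the RBF kernel evaluates to 1 for identical vectors and decays toward 0 with distance, which is the "influence on the surroundings" controlled by γ.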
As a rule of thumb, the RBF kernel provides a good choice for most classification problems and should therefore be used in almost all cases. Nevertheless, the linear and polynomial kernels might be better suited for certain applications and can be tested for comparison. Please note that the 'novelty-detection' mode and the operator reduce_class_svm are provided only for the RBF
kernel.
Mode specifies the general classification task: either how to break down a multi-class decision problem into binary sub-cases, or whether to use the special classifier mode 'novelty-detection'. Mode = 'one-versus-all' creates a classifier where each class is compared to the rest of the training data. During testing, the class with the largest output (see the classification formula above, without the sign) is chosen.
Mode = 'one-versus-one'
creates a binary classifier between each single class. During
testing a vote is cast and the class with the majority of the votes
is selected. The optimal
Mode for multi-class
classification depends on the number of classes. Given n classes
'one-versus-all' creates n classifiers, whereas
'one-versus-one' creates n(n-1)/2. Note that for a binary
decision task 'one-versus-one' would create exactly one,
whereas 'one-versus-all' unnecessarily creates two
symmetric classifiers. For few classes (approximately up to 10)
'one-versus-one' is faster for training and testing, because
the sub-classifiers all consist of fewer training data and result in fewer support vectors overall. In case of many classes
'one-versus-all' is preferable, because
'one-versus-one' generates a prohibitively large amount of
sub-classifiers, as their number grows quadratically with the number of classes.
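The sub-classifier counts for the two modes follow directly from the formulas above; a quick Python sketch (the helper is hypothetical, not a HALCON operator):

```python
def num_subclassifiers(n_classes, mode):
    # Number of binary sub-classifiers created for a given Mode:
    # 'one-versus-all' builds one classifier per class,
    # 'one-versus-one' builds one per unordered pair of classes.
    if mode == 'one-versus-all':
        return n_classes
    if mode == 'one-versus-one':
        return n_classes * (n_classes - 1) // 2
    raise ValueError(mode)

for n in (2, 10, 50):
    print(n, num_subclassifiers(n, 'one-versus-all'),
          num_subclassifiers(n, 'one-versus-one'))
# prints: 2 2 1 / 10 10 45 / 50 50 1225
```

The quadratic growth is visible immediately: at 50 classes, 'one-versus-one' already needs 1225 sub-classifiers versus 50 for 'one-versus-all'.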
A special case of classification is Mode = 'novelty-detection', where the test data is classified only with regard to its membership to the training data; for this mode, NumClasses must be set to 1. The separating
hyperplane lies around the training data and thereby implicitly
divides the training data from the rejection class. The advantage is
that the rejection class is not defined explicitly, which is
difficult to do in certain applications like texture classification. The
resulting support vectors all lie at the border. With the parameter Nu, the ratio of outliers in the training data set is specified. Note that when classifying in the 'novelty-detection'
mode, the class of the training data is returned with index 1 and
the rejection class is returned with index 0. Thus, the first class
serves as rejection class. In contrast, when using the MLP
classifier, the last class serves as rejection class by default.
The parameters Preprocessing and NumComponents can be used to specify a preprocessing of the feature vectors. For Preprocessing = 'none', the feature vectors are passed unaltered to the SVM. NumComponents is ignored in this case.
For all other values of
Preprocessing, the training data
set is used to compute a transformation of the feature vectors
during the training as well as later in the classification.
For Preprocessing = 'normalization', the feature
vectors are normalized. In case of a polynomial kernel, the minimum
and maximum value of the training data set is transformed to -1 and
+1. In case of the RBF kernel, the data is normalized by subtracting
the mean of the training vectors and dividing the result by the
standard deviation of the individual components of the training
vectors. Hence, the transformed feature vectors have a mean of 0
and a standard deviation of 1. The normalization does not change
the length of the feature vector.
NumComponents is ignored
in this case. This transformation can be used if the mean and standard deviation of the feature vectors differ substantially from 0 and 1, respectively, or for data in which the components of the
feature vectors are measured in different units (e.g., if some of
the data are gray value features and some are region features, or if
region features are mixed, e.g., 'circularity' (unit: scalar) and
'area' (unit: pixel squared)). The normalization transformation
should be performed in general, because it increases the numerical
stability during training/testing.
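As a sketch of what the 'normalization' preprocessing computes for the RBF kernel (HALCON performs this internally during training; the helper name below is made up for illustration):

```python
import statistics

def normalize_features(train):
    """Standardize feature vectors so that each component of the
    training set has mean 0 and standard deviation 1.

    Returns the transformed training set; a real implementation
    would store (means, stds) to transform test vectors later.
    """
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    # Guard against constant components (std = 0).
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(vi - m) / s for vi, m, s in zip(v, means, stds)]
            for v in train]

# Components measured in wildly different units/scales:
train = [[0.0, 100.0], [2.0, 300.0], [4.0, 500.0]]
out = normalize_features(train)
# Each transformed component now has mean 0 and std 1.
```

After this step, both components contribute on a comparable scale to the distance ||x - z||² inside the RBF kernel, which is the numerical-stability benefit mentioned above.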
For Preprocessing = 'principal_components', a
principal component analysis (PCA) is performed. First, the feature
vectors are normalized (see above). Then, an orthogonal
transformation (a rotation in the feature space) that decorrelates
the training vectors is computed. After the transformation, the
mean of the training vectors is 0 and the covariance matrix of the
training vectors is a diagonal matrix. The transformation is chosen such that most of the variation of the data is contained in the first components of the transformed feature vector. With this, it is possible to omit the transformed features
in the last components of the feature vector, which typically are
mainly influenced by noise, without losing a large amount of
information. The parameter
NumComponents can be used to
determine how many of the transformed feature vector components
should be used. Up to
NumFeatures components can be
selected. The operator
get_prep_info_class_svm can be used
to determine how much information each transformed component
contains. Hence, it aids the selection of NumComponents.
Like data normalization, this transformation can be used if the mean and standard deviation of the feature vectors differ substantially
from 0 and 1, respectively, or for feature vectors in which the
components of the data are measured in different units. In
addition, this transformation is useful if it can be expected that
the features are highly correlated. Please note that the RBF kernel
is very robust against the dimensionality reduction performed by PCA
and should therefore be the first choice when speeding up the classification in this manner.
The transformation specified by Preprocessing = 'canonical_variates' first normalizes the training vectors
and then decorrelates the training vectors on average over all
classes. At the same time, the transformation maximally separates
the mean values of the individual classes. As for
Preprocessing = 'principal_components', the
transformed components are sorted by information content, and hence
transformed components with little information content can be
omitted. For canonical variates, up to min(NumClasses - 1, NumFeatures) components can be selected. Also in this case, the information content of the transformed components can be determined with get_prep_info_class_svm. Like principal
component analysis, canonical variates can be used to reduce the
amount of data without losing a large amount of information, while
additionally optimizing the separability of the classes after the
data reduction. The computation of the canonical variates is also
called linear discriminant analysis.
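The component-selection idea behind both 'principal_components' and 'canonical_variates' can be illustrated with a small sketch: given the per-component information content (as get_prep_info_class_svm would report it), pick the smallest NumComponents that preserves a desired fraction of the total. Both the helper and the threshold are assumptions for illustration:

```python
def choose_num_components(information, fraction=0.95):
    # Pick the smallest number of leading components whose
    # cumulative share of the total information reaches `fraction`.
    # `information` is assumed sorted in decreasing order, as
    # PCA / canonical variates produce it.
    total = sum(information)
    acc = 0.0
    for k, value in enumerate(information, start=1):
        acc += value
        if acc / total >= fraction:
            return k
    return len(information)

# Hypothetical information content of 5 transformed components.
info = [0.60, 0.25, 0.10, 0.04, 0.01]
print(choose_num_components(info))        # prints 3 (0.60+0.25+0.10 = 0.95)
print(choose_num_components(info, 0.80))  # prints 2
```

Dropping the trailing, noise-dominated components in this way shortens the feature vectors the SVM has to process, which is exactly the speed-up described below.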
For the last two types of transformations
('principal_components' and 'canonical_variates'),
the length of the input data of the SVM is determined by NumComponents, whereas NumFeatures determines the dimensionality of the input data (i.e., the length of the untransformed feature vector). Hence, by using one of these two transformations, the size of the SVM with respect to data length is reduced, leading to shorter training/classification times by the SVM.
After the SVM has been created with create_class_svm, typically training samples are added to the SVM by repeatedly calling add_sample_class_svm or read_samples_class_svm. After this, the SVM is typically trained with train_class_svm. Hereafter, the SVM can be saved with write_class_svm. Alternatively, the SVM can be used immediately after training to classify data using classify_class_svm.
A comparison of the SVM and the multi-layer perceptron (MLP) (see
create_class_mlp) typically shows that SVMs are generally
faster at training, especially for huge training sets, and achieve
slightly better recognition rates than MLPs. The MLP is faster at
classification and should therefore be preferred in time-critical
applications. Please note that this guideline assumes optimal
tuning of the parameters.
This operator returns a handle. Note that the state of an instance of this handle type may be changed by specific operators even though the handle is used as an input parameter by those operators.
Number of input variables (features) of the SVM.
Default value: 10
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100
NumFeatures >= 1
The kernel type.
Default value: 'rbf'
List of values: 'linear', 'polynomial_homogeneous', 'polynomial_inhomogeneous', 'rbf'
Additional parameter for the kernel function. For the RBF kernel, the value of γ; for the polynomial kernels, the degree d.
Default value: 0.02
Suggested values: 0.01, 0.02, 0.05, 0.1, 0.5
Regularization constant of the SVM.
Default value: 0.05
Suggested values: 0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.3
Nu > 0.0 && Nu < 1.0
Number of classes.
Default value: 5
Suggested values: 2, 3, 4, 5, 6, 7, 8, 9, 10
NumClasses >= 1
The mode of the SVM.
Default value: 'one-versus-one'
List of values: 'novelty-detection', 'one-versus-all', 'one-versus-one'
Type of preprocessing used to transform the feature vectors.
Default value: 'normalization'
List of values: 'canonical_variates', 'none', 'normalization', 'principal_components'
Preprocessing parameter: number of transformed features (ignored for Preprocessing = 'none' and Preprocessing = 'normalization').
Default value: 10
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100
NumComponents >= 1
create_class_svm (NumFeatures, 'rbf', 0.01, 0.01, NumClasses,\
                  'one-versus-all', 'normalization', NumFeatures,\
                  SVMHandle)
* Generate and add the training data
for J := 0 to NumData-1 by 1
    * Generate training features and classes
    * Data = [...]
    * Class = ...
    add_sample_class_svm (SVMHandle, Data, Class)
endfor
* Train the SVM
train_class_svm (SVMHandle, 0.001, 'default')
* Use the SVM to classify unknown data
for J := 0 to N-1 by 1
    * Extract features
    * Features = [...]
    classify_class_svm (SVMHandle, Features, 1, Class)
endfor
If the parameters are valid, the operator create_class_svm returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.
Bernhard Schölkopf, Alexander J. Smola: “Learning with Kernels”; MIT Press, London; 1999.
John Shawe-Taylor, Nello Cristianini: “Kernel Methods for Pattern Analysis”; Cambridge University Press, Cambridge; 2004.