create_class_svm — Create a support vector machine for pattern classification.
create_class_svm creates a support vector machine that can be used for pattern classification. The dimension of the patterns to be classified is specified in NumFeatures, the number of different classes in NumClasses.
For a binary classification problem in which the classes are linearly separable, the SVM algorithm selects data vectors from the training set that are used to construct the optimal separating hyperplane between the two classes. This hyperplane is optimal in the sense that the margin between the convex hulls of the two classes is maximized. The training patterns that are located at the margin define the hyperplane and are called support vectors (SV).
Classification of a feature vector z is performed with the following formula:
f(z) = sign( sum_{i=1}^{n_sv} alpha_i * y_i * <x_i, z> + b )
Here, x_i are the support vectors, y_i encodes their class membership (+/- 1), and alpha_i are the weight coefficients. The offset of the hyperplane from the origin is b. The alpha_i and b are determined during training with train_class_svm. Note that only a subset of the original training set (n_sv: number of support vectors) is necessary for the definition of the decision boundary; data vectors that are not support vectors are therefore discarded. The classification speed depends on the evaluation of the dot product between each support vector and the feature vector to be classified, and hence on the length of the feature vector and the number n_sv of support vectors.
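The decision formula above can be sketched outside HALCON with plain NumPy. The support vectors, labels, weights, and offset below are hypothetical values standing in for the result of train_class_svm; this is an illustration of the formula, not HALCON code.

```python
import numpy as np

def svm_decision(z, support_vectors, y, alpha, b):
    """Evaluate f(z) = sign( sum_i alpha_i * y_i * <x_i, z> + b )."""
    dots = support_vectors @ z          # one dot product per support vector
    return np.sign(np.dot(alpha * y, dots) + b)

# Hypothetical trained state: two support vectors separating along axis 0.
sv = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])               # class membership (+/- 1)
alpha = np.array([0.5, 0.5])            # weight coefficients
b = 0.0                                 # hyperplane offset

print(svm_decision(np.array([2.0, 3.0]), sv, y, alpha, b))   # 1.0
print(svm_decision(np.array([-2.0, 1.0]), sv, y, alpha, b))  # -1.0
```

Note that the classification cost is one dot product per support vector, which is why a smaller n_sv means faster classification.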
For classification problems in which the classes are not linearly separable, the algorithm is extended in two ways. First, during training a certain amount of errors (overlaps) is compensated with the use of slack variables. This means that the alpha_i are bounded from above by a regularization constant. To enable an intuitive control of the amount of training errors, the Nu-SVM version of the training algorithm is used. Here, the regularization parameter Nu is an asymptotic upper bound on the fraction of training errors and an asymptotic lower bound on the fraction of support vectors. As a rule of thumb, Nu should be set to the expected error ratio of the application, e.g., 0.01 (corresponding to a maximum training error of 1%). Please note that too large a value for Nu might lead to an infeasible training problem, i.e., the SVM cannot be trained correctly (see train_class_svm for more details). Since this can only be determined during training, the exception is raised there. In this case, a new SVM with a smaller Nu must be created.
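The role of Nu can be illustrated with scikit-learn's NuSVC, which implements the same Nu-SVM formulation (this is an analogy, not the HALCON API; the cluster data below is synthetic):

```python
import numpy as np
from sklearn.svm import NuSVC

# Two synthetic Gaussian classes, 50 samples each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# nu upper-bounds the fraction of training errors and lower-bounds
# the fraction of support vectors.
clf = NuSVC(nu=0.05, kernel="rbf", gamma=0.5).fit(X, y)
print(clf.n_support_.sum() / len(X))    # fraction of support vectors

# Too large a nu for the given data raises an error at fit() time,
# analogous to the exception raised by train_class_svm.
```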
Second, because the above SVM exclusively calculates dot products between the feature vectors, it is possible to incorporate a kernel function into the training and testing algorithm. This means that the dot products are substituted by a kernel function, which implicitly performs the dot product in a higher dimensional feature space. Given the appropriate kernel transformation, an originally not linearly separable classification task becomes linearly separable in the higher dimensional feature space.
Different kernel functions can be selected with the parameter KernelType. For KernelType = 'linear' the dot product, as specified in the formula above, is calculated. This kernel should only be used for linearly or nearly linearly separable classification tasks. The parameter KernelParam is ignored in this case.
The radial basis function (RBF) KernelType = 'rbf' is the best choice for a kernel function because it achieves good results for many classification tasks. It is defined as:
K(x,z) = exp(-gamma * |x-z|^2)
Here, the parameter KernelParam is used to select gamma. The intuitive meaning of gamma is the amount of influence of a support vector upon its surroundings. Too big a value of gamma (small influence on the surroundings) means that each training vector becomes a support vector. The training algorithm learns the training data “by heart”, but lacks any generalization ability (over-fitting). Additionally, the training/classification times grow significantly. Too small a value for gamma (big influence on the surroundings) leads to few support vectors defining the separating hyperplane (under-fitting). One typical strategy is to select a small gamma-Nu pair and consecutively increase the values as long as the recognition rate increases.
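The effect of gamma on a support vector's region of influence can be made concrete by evaluating the RBF kernel directly (illustrative values, not HALCON code):

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * |x - z|^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([0.0, 0.0])
z = np.array([1.0, 0.0])     # at distance 1 from x

# Small gamma: the support vector still strongly influences z
# (wide surroundings, risk of under-fitting).
print(rbf_kernel(x, z, gamma=0.1))    # exp(-0.1) ~ 0.905

# Large gamma: the influence decays almost completely within distance 1
# (narrow surroundings, risk of over-fitting).
print(rbf_kernel(x, z, gamma=10.0))   # exp(-10) ~ 4.5e-5
```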
With KernelType = 'polynomial_homogeneous' or 'polynomial_inhomogeneous', polynomial kernels can be selected. They are defined in the following way:
K(x,z) = ( <x,z> )^d          ('polynomial_homogeneous')
K(x,z) = ( <x,z> + 1 )^d      ('polynomial_inhomogeneous')
The degree d of the polynomial kernel must be set with KernelParam. Please note that too high a polynomial degree (d > 10) might result in numerical problems.
As a rule of thumb, the RBF kernel provides a good choice for most of the classification problems and should therefore be used in almost all cases. Nevertheless, the linear and polynomial kernels might be better suited for certain applications and can be tested for comparison. Please note that the novelty-detection Mode and the operator reduce_class_svm are provided only for the RBF kernel.
Mode specifies the general classification task, which is either how to break down a multi-class decision problem into binary sub-cases or whether to use a special classifier mode called 'novelty-detection'. Mode = 'one-versus-all' creates a classifier where each class is compared to the rest of the training data. During testing the class with the largest output (see the classification formula without sign) is chosen. Mode = 'one-versus-one' creates a binary classifier between each pair of classes. During testing a vote is cast and the class with the majority of the votes is selected. The optimal Mode for multi-class classification depends on the number of classes. Given n classes, 'one-versus-all' creates n classifiers, whereas 'one-versus-one' creates n(n-1)/2. Note that for a binary decision task 'one-versus-one' would create exactly one classifier, whereas 'one-versus-all' unnecessarily creates two symmetric classifiers. For few classes (up to approximately 10) 'one-versus-one' is faster for training and testing, because the sub-classifiers each consist of less training data and result in fewer support vectors overall. In case of many classes 'one-versus-all' is preferable, because 'one-versus-one' generates a prohibitively large number of sub-classifiers, as their number grows quadratically with the number of classes.
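The classifier counts above are simple arithmetic, sketched here (not a HALCON call):

```python
def num_sub_classifiers(n, mode):
    """Number of binary sub-classifiers for n classes in each Mode."""
    if mode == 'one-versus-all':
        return n                     # one classifier per class
    if mode == 'one-versus-one':
        return n * (n - 1) // 2      # one classifier per class pair
    raise ValueError(mode)

print(num_sub_classifiers(10, 'one-versus-one'))   # 45
print(num_sub_classifiers(50, 'one-versus-one'))   # 1225: quadratic growth
print(num_sub_classifiers(50, 'one-versus-all'))   # 50: linear growth
```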
A special case of classification is Mode = 'novelty-detection', where the test data is classified only with regard to membership to the training data, i.e., NumClasses must be set to 1. The separating hyperplane lies around the training data and thereby implicitly separates the training data from the rejection class. The advantage is that the rejection class is not defined explicitly, which is difficult to do in certain applications like texture classification. The resulting support vectors all lie at the border. The parameter Nu specifies the ratio of outliers in the training data set. Note that when classifying in the 'novelty-detection' mode, the class of the training data is returned with index 1 and the rejection class is returned with index 0. Thus, the first class serves as rejection class. In contrast, when using the MLP classifier, the last class serves as rejection class by default.
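The idea behind novelty detection can be illustrated with scikit-learn's OneClassSVM (RBF kernel, nu as the outlier ratio). This is an analogy, not the HALCON implementation; also note that scikit-learn labels inliers +1 and novelties -1, whereas HALCON returns class 1 for the training data and class 0 for the rejection class.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# A single synthetic training class; no rejection class is defined.
rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, (200, 2))

det = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(train)

print(det.predict([[0.0, 0.0]]))   # [ 1]: member of the training class
print(det.predict([[8.0, 8.0]]))   # [-1]: novelty (rejection)
```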
The parameters Preprocessing and NumComponents can be used to specify a preprocessing of the feature vectors. For Preprocessing = 'none', the feature vectors are passed unaltered to the SVM. NumComponents is ignored in this case.
For all other values of Preprocessing, the training data set is used to compute a transformation of the feature vectors during the training as well as later in the classification.
For Preprocessing = 'normalization', the feature vectors are normalized. In case of a polynomial kernel, the minimum and maximum values of the training data set are transformed to -1 and +1. In case of the RBF kernel, the data is normalized by subtracting the mean of the training vectors and dividing the result by the standard deviation of the individual components of the training vectors. Hence, the transformed feature vectors have a mean of 0 and a standard deviation of 1. The normalization does not change the length of the feature vector. NumComponents is ignored in this case. This transformation can be used if the mean and standard deviation of the feature vectors differ substantially from 0 and 1, respectively, or for data in which the components of the feature vectors are measured in different units (e.g., if some of the data are gray value features and some are region features, or if region features are mixed, e.g., 'circularity' (unit: scalar) and 'area' (unit: pixel squared)). The normalization transformation should be performed in general, because it increases the numerical stability during training/testing.
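The RBF-style normalization (zero mean, unit standard deviation per component) can be sketched as follows. The training set is hypothetical; the key point is that the same transformation computed from the training data must also be applied to feature vectors at classification time.

```python
import numpy as np

# Hypothetical training features measured in very different units,
# e.g. 'area' (pixels squared) and 'circularity' (scalar).
train = np.array([[100.0, 0.2],
                  [140.0, 0.4],
                  [120.0, 0.3]])

mean = train.mean(axis=0)          # per-component mean
std = train.std(axis=0)            # per-component standard deviation

normalized = (train - mean) / std
print(normalized.mean(axis=0))     # ~[0, 0]
print(normalized.std(axis=0))      # [1, 1]

# At classification time, the stored mean/std transform new vectors too.
new_vec = (np.array([130.0, 0.25]) - mean) / std
```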
For Preprocessing = 'principal_components', a principal component analysis (PCA) is performed. First, the feature vectors are normalized (see above). Then, an orthogonal transformation (a rotation in the feature space) that decorrelates the training vectors is computed. After the transformation, the mean of the training vectors is 0 and the covariance matrix of the training vectors is a diagonal matrix. The transformation is chosen such that the transformed features with the largest variation are contained in the first components of the transformed feature vector. With this, it is possible to omit the transformed features in the last components of the feature vector, which typically are mainly influenced by noise, without losing a large amount of information. The parameter NumComponents can be used to determine how many of the transformed feature vector components should be used. Up to NumFeatures components can be selected. The operator get_prep_info_class_svm can be used to determine how much information each transformed component contains. Hence, it aids the selection of NumComponents. Like data normalization, this transformation can be used if the mean and standard deviation of the feature vectors differ substantially from 0 and 1, respectively, or for feature vectors in which the components of the data are measured in different units. In addition, this transformation is useful if it can be expected that the features are highly correlated. Please note that the RBF kernel is very robust against the dimensionality reduction performed by PCA and should therefore be the first choice when speeding up the classification time.
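A minimal PCA sketch via the covariance eigendecomposition, assuming synthetic correlated data (this illustrates the principle, not the HALCON-internal implementation):

```python
import numpy as np

# Synthetic data: 5 features that are linear mixtures of a 2-D source,
# i.e. highly correlated (rank 2 up to noise).
rng = np.random.default_rng(2)
base = rng.normal(0.0, 1.0, (100, 2))
train = np.hstack([base, base @ rng.normal(0.0, 1.0, (2, 3))])

centered = train - train.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # sort by decreasing variance

num_components = 2                          # analogous to NumComponents
projected = centered @ eigvecs[:, order[:num_components]]
print(projected.shape)                      # (100, 2)

# The sorted eigenvalues play the role of get_prep_info_class_svm:
# they show how much variance each transformed component carries.
print(eigvals[order])
```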
The transformation specified by Preprocessing = 'canonical_variates' first normalizes the training vectors and then decorrelates the training vectors on average over all classes. At the same time, the transformation maximally separates the mean values of the individual classes. As for Preprocessing = 'principal_components', the transformed components are sorted by information content, and hence transformed components with little information content can be omitted. For canonical variates, up to min(NumClasses-1, NumFeatures) components can be selected. Also in this case, the information content of the transformed components can be determined with get_prep_info_class_svm. Like principal component analysis, canonical variates can be used to reduce the amount of data without losing a large amount of information, while additionally optimizing the separability of the classes after the data reduction. The computation of the canonical variates is also called linear discriminant analysis.
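Canonical variates are also known as linear discriminant analysis, so scikit-learn's LinearDiscriminantAnalysis can serve as a hedged analogy to Preprocessing = 'canonical_variates' (synthetic data; not the HALCON API). Note the component limit min(NumClasses - 1, NumFeatures):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# 3 synthetic classes with 4 features each, 40 samples per class.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 1.0, (40, 4)) for m in (-3.0, 0.0, 3.0)])
y = np.repeat([0, 1, 2], 40)

# At most min(3 - 1, 4) = 2 canonical components remain.
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
reduced = lda.transform(X)
print(reduced.shape)                # (120, 2)
```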
For the last two types of transformations ('principal_components' and 'canonical_variates'), the length of input data of the SVM is determined by NumComponents, whereas NumFeatures determines the dimensionality of the input data (i.e., the length of the untransformed feature vector). Hence, by using one of these two transformations, the size of the SVM with respect to data length is reduced, leading to shorter training/classification times by the SVM.
After the SVM has been created with create_class_svm, typically training samples are added to the SVM by repeatedly calling add_sample_class_svm or read_samples_class_svm. After this, the SVM is typically trained using train_class_svm. Hereafter, the SVM can be saved using write_class_svm. Alternatively, the SVM can be used immediately after training to classify data using classify_class_svm.
A comparison of the SVM and the multi-layer perceptron (MLP) (see create_class_mlp) typically shows that SVMs are generally faster at training, especially for huge training sets, and achieve slightly better recognition rates than MLPs. The MLP is faster at classification and should therefore be preferred in time-critical applications. Please note that this guideline assumes optimal tuning of the parameters.
NumFeatures: Number of input variables (features) of the SVM.
Default value: 10
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100
Restriction: NumFeatures >= 1
KernelType: The kernel type.
Default value: 'rbf'
List of values: 'linear', 'polynomial_homogeneous', 'polynomial_inhomogeneous', 'rbf'
KernelParam: Additional parameter for the kernel function: for the RBF kernel the value of gamma, for the polynomial kernels the degree d.
Default value: 0.02
Suggested values: 0.01, 0.02, 0.05, 0.1, 0.5
Nu: Regularization constant of the SVM.
Default value: 0.05
Suggested values: 0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.3
Restriction: Nu > 0.0 && Nu < 1.0
NumClasses: Number of classes.
Default value: 5
Suggested values: 2, 3, 4, 5, 6, 7, 8, 9, 10
Restriction: NumClasses >= 1
Mode: The mode of the SVM.
Default value: 'one-versus-one'
List of values: 'novelty-detection', 'one-versus-all', 'one-versus-one'
Preprocessing: Type of preprocessing used to transform the feature vectors.
Default value: 'normalization'
List of values: 'canonical_variates', 'none', 'normalization', 'principal_components'
NumComponents: Number of transformed features kept by the preprocessing (ignored for Preprocessing = 'none' and Preprocessing = 'normalization').
Default value: 10
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100
Restriction: NumComponents >= 1
create_class_svm (NumFeatures, 'rbf', 0.01, 0.01, NumClasses, \
                  'one-versus-all', 'normalization', NumFeatures, \
                  SVMHandle)
* Generate and add the training data
for J := 0 to NData-1 by 1
    * Generate training features and classes
    * Data = [...]
    * Class = ...
    add_sample_class_svm (SVMHandle, Data, Class)
endfor
* Train the SVM
train_class_svm (SVMHandle, 0.001, 'default')
* Use the SVM to classify unknown data
for J := 0 to N-1 by 1
    * Extract features
    * Features = [...]
    classify_class_svm (SVMHandle, Features, 1, Class)
endfor
clear_class_svm (SVMHandle)
If the parameters are valid, the operator create_class_svm returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.
create_class_mlp, create_class_gmm, create_class_box
clear_class_svm, train_class_svm, classify_class_svm
Bernhard Schölkopf, Alexander J. Smola: “Learning with Kernels”; MIT Press, London; 1999.
John Shawe-Taylor, Nello Cristianini: “Kernel Methods for Pattern Analysis”; Cambridge University Press, Cambridge; 2004.