select_feature_set_gmm — Selects an optimal combination from a set of features to classify the provided data.
select_feature_set_gmm selects an optimal subset from a set of features to solve a given classification problem. The classification problem has to be specified with annotated training data in ClassTrainDataHandle and will be classified by a Gaussian Mixture Model. Details of the properties of this classifier can be found in create_class_gmm.
The result of the operator is a trained classifier that is returned in GMMHandle. Additionally, the list of indices or names of the selected features is returned in SelectedFeatureIndices. To use this classifier, calculate all features listed in SelectedFeatureIndices for the new input data and pass them to the classifier.
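A minimal sketch of applying the returned classifier. The tuple FeaturesSelected is hypothetical: it must be built so that it contains, for a new sample, exactly the features listed in SelectedFeatureIndices.

```
* Classify a new sample using only the selected features.
* FeaturesSelected (hypothetical) holds the feature values of the
* subfeatures listed in SelectedFeatureIndices, in the same order.
classify_class_gmm (GMMHandle, FeaturesSelected, 1, ClassID, ClassProb, \
                    Density, KSigmaProb)
```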
A possible application of this operator can be a comparison of different parameter sets for certain feature extraction techniques. Another application is to search for a feature that is discriminating between different classes.
To define the features that should be selected from ClassTrainDataHandle, the dimensions of the feature vectors in ClassTrainDataHandle can be grouped into subfeatures by calling set_feature_lengths_class_train_data. A subfeature can contain several subsequent elements of a feature vector. select_feature_set_gmm decides for each of these subfeatures whether it improves the classification or should be left out.
The indices of the selected subfeatures are returned in SelectedFeatureIndices. If names were set in set_feature_lengths_class_train_data, these names are returned instead of the indices. If set_feature_lengths_class_train_data was not called for ClassTrainDataHandle before, each element of the feature vector is considered as a subfeature.
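The grouping into subfeatures can be sketched as follows. The vector length and the feature names 'color' and 'shape' are purely illustrative:

```
* Feature vectors of length 5: the first 3 elements form one
* subfeature, the last 2 another. Because names are set here,
* SelectedFeatureIndices will return names instead of indices.
create_class_train_data (5, ClassTrainDataHandle)
set_feature_lengths_class_train_data (ClassTrainDataHandle, [3,2], \
                                      ['color','shape'])
```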
The selection method SelectionMethod is either a greedy search 'greedy' (iteratively add the feature with the highest gain) or the dynamically oscillating search 'greedy_oscillating' (add the feature with the highest gain and then test whether any of the already added features can be left out without a great loss). The method 'greedy' is generally preferable, since it is faster. The method 'greedy_oscillating' should only be chosen when the subfeatures are low-dimensional or redundant.
The optimization criterion is the classification rate of a two-fold cross-validation of the training data. The best achieved value is returned in Score.
The following generic parameters can be set in GenParamNames and GenParamValues:
'min_centers':
  Minimal number of clusters to represent a class in the training data.
  Possible values: '1', '2'
  Default value: '1'
'max_centers':
  Maximal number of clusters to represent a class in the training data.
  Possible values: '1', '5', '10'
  Default value: '1'
'covar_type':
  Type of the covariance used to represent the size of a cluster.
  Possible values: 'spherical', 'diag', 'full'
  Default value: 'spherical'
'random_seed':
  Seed for the random number generator that is used to initialize the GMM.
  Default value: '42'
'threshold':
  Threshold for the relative change of the error for the training to terminate.
  Default value: '0.001'
'regularize':
  Regularization value to prevent the covariance matrix from becoming singular.
  Default value: '0.0001'
'randomize':
  Randomize the input vector.
  Default value: '0'
'class_priors':
  Mode to determine the a-priori probabilities of the classes.
  Possible values: 'training', 'uniform'
  Default value: 'training'
A more exact description of those parameters can be found in create_class_gmm and train_class_gmm.
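For example, the generic parameters can be passed as follows. This is only a sketch; the chosen parameter values serve purely as illustration:

```
* Use diagonal covariances and uniform class priors
* during the feature selection.
select_feature_set_gmm (ClassTrainDataHandle, 'greedy', \
                        ['covar_type','class_priors'], \
                        ['diag','uniform'], GMMHandle, \
                        SelectedFeatureIndices, Score)
```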
This operator may take considerable time, depending on the size of the training data set and the number of features.
Note that this operator should not be called if only a small set of training data is available. Due to the risk of overfitting, select_feature_set_gmm may deliver a classifier with a very high score that nevertheless performs poorly on unseen data.
ClassTrainDataHandle:
  Handle of the training data.
SelectionMethod:
  Method to perform the selection.
  Default value: 'greedy'
  List of values: 'greedy', 'greedy_oscillating'
GenParamNames:
  Names of generic parameters to configure the classifier.
  Default value: []
  List of values: 'class_priors', 'covar_type', 'max_centers', 'min_centers', 'random_seed', 'randomize', 'regularize', 'threshold'
GenParamValues:
  Values of generic parameters to configure the classifier.
  Default value: []
  Suggested values: 1, 2, 3, 'spherical', 'diag', 'full', 42, 0.001, 0.0001, 0
GMMHandle:
  A trained GMM classifier using only the selected features.
SelectedFeatureIndices:
  The selected feature set, given as indices or names.
Score:
  The achieved score using two-fold cross-validation.
* Find out which of the two features distinguishes two classes
NameFeature1 := 'Good Feature'
NameFeature2 := 'Bad Feature'
LengthFeature1 := 3
LengthFeature2 := 2
* Create training data
create_class_train_data (LengthFeature1 + LengthFeature2, \
                         ClassTrainDataHandle)
* Define the features which are in the training data
set_feature_lengths_class_train_data (ClassTrainDataHandle, \
                                      [LengthFeature1, LengthFeature2], \
                                      [NameFeature1, NameFeature2])
* Add training data
*                                                  |Feat1 | |Feat2|
add_sample_class_train_data (ClassTrainDataHandle, 'row', [1,1,1, 2,1], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,2,2, 2,1], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [1,1,1, 3,4], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,2,2, 3,4], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [0,0,1, 5,6], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,3,2, 5,6], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [0,0,1, 5,6], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,3,2, 5,6], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [0,0,1, 5,6], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,3,2, 5,6], 1)
* Add more data
* ...
* Select the better feature with a GMM
select_feature_set_gmm (ClassTrainDataHandle, 'greedy', [], [], GMMHandle, \
                        SelectedFeatureGMM, Score)
clear_class_train_data (ClassTrainDataHandle)
* Use the classifier
* ...
clear_class_gmm (GMMHandle)
If the parameters are valid, the operator select_feature_set_gmm returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.
Possible predecessors: create_class_train_data, add_sample_class_train_data, set_feature_lengths_class_train_data
Alternatives: select_feature_set_mlp, select_feature_set_knn, select_feature_set_svm
See also: create_class_gmm, gray_features, region_features