
select_feature_set_knn (Operator)

Name

select_feature_set_knn - Selects an optimal subset from a set of features to solve a certain classification problem.

Signature

select_feature_set_knn( : : ClassTrainDataHandle, SelectionMethod, GenParamNames, GenParamValues : KNNHandle, SelectedFeatureIndices, Score)

Herror T_select_feature_set_knn(const Htuple ClassTrainDataHandle, const Htuple SelectionMethod, const Htuple GenParamNames, const Htuple GenParamValues, Htuple* KNNHandle, Htuple* SelectedFeatureIndices, Htuple* Score)

Herror select_feature_set_knn(const HTuple& ClassTrainDataHandle, const HTuple& SelectionMethod, const HTuple& GenParamNames, const HTuple& GenParamValues, HTuple* KNNHandle, HTuple* SelectedFeatureIndices, HTuple* Score)

HTuple HClassKnn::SelectFeatureSetKnn(const HClassTrainData& ClassTrainDataHandle, const HTuple& SelectionMethod, const HTuple& GenParamNames, const HTuple& GenParamValues, HTuple* Score)

HClassKnn HClassTrainData::SelectFeatureSetKnn(const HTuple& SelectionMethod, const HTuple& GenParamNames, const HTuple& GenParamValues, HTuple* SelectedFeatureIndices, HTuple* Score) const

void SelectFeatureSetKnn(const HTuple& ClassTrainDataHandle, const HTuple& SelectionMethod, const HTuple& GenParamNames, const HTuple& GenParamValues, HTuple* KNNHandle, HTuple* SelectedFeatureIndices, HTuple* Score)

HTuple HClassKnn::SelectFeatureSetKnn(const HClassTrainData& ClassTrainDataHandle, const HString& SelectionMethod, const HTuple& GenParamNames, const HTuple& GenParamValues, HTuple* Score)

HTuple HClassKnn::SelectFeatureSetKnn(const HClassTrainData& ClassTrainDataHandle, const HString& SelectionMethod, const HString& GenParamNames, double GenParamValues, HTuple* Score)

HTuple HClassKnn::SelectFeatureSetKnn(const HClassTrainData& ClassTrainDataHandle, const char* SelectionMethod, const char* GenParamNames, double GenParamValues, HTuple* Score)

HClassKnn HClassTrainData::SelectFeatureSetKnn(const HString& SelectionMethod, const HTuple& GenParamNames, const HTuple& GenParamValues, HTuple* SelectedFeatureIndices, HTuple* Score) const

HClassKnn HClassTrainData::SelectFeatureSetKnn(const HString& SelectionMethod, const HString& GenParamNames, double GenParamValues, HTuple* SelectedFeatureIndices, HTuple* Score) const

HClassKnn HClassTrainData::SelectFeatureSetKnn(const char* SelectionMethod, const char* GenParamNames, double GenParamValues, HTuple* SelectedFeatureIndices, HTuple* Score) const

void HOperatorSetX.SelectFeatureSetKnn(
[in] VARIANT ClassTrainDataHandle, [in] VARIANT SelectionMethod, [in] VARIANT GenParamNames, [in] VARIANT GenParamValues, [out] VARIANT* KNNHandle, [out] VARIANT* SelectedFeatureIndices, [out] VARIANT* Score)

VARIANT HClassKnnX.SelectFeatureSetKnn(
[in] IHClassTrainDataX* ClassTrainDataHandle, [in] BSTR SelectionMethod, [in] VARIANT GenParamNames, [in] VARIANT GenParamValues, [out] VARIANT* Score)

IHClassKnnX* HClassTrainDataX.SelectFeatureSetKnn(
[in] BSTR SelectionMethod, [in] VARIANT GenParamNames, [in] VARIANT GenParamValues, [out] VARIANT* SelectedFeatureIndices, [out] VARIANT* Score)

static void HOperatorSet.SelectFeatureSetKnn(HTuple classTrainDataHandle, HTuple selectionMethod, HTuple genParamNames, HTuple genParamValues, out HTuple KNNHandle, out HTuple selectedFeatureIndices, out HTuple score)

HTuple HClassKnn.SelectFeatureSetKnn(HClassTrainData classTrainDataHandle, string selectionMethod, HTuple genParamNames, HTuple genParamValues, out HTuple score)

HTuple HClassKnn.SelectFeatureSetKnn(HClassTrainData classTrainDataHandle, string selectionMethod, string genParamNames, double genParamValues, out HTuple score)

HClassKnn HClassTrainData.SelectFeatureSetKnn(string selectionMethod, HTuple genParamNames, HTuple genParamValues, out HTuple selectedFeatureIndices, out HTuple score)

HClassKnn HClassTrainData.SelectFeatureSetKnn(string selectionMethod, string genParamNames, double genParamValues, out HTuple selectedFeatureIndices, out HTuple score)

Description

select_feature_set_knn selects an optimal subset from a set of features to solve a certain classification problem. The classification problem has to be specified with annotated training data in ClassTrainDataHandle and is classified by a k-nearest neighbors (k-NN) classifier. Details of the properties of this classifier can be found in create_class_knn.

The result of the operator is a trained classifier, which is returned in KNNHandle. Additionally, the list of indices or names of the selected features is returned in SelectedFeatureIndices. To use this classifier on new input data, compute all features listed in SelectedFeatureIndices and pass them to the classifier.

Possible applications of this operator include comparing different parameter sets of a feature extraction technique and searching for a property that discriminates between different classes of parts or classes of errors.

To define the features that should be selected from ClassTrainDataHandle, the dimensions of the feature vectors in ClassTrainDataHandle can be grouped into subfeatures by calling set_feature_lengths_class_train_data. A subfeature can contain several consecutive elements of a feature vector. For each of these subfeatures, the operator decides whether it is better to use it for the classification or to leave it out.

The indices of the selected subfeatures are returned in SelectedFeatureIndices. If names were set in set_feature_lengths_class_train_data, these names are returned instead of the indices. If set_feature_lengths_class_train_data was not called for ClassTrainDataHandle beforehand, each element of the feature vector is treated as a separate subfeature.
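The mapping from subfeature lengths to feature-vector elements can be sketched as follows. This is a minimal, self-contained C++ illustration, not part of the HALCON API: the helpers `subfeature_ranges`, `default_lengths`, and `project` are hypothetical, and the sketch assumes that subfeatures are identified by their zero-based indices rather than by names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open range [first, second) of feature-vector elements covered by one subfeature.
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: turn subfeature lengths (as passed to
// set_feature_lengths_class_train_data) into consecutive element ranges.
std::vector<Range> subfeature_ranges(const std::vector<std::size_t>& lengths) {
    std::vector<Range> ranges;
    std::size_t start = 0;
    for (std::size_t len : lengths) {
        ranges.emplace_back(start, start + len);
        start += len;
    }
    return ranges;
}

// Default case (no lengths set): every element is its own subfeature of length 1.
std::vector<std::size_t> default_lengths(std::size_t dimension) {
    return std::vector<std::size_t>(dimension, 1);
}

// Hypothetical helper: keep only the elements of `full` that belong to the
// selected subfeatures, in the order given by `selected`.
std::vector<double> project(const std::vector<double>& full,
                            const std::vector<std::size_t>& lengths,
                            const std::vector<std::size_t>& selected) {
    const std::vector<Range> ranges = subfeature_ranges(lengths);
    std::vector<double> out;
    for (std::size_t idx : selected) {
        assert(idx < ranges.size());
        for (std::size_t i = ranges[idx].first; i < ranges[idx].second; ++i) {
            assert(i < full.size());
            out.push_back(full[i]);
        }
    }
    return out;
}
```

With the lengths [3, 2] used in the HDevelop example on this page, subfeature 0 covers elements 0 to 2 and subfeature 1 covers elements 3 and 4, so projecting a sample onto subfeature 0 keeps only its first three values.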

The selection method SelectionMethod is either a greedy search, 'greedy' (iteratively add the feature with the highest gain), or the dynamically oscillating search, 'greedy_oscillating' (add the feature with the highest gain, then test whether any of the previously added features can be left out without great loss). The method 'greedy' is generally preferable because it is faster. Choose 'greedy_oscillating' only if the subfeatures are low-dimensional or redundant.
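The two search strategies can be sketched against an abstract scoring function. In this hypothetical C++ sketch, `score` stands in for the cross-validated classification rate, and the names `add_best`, `greedy_select`, and `oscillating_select` are illustrative only. Two details are assumptions rather than documented HALCON behavior: the search stops as soon as no single addition raises the score, and "without great loss" is interpreted strictly as "without any loss".

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <set>

using Subset = std::set<std::size_t>;
// Stand-in for the optimization criterion (e.g. a cross-validated rate).
using ScoreFn = std::function<double(const Subset&)>;

// Forward step: add the unused subfeature that raises `best` the most.
// Returns false (and leaves `chosen` unchanged) if no addition improves it.
bool add_best(std::size_t n, const ScoreFn& score, Subset& chosen, double& best) {
    std::size_t pick = n;  // n means "no improving candidate found"
    double pick_score = best;
    for (std::size_t f = 0; f < n; ++f) {
        if (chosen.count(f) != 0) continue;
        Subset trial = chosen;
        trial.insert(f);
        const double s = score(trial);
        if (s > pick_score) { pick_score = s; pick = f; }
    }
    if (pick == n) return false;
    chosen.insert(pick);
    best = pick_score;
    return true;
}

// 'greedy': keep adding subfeatures until no addition improves the score.
Subset greedy_select(std::size_t n, const ScoreFn& score, double& best) {
    Subset chosen;
    best = -std::numeric_limits<double>::infinity();
    while (add_best(n, score, chosen, best)) {
    }
    return chosen;
}

// 'greedy_oscillating': after each addition, repeatedly remove the subfeature
// whose removal yields the highest score, as long as that score is not lower.
Subset oscillating_select(std::size_t n, const ScoreFn& score, double& best) {
    Subset chosen;
    best = -std::numeric_limits<double>::infinity();
    while (add_best(n, score, chosen, best)) {
        while (chosen.size() > 1) {
            std::size_t drop = n;
            double drop_score = -std::numeric_limits<double>::infinity();
            for (std::size_t f : chosen) {
                Subset trial = chosen;
                trial.erase(f);
                const double s = score(trial);
                if (s > drop_score) { drop_score = s; drop = f; }
            }
            if (drop_score < best) break;  // every removal would lower the score
            chosen.erase(drop);
            best = drop_score;
        }
    }
    return chosen;
}
```

The backward step is what distinguishes the oscillating variant: when a later addition makes an earlier choice redundant, that choice can be dropped again, whereas plain greedy search never revisits a subfeature once it has been added. Each extra backward pass costs additional score evaluations, which is consistent with 'greedy' being the faster option.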

The optimization criterion is the classification rate of a two-fold cross-validation of the training data. The best achieved value is returned in Score.
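A minimal sketch of a two-fold cross-validated classification rate follows. It swaps in an exact, brute-force 1-nearest-neighbor classifier with squared Euclidean distance for HALCON's tree-based k-NN search, and it assumes a split into the first and second half of the sample list; the fold assignment HALCON actually uses is not documented on this page. `Sample`, `classify_1nn`, and `two_fold_rate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Sample {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance between two feature vectors of equal length.
double sq_dist(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Brute-force 1-NN: label of the training sample closest to `query`
// (on ties, the sample that appears first wins).
int classify_1nn(const std::vector<Sample>& train, const std::vector<double>& query) {
    assert(!train.empty());
    double best = std::numeric_limits<double>::infinity();
    int label = train.front().label;
    for (const Sample& s : train) {
        const double d = sq_dist(s.features, query);
        if (d < best) { best = d; label = s.label; }
    }
    return label;
}

// Two-fold cross-validation: each half is classified using the other half as
// training data; the result is the overall fraction of correct classifications.
double two_fold_rate(const std::vector<Sample>& data) {
    const std::size_t half = data.size() / 2;
    std::vector<Sample> fold[2];
    for (std::size_t i = 0; i < data.size(); ++i) fold[i < half ? 0 : 1].push_back(data[i]);
    assert(!fold[0].empty() && !fold[1].empty());
    std::size_t correct = 0;
    for (int f = 0; f < 2; ++f)
        for (const Sample& s : fold[f])
            if (classify_1nn(fold[1 - f], s.features) == s.label) ++correct;
    return static_cast<double>(correct) / static_cast<double>(data.size());
}
```

Applied to the six samples of the HDevelop example on this page, restricted to 'Good Feature' (the first three elements), this sketch yields a rate of 1.0; restricted to 'Bad Feature' (the last two elements), it yields 2/6. A greedy search driven by this criterion would therefore pick 'Good Feature' first.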

The k-NN classifier can be parameterized using the following values in GenParamNames and GenParamValues:

'num_neighbors':

The minimum number of nodes that are evaluated; increase this value for high-dimensional data.

Possible values: '1', '2', '5', '10'

Default value: '1'

'num_trees':

Number of search trees in the k-NN classifier.

Possible values: '1', '4', '10'

Default value: '4'
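The generic parameters are passed as two parallel tuples: the i-th name in GenParamNames is paired with the i-th value in GenParamValues, and any parameter that is not given keeps its default. The following C++ sketch illustrates that pairing with the documented defaults. The `KnnParams` struct, `parse_gen_params`, the restriction to integer values, and the error handling via `std::invalid_argument` are assumptions for illustration, not HALCON's implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the two documented generic parameters.
struct KnnParams {
    long num_neighbors = 1;  // documented default
    long num_trees = 4;      // documented default
};

// Pair up parallel name/value lists in the style of GenParamNames/GenParamValues.
// Parameters that are not listed keep their default values.
KnnParams parse_gen_params(const std::vector<std::string>& names,
                           const std::vector<long>& values) {
    if (names.size() != values.size())
        throw std::invalid_argument("GenParamNames and GenParamValues differ in length");
    KnnParams p;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (names[i] == "num_neighbors")
            p.num_neighbors = values[i];
        else if (names[i] == "num_trees")
            p.num_trees = values[i];
        else
            throw std::invalid_argument("unknown generic parameter: " + names[i]);
    }
    return p;
}
```

Passing empty tuples, as in the HDevelop example on this page, leaves both parameters at their defaults of 1 and 4.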

Attention

This operator may take considerable time, depending on the size of the training data set and the number of features.

Please note that this operator should not be used if only a small set of training data is available. Due to the risk of overfitting, select_feature_set_knn may deliver a classifier with a very high score that nevertheless performs poorly when tested on new data.

Parallelization

This operator returns a handle. Note that the state of an instance of this handle type may be changed by specific operators even though the handle is used as an input parameter by those operators.

Parameters

ClassTrainDataHandle (input_control)  class_train_data → (integer)

Handle of the training data.

SelectionMethod (input_control)  string → (string)

Method to perform the selection.

Default value: 'greedy'

List of values: 'greedy', 'greedy_oscillating'

GenParamNames (input_control)  string(-array) → (string)

Names of generic parameters to configure the selection process and the classifier.

Default value: []

List of values: 'num_neighbors', 'num_trees'

GenParamValues (input_control)  number(-array) → (real / integer / string)

Values of generic parameters to configure the selection process and the classifier.

Default value: []

Suggested values: 1, 2, 3

KNNHandle (output_control)  class_knn → (integer)

A trained k-NN classifier using only the selected features.

SelectedFeatureIndices (output_control)  string-array → (string)

The selected feature set; contains indices or names.

Score (output_control)  real-array → (real)

The achieved score using two-fold cross-validation.

Example (HDevelop)

* Find out which of the two features distinguishes the two classes
NameFeature1 := 'Good Feature'
NameFeature2 := 'Bad Feature'
LengthFeature1 := 3
LengthFeature2 := 2
* Create training data
create_class_train_data (LengthFeature1+LengthFeature2,\
  ClassTrainDataHandle)
* Define the features which are in the training data
set_feature_lengths_class_train_data (ClassTrainDataHandle, [LengthFeature1,\
  LengthFeature2], [NameFeature1, NameFeature2])
* Add training data
*                                                         |Feat1| |Feat2|
add_sample_class_train_data (ClassTrainDataHandle, 'row', [1,1,1,  2,1  ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,2,2,  2,1  ], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [1,1,1,  3,4  ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,2,2,  3,4  ], 1)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [0,0,1,  5,6  ], 0)
add_sample_class_train_data (ClassTrainDataHandle, 'row', [2,3,2,  5,6  ], 1)
* Add more data 
* ...
* Select the better feature with the k-NN classifier 
select_feature_set_knn (ClassTrainDataHandle, 'greedy', [], [], KNNHandle,\
  SelectedFeatureKNN, Score)
clear_class_train_data (ClassTrainDataHandle)
* Use the classifier
* ...
clear_class_knn (KNNHandle)

Result

If the parameters are valid, the operator select_feature_set_knn returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.

Possible Predecessors

create_class_train_data, add_sample_class_train_data, set_feature_lengths_class_train_data

Possible Successors

classify_class_knn

Alternatives

select_feature_set_mlp, select_feature_set_svm, select_feature_set_gmm

See also

select_feature_set_trainf_knn, gray_features, region_features

Module

Foundation

