select_feature_set_trainf_mlp — Selects an optimal combination of features to classify OCR data.
select_feature_set_trainf_mlp selects an optimal combination of features for classifying the OCR data given in the training file TrainingFile with a multilayer perceptron (MLP). For details on the classifier, see create_ocr_class_mlp.
Possible features are all OCR features listed and explained in create_ocr_class_mlp. The candidates to be tested are specified in FeatureList. The selected subset of these features is returned in FeatureSet.
select_feature_set_trainf_mlp is specialized for OCR problems and supports only the OCR features mentioned above. To use other features, use the more general operator select_feature_set_mlp.
The selection method SelectionMethod is either the greedy search 'greedy' (iteratively add the feature with the highest gain) or the dynamically oscillating search 'greedy_oscillating' (add the feature with the highest gain, then test whether any of the already added features can be left out without great loss). The method 'greedy' is generally preferable because it is faster. Only when a large training set is available might the method 'greedy_oscillating' return better results.
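The two search strategies described above can be sketched in Python as follows. This is a minimal illustration, not the operator's actual implementation: the scoring callback `score`, the tolerance `max_loss`, the stopping rule (stop when no addition improves the score), and the choice to never re-add a dropped feature are all assumptions made for the sketch.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical scoring callback: maps a feature subset to a score, for example
# a cross-validated classification rate. It is not part of the operator's API.
ScoreFn = Callable[[FrozenSet[str]], float]


def _best_addition(selected: List[str], remaining: List[str],
                   score: ScoreFn) -> Tuple[float, str]:
    """Return (score, feature) for the candidate whose addition scores highest."""
    trials = [(score(frozenset(selected + [f])), f) for f in remaining]
    return max(trials, key=lambda t: t[0])


def greedy_select(candidates: Sequence[str],
                  score: ScoreFn) -> Tuple[List[str], float]:
    """Forward selection: keep adding the feature with the highest gain."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        top_score, top_feature = _best_addition(selected, remaining, score)
        if top_score <= best:  # assumed stopping rule: no further improvement
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


def greedy_oscillating_select(candidates: Sequence[str], score: ScoreFn,
                              max_loss: float = 0.01) -> Tuple[List[str], float]:
    """Forward selection with a backward check after every addition."""
    selected: List[str] = []
    remaining = list(candidates)
    best = float("-inf")
    while remaining:
        top_score, top_feature = _best_addition(selected, remaining, score)
        if top_score <= best:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
        # Backward step: drop an earlier feature if the loss stays within
        # max_loss. The feature just added is never tested for removal, and
        # dropped features are not re-added (a simplification that guarantees
        # termination).
        for feature in selected[:-1]:
            reduced = [g for g in selected if g != feature]
            reduced_score = score(frozenset(reduced))
            if best - reduced_score <= max_loss:
                selected, best = reduced, reduced_score
    return selected, best


# Toy score table (invented numbers) in which 'a' becomes redundant once
# 'b' and 'c' are both selected.
TOY_SCORES = {
    frozenset({"a"}): 0.6,
    frozenset({"b"}): 0.5,
    frozenset({"c"}): 0.5,
    frozenset({"a", "b"}): 0.7,
    frozenset({"a", "c"}): 0.65,
    frozenset({"b", "c"}): 0.9,
    frozenset({"a", "b", "c"}): 0.9,
}


def toy_score(features: FrozenSet[str]) -> float:
    return TOY_SCORES[features]
```

With the toy table, `greedy_select(["a", "b", "c"], toy_score)` keeps all three features, while `greedy_oscillating_select` reaches the same score with only 'b' and 'c'. The oscillating variant evaluates additional subsets per step, which is consistent with 'greedy' being the faster method.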
The optimization criterion is the classification rate of a two-fold cross-validation of the training data. The best achieved value is returned in Score.
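As an illustration of this criterion, the following Python sketch computes the classification rate of a two-fold cross-validation. It rests on assumptions: the interleaved split into folds is one possible choice (how the operator partitions the training data is not stated here), and a nearest-centroid classifier stands in for the MLP to keep the example self-contained. The names `two_fold_cv_rate` and `train_nearest_centroid` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]          # (feature vector, class label)
Predictor = Callable[[Sequence[float]], str]
TrainFn = Callable[[List[Sample]], Predictor]


def two_fold_cv_rate(samples: List[Sample], train: TrainFn) -> float:
    """Train on one half, test on the other, swap, and pool the results."""
    if len(samples) < 2:
        raise ValueError("two-fold cross-validation needs at least two samples")
    fold_a = samples[0::2]  # assumed split: even-indexed samples
    fold_b = samples[1::2]  # assumed split: odd-indexed samples
    correct = 0
    for train_part, test_part in ((fold_a, fold_b), (fold_b, fold_a)):
        predict = train(train_part)
        correct += sum(1 for x, label in test_part if predict(x) == label)
    # Every sample is tested exactly once, so divide by the total count.
    return correct / len(samples)


def train_nearest_centroid(train_samples: List[Sample]) -> Predictor:
    """Stand-in classifier (not an MLP): assign the class of the closest mean."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in train_samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: [v / counts[label] for v in acc]
                 for label, acc in sums.items()}

    def predict(x: Sequence[float]) -> str:
        return min(
            centroids,
            key=lambda label: sum((a - b) ** 2
                                  for a, b in zip(x, centroids[label])),
        )

    return predict
```

Because each fold trains on only half of the data, the rate is estimated from classifiers that see fewer samples than the full training file contains, which is one reason small training sets give unreliable scores.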
The parameters GenParamNames and GenParamValues allow setting the number of hidden neurons in the MLP via 'num_hidden'. The default value is 80. A higher value leads to longer training times but might yield a more expressive classifier.
This operator may take considerable time, depending on the size of the data set in the training file, and the number of features.
Please note that this operator should not be called if only a small set of training data is available. Due to the risk of overfitting, select_feature_set_trainf_mlp may then deliver a classifier with a very high score that nevertheless performs poorly when tested.
Names of the training files.
Default value: ''
File extension: .trf, .otr
List of features that should be considered for selection.
Default value: ['zoom_factor','ratio','width','height','foreground','foreground_grid_9','foreground_grid_16','anisometry','compactness','convexity','moments_region_2nd_invar','moments_region_2nd_rel_invar','moments_region_3rd_invar','moments_central','phi','num_connect','num_holes','projection_horizontal','projection_vertical','projection_horizontal_invar','projection_vertical_invar','chord_histo','num_runs','pixel','pixel_invar','pixel_binary','gradient_8dir','cooc','moments_gray_plane']
List of values: 'anisometry', 'chord_histo', 'compactness', 'convexity', 'cooc', 'default', 'foreground', 'foreground_grid_16', 'foreground_grid_9', 'gradient_8dir', 'height', 'moments_central', 'moments_gray_plane', 'moments_region_2nd_invar', 'moments_region_2nd_rel_invar', 'moments_region_3rd_invar', 'num_connect', 'num_holes', 'num_runs', 'phi', 'pixel', 'pixel_binary', 'pixel_invar', 'projection_horizontal', 'projection_horizontal_invar', 'projection_vertical', 'projection_vertical_invar', 'ratio', 'width', 'zoom_factor'
Method to perform the selection.
Default value: 'greedy'
List of values: 'greedy', 'greedy_oscillating'
Width of the rectangle to which the gray values of the segmented character are zoomed.
Default value: 15
Height of the rectangle to which the gray values of the segmented character are zoomed.
Default value: 16
Names of generic parameters to configure the selection process and the classifier.
Default value: 
List of values: 'num_hidden'
Values of generic parameters to configure the selection process and the classifier.
Default value: 
Suggested values: 80
Trained OCR-MLP classifier.
Selected feature set; contains only entries from FeatureList.
Achieved score using two-fold cross-validation.
If the parameters are valid, the operator select_feature_set_trainf_mlp returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.