get_prep_info_ocr_class_mlp (Operator)

Name

get_prep_info_ocr_class_mlp - Compute the information content of the preprocessed feature vectors of an OCR classifier.

Signature

get_prep_info_ocr_class_mlp( : : OCRHandle, TrainingFile, Preprocessing : InformationCont, CumInformationCont)

Description

get_prep_info_ocr_class_mlp computes the information content of the training vectors that have been transformed with the preprocessing given by Preprocessing. Preprocessing can be set to 'principal_components' or 'canonical_variates'. The OCR classifier OCRHandle must have been created with create_ocr_class_mlp. The preprocessing methods are described with create_class_mlp. The information content is derived from the variations of the transformed components of the feature vector, i.e., it is computed solely based on the training data, independent of any error rate on the training data.
The information content is computed for all relevant components of the transformed feature vectors (NumInput for 'principal_components' and min(NumOutput - 1, NumInput) for 'canonical_variates', see create_class_mlp), and is returned in InformationCont as a number between 0 and 1. To convert the information content into a percentage, it simply needs to be multiplied by 100. The cumulative information content of the first n components is returned in the n-th component of CumInformationCont, i.e., CumInformationCont contains the sums of the first n elements of InformationCont. To use get_prep_info_ocr_class_mlp, a sufficient number of samples must be stored in the training files given by TrainingFile (see write_ocr_trainf).
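
The relationship between the two outputs can be sketched in plain Python without calling HALCON. This is only an illustration of the formulas stated above: the helper names max_components and cumulative_information are hypothetical, and the sample values are invented rather than real operator output.

```python
from itertools import accumulate


def max_components(preprocessing, num_input, num_output):
    """Number of transformed components for which information content is reported."""
    if preprocessing == 'principal_components':
        return num_input
    if preprocessing == 'canonical_variates':
        return min(num_output - 1, num_input)
    raise ValueError(f'unsupported preprocessing: {preprocessing!r}')


def cumulative_information(information_cont):
    """Running sums of information_cont; entry n-1 covers the first n components."""
    return list(accumulate(information_cont))


# Invented example values (assumption, not real operator output).
information_cont = [0.5, 0.25, 0.125, 0.125]
cum_information_cont = cumulative_information(information_cont)

# Multiply by 100 to express each information content as a percentage.
percentages = [100.0 * value for value in information_cont]
```

With these invented values, the last cumulative entry is 1.0, i.e., all reported components together account for the full information content.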

InformationCont and CumInformationCont can be used to decide how many components of the transformed feature vectors contain relevant information. A commonly used criterion is to require that the transformed data must represent x% (e.g., 90%) of the total data. This can be decided easily from the first value of CumInformationCont that lies above x%. The number thus obtained can be used as the value for NumComponents in a new call to create_ocr_class_mlp. The call to get_prep_info_ocr_class_mlp already requires the creation of a classifier, and hence the setting of NumComponents in create_ocr_class_mlp to an initial value. However, when get_prep_info_ocr_class_mlp is called, it is typically not yet known how many components are relevant, and hence how to set NumComponents in this call.
Therefore, the following two-step approach should typically be used to select NumComponents: In a first step, a classifier with the maximum number for NumComponents is created (NumInput for 'principal_components' and min(NumOutput - 1, NumInput) for 'canonical_variates'). Then, the training samples are saved in a training file using write_ocr_trainf. Subsequently, get_prep_info_ocr_class_mlp is used to determine the information content of the components, and from this the value of NumComponents. After this, a new classifier with the desired number of components is created, and the classifier is trained with trainf_ocr_class_mlp.
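
The selection criterion can be sketched as a small standalone Python function. The name select_num_components and its fraction parameter are hypothetical and not part of HALCON. The sketch assumes that the cumulative values have already been obtained from get_prep_info_ocr_class_mlp and are passed in as a plain list of numbers between 0 and 1. It treats a value equal to the threshold as sufficient, which is a choice made here, not something the operator prescribes.

```python
def select_num_components(cum_information_cont, fraction=0.9):
    """Return the smallest n whose cumulative information content reaches fraction.

    cum_information_cont: cumulative values in [0, 1], one per component.
    fraction: required share of the total information, e.g. 0.9 for 90%.
    """
    if not cum_information_cont:
        raise ValueError('cum_information_cont must not be empty')
    for n, cumulative in enumerate(cum_information_cont, start=1):
        if cumulative >= fraction:
            return n
    # Threshold never reached (e.g. due to rounding): keep all components.
    return len(cum_information_cont)


# Invented cumulative values (assumption, not real operator output).
num_components = select_num_components([0.5, 0.75, 0.875, 1.0], fraction=0.9)
```

The resulting number would then be used as NumComponents when the final classifier is created in the second step.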

Execution Information

Multithreading type: reentrant (runs in parallel with non-exclusive operators).
Multithreading scope: global (may be called from any thread).
Processed without parallelization.

Parameters

OCRHandle (input_control) ocr_mlp → (handle)

Handle of the OCR classifier.

TrainingFile (input_control) filename.read(-array) → (string)

Names of the training files.

Default value: 'ocr.trf'

File extension: .trf, .otr

Preprocessing (input_control) string → (string)

Type of preprocessing used to transform the feature vectors.

Default value: 'principal_components'

List of values: 'canonical_variates', 'principal_components'

InformationCont (output_control) real-array → (real)

Relative information content of the transformed feature vectors.

CumInformationCont (output_control) real-array → (real)

Cumulative information content of the transformed feature vectors.

Example (HDevelop)

* Create the initial OCR classifier.
read_ocr_trainf_names ('ocr.trf', CharacterNames, CharacterCount)
create_ocr_class_mlp (8, 10, 'constant', 'default', CharacterNames, 80, \
                      'canonical_variates', |CharacterNames|, 42, OCRHandle)
* Get the information content of the transformed feature vectors.
get_prep_info_ocr_class_mlp (OCRHandle, 'ocr.trf', 'canonical_variates', \
                             InformationCont, CumInformationCont)
* Determine the number of transformed components.
* NumComp = [...]
* Create the final OCR classifier.
create_ocr_class_mlp (8, 10, 'constant', 'default', CharacterNames, 80, \
                      'canonical_variates', NumComp, 42, OCRHandle)
* Train the final classifier.
trainf_ocr_class_mlp (OCRHandle, 'ocr.trf', 100, 1, 0.01, Error, ErrorLog)
write_ocr_class_mlp (OCRHandle, 'ocr.omc')

Result

If the parameters are valid, the operator get_prep_info_ocr_class_mlp returns the value TRUE. If necessary, an exception is raised.

get_prep_info_ocr_class_mlp may return the error 9211 (Matrix is not positive definite) if Preprocessing = 'canonical_variates' is used. This typically indicates that not enough training samples have been stored for each class.

Possible Predecessors

create_ocr_class_mlp, write_ocr_trainf, append_ocr_trainf, write_ocr_trainf_image

Possible Successors

clear_ocr_class_mlp, create_ocr_class_mlp

Module

OCR/OCV
