set_text_model_param
— Set parameters of a text model.
set_text_model_param( : : TextModel, GenParamName, GenParamValue : )
set_text_model_param sets parameters of a text model. The list of allowed parameter values for GenParamName differs depending on which Mode was set when creating the text model with create_text_model_reader.
In the following, first the parameter values for text models with Mode = 'auto' are listed, and then those for text models with Mode = 'manual'.
The name and value of a parameter must be given in GenParamName and GenParamValue. The following values are possible:
Parameters of text models with Mode = 'auto'
Segmentation behavior
'min_contrast': The minimal contrast the characters have to their surrounding background.
List of values: integer or float value between 1 and 255 for byte images and between 1 and 65535 for uint2 images
Default value: 15
'polarity': 'dark_on_light' if the text to be segmented is darker than its background, 'light_on_dark' if the text to be segmented is lighter than its background, and 'both' if both kinds of text are to be segmented.
List of values: 'dark_on_light', 'light_on_dark', 'both'
Default value: 'both'
'eliminate_border_blobs': 'true' if regions that touch the border of the image domain should be discarded, otherwise 'false'.
List of values: 'true', 'false'
Default value: 'false'
'add_fragments': 'true' if fragments, such as the dot on the 'i', should be added to the segmented characters, otherwise 'false'. Be aware that this can cause noise to be added to the segmented characters.
List of values: 'true', 'false'
Default value: 'true'
'separate_touching_chars': Controls the handling of pairs or small groups of neighboring characters that are segmented as one single region. When selecting 'standard' or 'enhanced', such regions are detected and separated into two or more single characters. While the 'enhanced' method yields more accurate results, the 'standard' method is less complex and thus faster. If 'separate_touching_chars' is set to 'false', no separation of touching characters is performed.
Remark: If 'enhanced' is selected, the file find_text_support.hotc from the ocr subdirectory of the root directory of the HALCON installation is needed. It is also possible to place this file in the current working directory.
List of values: 'false', 'standard', 'enhanced'
Default value: 'standard'
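As a rough sketch of how the segmentation parameters above might be combined, the following HDevelop fragment configures an 'auto' text model for dark print on a light background. The classifier name is taken from the example in this reference; the chosen values are illustrative assumptions, not recommendations.

```hdevelop
create_text_model_reader ('auto', 'Document_Rej.omc', TextModel)
* Illustrative values only; adjust them to the actual image data
set_text_model_param (TextModel, 'min_contrast', 20)
set_text_model_param (TextModel, 'polarity', 'dark_on_light')
set_text_model_param (TextModel, 'eliminate_border_blobs', 'true')
* 'standard' is the default; 'false' switches the separation step off
set_text_model_param (TextModel, 'separate_touching_chars', 'false')
```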
Character size
'min_char_height': The minimal height of the characters in pixels. If text of arbitrary height is to be segmented, 'auto' may be passed. Note that 'min_char_height' refers to characters only. The height of punctuation marks or separators is not restricted by 'min_char_height'.
List of values: integer or float value greater than or equal to 1
Default value: 'auto'
'max_char_height': The maximal height of the characters in pixels. If text of arbitrary height is to be segmented, 'auto' may be passed. Note that 'max_char_height' refers to characters only. The height of punctuation marks or separators is not restricted by 'max_char_height'.
List of values: integer or float value greater than or equal to 1
Default value: 'auto'
'min_char_width': The minimal width of the characters in pixels. If text of arbitrary width is to be segmented, 'auto' may be passed. Note that 'min_char_width' refers to characters only. The width of punctuation marks or separators is not restricted by 'min_char_width'.
List of values: integer or float value greater than or equal to 1
Default value: 'auto'
'max_char_width': The maximal width of the characters in pixels. If text of arbitrary width is to be segmented, 'auto' may be passed. Note that 'max_char_width' refers to characters only. The width of punctuation marks or separators is not restricted by 'max_char_width'.
List of values: integer or float value greater than or equal to 1
Default value: 'auto'
'min_stroke_width': The minimal stroke width of the characters in pixels. If the minimal stroke width is to be estimated automatically during text segmentation, 'auto' may be passed. Note that 'min_stroke_width' refers to characters only. The stroke width of punctuation marks or separators is not restricted by 'min_stroke_width'.
List of values: integer or float value greater than or equal to 1
Default value: 'auto'
'max_stroke_width': The maximal stroke width of the characters in pixels. If the maximal stroke width is to be estimated automatically during text segmentation, 'auto' may be passed. Note that 'max_stroke_width' refers to characters only. The stroke width of punctuation marks or separators is not restricted by 'max_stroke_width'.
List of values: integer or float value greater than or equal to 1
Default value: 'auto'
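A minimal sketch of restricting the character size, assuming a text model handle TextModel created in 'auto' mode. The pixel values are hypothetical and would normally be measured in a sample image.

```hdevelop
* Hypothetical size limits in pixels
set_text_model_param (TextModel, 'min_char_height', 20)
set_text_model_param (TextModel, 'max_char_height', 60)
set_text_model_param (TextModel, 'min_char_width', 10)
* Leave the stroke width to be estimated during segmentation
set_text_model_param (TextModel, 'min_stroke_width', 'auto')
set_text_model_param (TextModel, 'max_stroke_width', 'auto')
```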
Special characters
'return_punctuation': 'true' if small punctuation marks that lie close to the base line of the corresponding text line (e.g., dots or commas) are to be returned. 'false' if no such punctuation marks should be returned.
List of values: 'true', 'false'
Default value: 'true'
'return_separators': 'true' if separators such as a minus or the equality sign should be returned as well. 'false' if no separators should be returned.
List of values: 'true', 'false'
Default value: 'true'
Handling of dot prints
'dot_print': 'true' if the text to be segmented contains dot printed characters, otherwise 'false'.
List of values: 'true', 'false'
Default value: 'false'
'dot_print_tight_char_spacing': 'true' if the gap between adjacent characters is smaller than the largest gap between two dots within a single character, otherwise 'false'. If 'dot_print' is set to 'false', this parameter does not have any effect. In cases where the minimal gap size between characters is exactly known, 'dot_print_min_char_gap' can be set instead. In this case the value of 'dot_print_tight_char_spacing' is ignored.
List of values: 'true', 'false'
Default value: 'false'
'dot_print_min_char_gap': The minimal gap size between two characters in pixels. This parameter can be used to improve the text result in cases where the minimal gap size between characters is smaller than the maximal gap size between dots within characters. If the minimal character gap size is not known or is bigger than the maximal dot gap size, 'auto' may be passed. If 'dot_print' is set to 'false', this parameter does not have any effect. In cases where the minimal gap size between characters is not known but the characters are printed close to each other, 'dot_print_tight_char_spacing' might be used instead.
List of values: integer or float value greater than or equal to 0
Default value: 'auto'
'dot_print_max_dot_gap': The maximal gap size between two dots within a character in pixels. If arbitrary dot printed characters are to be segmented, 'auto' may be passed. If 'dot_print' is set to 'false', this parameter does not have any effect. In cases where the maximal dot gap size is larger than or equal to the minimal gap size between characters, 'dot_print_tight_char_spacing' or 'dot_print_min_char_gap' should be set accordingly. Setting 'dot_print_max_dot_gap' can reduce the runtime of find_text significantly.
List of values: integer or float value greater than or equal to 1
Default value: 'auto'
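The dot print parameters only take effect once 'dot_print' is enabled. A minimal sketch, assuming a text model handle TextModel created in 'auto' mode and a hypothetical maximal dot gap of 3 pixels:

```hdevelop
* Enable dot print handling first; the other dot print parameters
* have no effect while 'dot_print' is 'false'
set_text_model_param (TextModel, 'dot_print', 'true')
* Hypothetical value; a known dot gap can reduce the runtime of find_text
set_text_model_param (TextModel, 'dot_print_max_dot_gap', 3)
* The character gap is not known exactly, but the characters are printed
* close together, so the flag is used instead of 'dot_print_min_char_gap'
set_text_model_param (TextModel, 'dot_print_tight_char_spacing', 'true')
```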
Line structures
'text_line_structure': To simplify the search for specific structures (e.g., dates or serial numbers) within the segmented text, it is possible to define text line structures. For each text line the distances between the characters are calculated, and based on these distances, the text line is divided into text blocks. Short characters such as '.', '_' and '-' are ignored in this process and treated as spaces. Furthermore, it is possible to define user-specific separators, which are also ignored. See the description of 'text_line_separators' for details. It is then tested whether any of the user-defined text line structures fit the resulting text blocks.
For example, if the text to be found is a date with two characters each for month, day, and year, the structure would be '2 2 2'. If the year may consist of two or four characters, the structure would be '2 2 2-4', indicating that the last character block consists of two to four characters. It is possible to provide more than one structure to match by appending an index to the parameter name, e.g., 'text_line_structure_0', 'text_line_structure_1'. If 'text_line_structure' is set to an empty string '', the text to be found may have any structure.
Note that every text line structure that is found is saved as a unique text line within the text result. Hence, when calling get_text_object, a 'line' then refers to a valid text line structure. If the whole text line containing the text line structure is to be returned instead, it is possible to set 'return_whole_line' accordingly.
Default value: ''
'text_line_separators': A string containing the list of characters which are to be ignored in the process of finding text line structures, see 'text_line_structure' for further details. Note that user-specific separators need to be valid characters within the OCR classifier that is used. For example, if ':' and '\' are to be ignored, ':\\' should be passed. Note that '\' escapes any special symbol to treat it as a literal, and hence '\\' needs to be passed to use '\' as a separator.
List of values: '/', ':', ':\\', '\\/:', ...
Default value: ''
'return_whole_line': 'false' if only the segmented text line structures are to be returned as text lines. 'true' if each whole text line containing a text line structure is to be returned in text lines.
List of values: 'true', 'false'
Default value: 'false'
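The date example above could be set up roughly as follows. This is a sketch that assumes a text model handle TextModel created in 'auto' mode and a classifier that contains '/' as a valid character. The second structure is an illustrative alternative for dates with a four-digit year.

```hdevelop
* Two alternative structures, selected by appending an index
set_text_model_param (TextModel, 'text_line_structure_0', '2 2 2')
set_text_model_param (TextModel, 'text_line_structure_1', '2 2 4')
* Treat '/' as a separator between character blocks
* (assumes '/' is a class of the OCR classifier in use)
set_text_model_param (TextModel, 'text_line_separators', '/')
* Return the complete text line, not only the matched structure
set_text_model_param (TextModel, 'return_whole_line', 'true')
```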
OCR classifier
'ocr_classifier': The OCR classifier used within find_text for text segmentation and classification. An initial classifier is set when the text model is created. See create_text_model_reader for more information about the required OCR classifier.
'num_classes': The number of best classes to be stored for each character (e.g., if 'num_classes' is set to 2, find_text returns the classification results with the highest and second highest confidence). If 'num_classes' exceeds the number of classes of the classifier stored in the text model, 'num_classes' is decreased accordingly. The actual number of classes can be queried by get_text_result. For classifiers with rejection class, 'num_classes' should be at least 2 in order to be able to use the second best result if a character is classified as rejection class.
List of values: integer greater than or equal to 1
Default value: 2
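A short sketch of storing more than the default two candidates per character. The image and classifier names are taken from the example in this reference; the value 3 is an arbitrary illustration.

```hdevelop
read_image (Image, 'numbers_scale')
create_text_model_reader ('auto', 'Document_Rej.omc', TextModel)
* Keep the three best classes per character; the value is reduced
* automatically if the classifier has fewer classes
set_text_model_param (TextModel, 'num_classes', 3)
find_text (Image, TextModel, TextResultID)
get_text_result (TextResultID, 'class', Class)
```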
Parameters of text models with Mode = 'manual'
'manual_char_height': Height of the characters in pixels. Refers to an uppercase character. Default value: 30px
'manual_char_width': Width of the characters in pixels. Refers to an uppercase character. Default value: 20px
'manual_stroke_width': Stroke width of the characters in pixels. Default value: 4.0px
'manual_base_line_tolerance': Maximum base line deviation of the characters (in percent of 'manual_char_height'). Default value: 0.15
'manual_polarity': 'dark_on_light' if the text to be segmented is darker than its background, otherwise 'light_on_dark'. Default value: 'dark_on_light'
'manual_uppercase_only': 'true' if the text to be segmented contains uppercase characters or numbers only, otherwise 'false'. Default value: 'false'
'manual_is_dotprint': 'true' if the text to be segmented is a dot print, otherwise 'false'. Default value: 'false'
'manual_is_imprinted': 'true' if the text to be segmented suffers from local changes of polarity due to reflections, otherwise 'false'. Default value: 'false'
'manual_eliminate_horizontal_lines': 'true' if there are longer horizontal structures close to the text to be segmented, otherwise 'false'. Default value: 'false'
'manual_eliminate_border_blobs': 'true' if regions that touch the border of the image domain should be discarded, otherwise 'false'. Default value: 'false'
'manual_max_line_num': Maximum number of lines to be found. Zero or negative values indicate no limitation. Setting 'manual_max_line_num' to a low value may strongly improve the runtime of find_text. Default value: no limitation
'manual_return_punctuation': 'true' if punctuation marks (e.g., dots or commas) should be added to the segmented characters. Default value: 'true'
'manual_return_separators': 'true' if separators such as a minus or the equality sign should be added to the segmented characters. Default value: 'true'
'manual_add_fragments': 'true' if fragments, such as the dot on the 'i', should be added to the segmented characters. Be aware that this can cause noise to be added to the segmented characters. Default value: 'true'
'manual_fragment_size_min': Minimum area of fragment regions that are added if 'manual_add_fragments' is set to 'true'. Default value: 1
'manual_text_line_structure': Specifies the structure of the text to be found to reduce the search space and to avoid false positives. The structure is a string that contains the number of characters for every character block and spaces between these character blocks. For example, if the text to be found is a date with two characters each for month, day, and year, the structure would be '2 2 2'. If the year may also consist of four characters, the structure would be '2 2 2-4', indicating that the last character block consists of two to four characters. It is possible to provide more than one structure to match by appending an index to the parameter name, e.g., 'manual_text_line_structure_0', 'manual_text_line_structure_1'. If 'manual_text_line_structure' is set to an empty string '', the text to be found may have any structure. Default value: ''
'manual_persistence': 'true' if selected intermediate results should be kept with the output result of find_text.
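A minimal sketch for a text model in 'manual' mode. It assumes that create_text_model_reader accepts an empty tuple as classifier in this mode, and all values are hypothetical.

```hdevelop
* Assumption: no OCR classifier is needed for Mode = 'manual'
create_text_model_reader ('manual', [], TextModel)
* Hypothetical character geometry in pixels
set_text_model_param (TextModel, 'manual_char_height', 40)
set_text_model_param (TextModel, 'manual_char_width', 25)
set_text_model_param (TextModel, 'manual_stroke_width', 5)
set_text_model_param (TextModel, 'manual_uppercase_only', 'true')
* Limiting the number of lines may improve the runtime of find_text
set_text_model_param (TextModel, 'manual_max_line_num', 1)
set_text_model_param (TextModel, 'manual_text_line_structure', '2 2 2-4')
```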
This operator modifies the state of the following input parameter: TextModel.
During execution of this operator, access to the value of this parameter must be synchronized if it is used across multiple threads.
TextModel (input_control, state is modified) text_model → (handle)
Text model.
GenParamName (input_control) string(-array) → (string)
Names of the parameters to be set.
Default: 'min_contrast'
Suggested values: 'add_fragments' , 'dot_print' , 'dot_print_max_dot_gap' , 'dot_print_min_char_gap' , 'dot_print_tight_char_spacing' , 'eliminate_border_blobs' , 'max_char_height' , 'max_char_width' , 'max_stroke_width' , 'min_char_height' , 'min_char_width' , 'min_contrast' , 'min_stroke_width' , 'num_classes' , 'ocr_classifier' , 'polarity' , 'return_punctuation' , 'return_separators' , 'return_whole_line' , 'separate_touching_chars' , 'text_line_separators' , 'text_line_structure' , 'text_line_structure_0' , 'text_line_structure_1' , 'text_line_structure_2' , 'manual_add_fragments' , 'manual_base_line_tolerance' , 'manual_char_height' , 'manual_char_width' , 'manual_eliminate_border_blobs' , 'manual_eliminate_horizontal_lines' , 'manual_fragment_size_min' , 'manual_is_dotprint' , 'manual_is_imprinted' , 'manual_max_line_num' , 'manual_persistence' , 'manual_polarity' , 'manual_return_punctuation' , 'manual_return_separators' , 'manual_stroke_width' , 'manual_text_line_structure' , 'manual_text_line_structure_0' , 'manual_text_line_structure_1' , 'manual_text_line_structure_2' , 'manual_uppercase_only'
GenParamValue (input_control) string(-array) → (integer / real / string)
Values of the parameters to be set.
Default: 10
Suggested values: 'true' , 'false' , 'dark_on_light' , 'light_on_dark' , 'both' , 'auto' , 'standard' , 'enhanced'
read_image (Image, 'numbers_scale')
create_text_model_reader ('auto', 'Document_Rej.omc', TextModel)
* Optionally specify text properties
set_text_model_param (TextModel, 'min_char_height', 20)
find_text (Image, TextModel, TextResultID)
* Return character regions and corresponding classification results
get_text_object (Characters, TextResultID, 'all_lines')
get_text_result (TextResultID, 'class', Class)
If the input parameters are set correctly, the operator set_text_model_param returns the value 2 (H_MSG_TRUE). Otherwise, an exception will be raised.
OCR/OCV