String Operations

This chapter contains operators for string operations.

General Information

The HALCON library encodes strings in UTF-8 by default.

UTF-8 is a Unicode character encoding that can represent every Unicode code point using one to four bytes. Unicode is the character set that assigns each character a code point (for example, U+0041 for 'A'), and UTF-8 translates these code points into binary data. ASCII characters require 1 byte each. Characters such as German umlauts and Greek or Cyrillic letters require 2 bytes. Most common Asian (CJK) characters require 3 bytes, and characters outside the Basic Multilingual Plane, such as rarer CJK ideographs and emoji, require 4 bytes.
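The byte counts above can be checked with a few lines of Python. This sketch uses only Python's built-in `str.encode`, not HALCON. The helper name `utf8_byte_length` and the sample characters are chosen here purely for illustration.

```python
# Count how many bytes UTF-8 needs for characters from the categories above.
# Uses Python's built-in str.encode; this is not a HALCON operator.

def utf8_byte_length(ch: str) -> int:
    """Return the number of bytes needed to encode ch in UTF-8."""
    return len(ch.encode("utf-8"))

samples = {
    "A": "ASCII letter",
    "ä": "German umlaut",
    "Ω": "Greek capital omega",
    "Ж": "Cyrillic capital zhe",
    "日": "CJK ideograph",
    "😀": "emoji outside the BMP",
}

if __name__ == "__main__":
    for ch, label in samples.items():
        # Prints the code point, a description, and the UTF-8 byte count.
        print(f"U+{ord(ch):04X} {label}: {utf8_byte_length(ch)} byte(s)")
```

Running it reports 1 byte for 'A', 2 bytes for 'ä', 'Ω', and 'Ж', 3 bytes for '日', and 4 bytes for '😀'.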

By default, the HALCON string operators work on Unicode code points. That is, accessing a character of a string always returns the Unicode code point of that character, regardless of how many bytes UTF-8 needs to represent it. Multi-byte characters, such as Asian characters or German umlauts, are therefore processed uniformly on all systems. Note that the Unicode standard also allows a printable character to be composed of multiple code points (using so-called 'Combining Diacritical Marks'). HALCON does not yet fully support this: it processes such code points separately, and when comparing strings, it does not treat equivalent characters as equal if they are encoded with different code points.
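The combining-mark limitation can be illustrated outside HALCON. In the Python sketch below, 'ä' is written once as a single precomposed code point and once as 'a' followed by a combining diaeresis. A code-point-wise comparison, which is what the paragraph describes for HALCON, treats the two as different. The NFC normalization step uses Python's standard `unicodedata` module and is shown only to make the equivalence visible. The section does not state that HALCON offers such normalization.

```python
import unicodedata

# Two encodings of the same printable character 'ä'.
precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS (one code point)
decomposed = "a\u0308"   # 'a' + COMBINING DIAERESIS (two code points)

def code_points(s: str) -> list[str]:
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

if __name__ == "__main__":
    print(code_points(precomposed))  # ['U+00E4']
    print(code_points(decomposed))   # ['U+0061', 'U+0308']
    # Comparing code point by code point: the two spellings differ.
    print(precomposed == decomposed)  # False
    # After NFC normalization, both become the same code point sequence.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```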

If older programs run into compatibility problems, the string encoding of the HALCON library can be changed from 'utf8' to 'locale' (legacy mode). Strings are then stored in the encoding of the current locale, and, as in earlier versions of HALCON, the string operators work bytewise rather than characterwise. If bytewise processing is needed while remaining in UTF-8 mode, use the operator set_system to change the option 'tuple_string_operator_mode' from 'codepoint' to 'byte'. The string operators then no longer work on the basis of code points. Inspecting the byte sequence of a string can be useful for debugging, for example.
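To make the difference between the two values of 'tuple_string_operator_mode' concrete, the following Python sketch models how a length query would behave when counting code points versus UTF-8 bytes. It is a hypothetical model, not HALCON code: the function names `strlen_model` and `byte_sequence` are invented here, and the assumption is that byte mode counts the bytes of the UTF-8 representation.

```python
# Hypothetical model of 'codepoint' vs. 'byte' string processing.
# Not HALCON code; assumes byte mode operates on the UTF-8 bytes.

def strlen_model(s: str, mode: str = "codepoint") -> int:
    """Length of s counted in code points or in UTF-8 bytes."""
    if mode == "codepoint":
        return len(s)                   # one unit per code point
    if mode == "byte":
        return len(s.encode("utf-8"))   # one unit per UTF-8 byte
    raise ValueError(f"unknown mode: {mode!r}")

def byte_sequence(s: str) -> str:
    """Hex dump of the UTF-8 bytes of s, e.g. for debugging."""
    return " ".join(f"{b:02X}" for b in s.encode("utf-8"))

if __name__ == "__main__":
    word = "Grüße"
    print(strlen_model(word, "codepoint"))  # 5
    print(strlen_model(word, "byte"))       # 7
    print(byte_sequence(word))              # 47 72 C3 BC C3 9F 65
```

The umlaut 'ü' and the sharp s 'ß' each occupy two bytes, which is why the byte count exceeds the code point count by two.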


List of Operators

tuple_environment
Read one or more environment variables.
tuple_regexp_match
Extract substrings using regular expressions.
tuple_regexp_replace
Replace a substring using regular expressions.
tuple_regexp_select
Select tuple elements matching a regular expression.
tuple_regexp_test
Test if a string matches a regular expression.
tuple_split
Split strings into substrings using predefined separator symbol(s).
tuple_str_first_n
Cut the first characters up to position “n” out of a string tuple.
tuple_str_last_n
Cut all characters starting at position “n” out of a string tuple.
tuple_strchr
Forward search for characters within a string tuple.
tuple_strlen
Determine the length of every string within a tuple of strings.
tuple_strrchr
Backward search for characters within a string tuple.
tuple_strrstr
Backward search for strings within a string tuple.
tuple_strstr
Forward search for strings within a string tuple.
tuple_substr
Cut characters from position “n1” through “n2” out of a string tuple.
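The position-based operators in this list (tuple_str_first_n, tuple_str_last_n, tuple_substr, tuple_strlen, tuple_strstr) can be approximated in Python to show how code-point indexing affects their results. This is a sketch under stated assumptions, not HALCON's implementation: the function names are hypothetical, positions are assumed to be 0-based with inclusive end positions (following the wording "up to position n" and "from n1 through n2"), a not-found search is assumed to return -1, and a single position is applied to every string in the tuple.

```python
# Hypothetical Python models of several string operators, applied
# element-wise to a list of strings and indexing by code point.
# Assumptions: 0-based positions, inclusive end positions, -1 if not found.

def str_first_n(strings: list[str], n: int) -> list[str]:
    """Characters from position 0 through n (inclusive)."""
    return [s[: n + 1] for s in strings]

def str_last_n(strings: list[str], n: int) -> list[str]:
    """Characters from position n to the end."""
    return [s[n:] for s in strings]

def substr(strings: list[str], n1: int, n2: int) -> list[str]:
    """Characters from position n1 through n2 (inclusive)."""
    return [s[n1 : n2 + 1] for s in strings]

def strlen(strings: list[str]) -> list[int]:
    """Number of code points in each string."""
    return [len(s) for s in strings]

def strstr(strings: list[str], pattern: str) -> list[int]:
    """Position of the first occurrence of pattern, or -1 if absent."""
    return [s.find(pattern) for s in strings]

if __name__ == "__main__":
    words = ["Grüße", "Straße"]
    print(str_first_n(words, 2))  # ['Grü', 'Str']
    print(str_last_n(words, 3))   # ['ße', 'aße']
    print(substr(words, 1, 3))    # ['rüß', 'tra']
    print(strlen(words))          # [5, 6]
    print(strstr(words, "ß"))     # [3, 4]
```

Because indexing is by code point, 'ü' and 'ß' each count as one position even though they occupy two bytes in UTF-8.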