String Operations


This chapter contains operators for string operations.

General Information

The HALCON library encodes strings in UTF-8 by default.

UTF-8 is a Unicode character encoding that can encode all Unicode code points with one to four bytes. 'Unicode' refers to the character set that assigns a code point to each character (for example, 'U+0041' for 'A'). UTF-8 then translates the Unicode code points into binary data. ASCII characters require 1 byte each. Certain characters, such as German umlauts or Greek and Cyrillic characters, require 2 bytes. Asian characters occupy up to 4 bytes per character.
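The byte counts described above can be checked with a short sketch. It uses plain Python rather than HALCON, and the sample characters and the helper name utf8_length are chosen here purely for illustration.

```python
import unicodedata

def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))

# Sample characters (illustrative choice), written as escapes so the
# code points are unambiguous.
samples = [
    "A",           # ASCII letter
    "\u00e4",      # German umlaut a
    "\u03a9",      # Greek capital omega
    "\u042f",      # Cyrillic capital ya
    "\u4e2d",      # CJK ideograph
    "\U0001f600",  # emoji outside the Basic Multilingual Plane
]

for ch in samples:
    name = unicodedata.name(ch)
    print(f"U+{ord(ch):04X} {name}: {utf8_length(ch)} byte(s)")
```

In this sample the ASCII letter takes 1 byte, the umlaut, Greek, and Cyrillic letters take 2 bytes each, the CJK ideograph takes 3 bytes, and the emoji takes 4 bytes.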

By default, the HALCON string operators work on the basis of Unicode code points. That is, accessing a character of a string always returns the corresponding Unicode code point of the character, regardless of how many bytes are needed to represent the code point in UTF-8. Thus, multi-byte characters, such as Asian characters or German umlauts, can be handled uniformly on all systems. Please note that the Unicode standard also allows printable characters to be assembled from multiple code points (using so-called 'Combining Diacritical Marks'). This is currently not fully supported by HALCON: the code points are processed separately, and when strings are compared, equivalent characters are not treated as equal if they are encoded with different code points.
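The effect of combining marks can be illustrated with a small sketch in plain Python, whose strings are likewise sequences of code points. This is an analogy, not HALCON code. The NFC normalization step at the end is a general Unicode technique shown only for comparison; the text above does not say that HALCON performs it.

```python
import unicodedata

precomposed = "\u00e4"   # umlaut a as a single code point
combined = "a\u0308"     # 'a' followed by COMBINING DIAERESIS

# Both render as the same printable character, but they differ in
# length and compare as unequal when processed code point by code point.
print(len(precomposed), len(combined))   # 1 2
print(precomposed == combined)           # False

# Assumption for illustration: normalizing both strings to NFC before
# comparing makes the two representations equal.
normalized = unicodedata.normalize("NFC", combined)
print(normalized == precomposed)         # True
```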

If there are compatibility problems during execution of older programs, the string encoding of the HALCON library can be changed from 'utf8' to 'locale' (legacy mode). Strings are then stored according to the locale, and the string operators work bytewise rather than characterwise, as in previous versions of HALCON. If bytewise character processing is also required in UTF-8 mode, the operator set_system can be used to set the option 'tuple_string_operator_mode' from 'codepoint' to 'byte'. Afterwards, the string operators no longer work on the basis of code points. Inspecting the byte sequence of a string can be useful for debugging, for example.
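The difference between the code point view and the byte view can be sketched in plain Python. The sketch does not call HALCON; it only mimics what the two modes would see for the same UTF-8 string, and the sample word is an arbitrary choice.

```python
text = "Gr\u00f6\u00dfe"   # German word with two non-ASCII letters

# Code point view: one entry per character.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# Byte view: the UTF-8 encoded sequence, as a hex dump for debugging.
data = text.encode("utf-8")
byte_dump = data.hex(" ")

print(len(text), code_points)   # 5 code points
print(len(data), byte_dump)     # 7 bytes: 47 72 c3 b6 c3 9f 65

# The same index addresses different things in the two views.
third_code_point = text[2]      # the complete character U+00F6
third_byte = data[2]            # 0xC3, only the first byte of that character
```

Lengths and positions therefore differ between the two views as soon as a string contains multi-byte characters, which matters when index-based results are compared across modes.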


List of Operators

tuple_environment
Read one or more environment variables.
tuple_regexp_match
Extract substrings using regular expressions.
tuple_regexp_replace
Replace a substring using regular expressions.
tuple_regexp_select
Select tuple elements matching a regular expression.
tuple_regexp_test
Test if a string matches a regular expression.
tuple_split
Split strings into substrings using predefined separator symbol(s).
tuple_str_first_n
Cut the first characters up to position “n” out of a string tuple.
tuple_str_last_n
Cut all characters starting at position “n” out of a string tuple.
tuple_strchr
Forward search for characters within a string tuple.
tuple_strlen
Determine the length of every string within a tuple of strings.
tuple_strrchr
Backward search for characters within a string tuple.
tuple_strrstr
Backward search for strings within a string tuple.
tuple_strstr
Forward search for strings within a string tuple.
tuple_substr
Cut characters from position “n1” through “n2” out of a string tuple.