String Operations

This chapter contains operators for string operations.

General Information

The HALCON library encodes strings in UTF-8 by default.

UTF-8 is a Unicode character encoding that can encode all Unicode code points with one to four bytes. 'Unicode' refers to the character set that assigns a code point to each character of a string (for example, 'U+0041' for 'A'). UTF-8 then translates the Unicode code points into binary data. Every ASCII character requires exactly 1 byte. Certain characters, such as German umlauts and Greek or Cyrillic characters, require 2 bytes. Most Asian characters require 3 bytes, and some characters occupy 4 bytes.
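The byte counts described above can be checked with a few lines of Python. This is only an illustrative sketch: it uses Python's built-in UTF-8 encoder, not a HALCON operator, and the helper name utf8_byte_length and the sample characters are chosen here for demonstration.

```python
def utf8_byte_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


# Sample characters (hypothetical selection): ASCII, German umlaut,
# Greek, Cyrillic, a CJK ideograph, and an emoji outside the
# Basic Multilingual Plane.
samples = ["A", "ä", "Ω", "Ж", "中", "😀"]

for ch in samples:
    # ord() yields the Unicode code point, independent of the byte count.
    print(f"U+{ord(ch):04X} {ch!r}: {utf8_byte_length(ch)} byte(s)")
```

The code point printed for each character stays the same regardless of how many bytes the encoded form occupies, which is the distinction the following paragraph relies on.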

By default, the HALCON string operators work on the basis of Unicode code points. That is, accessing a character of a string always returns the corresponding Unicode code point of the character, regardless of how many bytes are needed to represent the code point in UTF-8. Thus, multi-byte characters, such as Asian characters or German umlauts, are handled uniformly on all systems. Please note that the Unicode standard also allows printable characters to be assembled from multiple code points (using so-called 'Combining Diacritical Marks'). This is currently not fully supported by HALCON: the code points are processed separately, and when strings are compared, equivalent characters are not treated as equal if they are encoded with different code points.
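The effect of combining marks on comparisons can be illustrated in Python, whose strings are also sequences of code points. This sketch is not HALCON code. Normalizing to NFC with the standard unicodedata module is shown only as one possible preprocessing step outside HALCON, and whether it suits a given application is an assumption.

```python
import unicodedata

# The same visible character 'ä' in two encodings:
precomposed = "\u00E4"   # single code point U+00E4
decomposed = "a\u0308"   # 'a' followed by U+0308 COMBINING DIAERESIS

# Code-point-wise processing sees one element versus two.
print(len(precomposed), len(decomposed))

# A code-point-wise comparison treats the two strings as different.
print(precomposed == decomposed)

# NFC normalization composes the pair into the single code point,
# after which the comparison succeeds.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == precomposed)
```

Strings that may contain decomposed characters would therefore need to be brought into one consistent form before they are compared code point by code point.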

If there are compatibility problems during execution of older programs, the string encoding of the HALCON library can be changed from 'utf8' to 'locale' (legacy mode). Strings are then stored depending on the locale, and the string operators work bytewise rather than characterwise, as in previous versions of HALCON. If bytewise character processing is also required in UTF-8 mode, the operator set_system can be used to set the option 'tuple_string_operator_mode' from 'codepoint' to 'byte'. Afterwards, the string operators no longer work on the basis of code points. Inspecting the byte sequence of a string can be useful for debugging, for example.
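The difference between characterwise and bytewise processing can be sketched in Python by comparing a string with its UTF-8 byte sequence. This only mirrors the idea behind the 'codepoint' and 'byte' modes. It does not call HALCON, and the sample text is an arbitrary choice.

```python
text = "Grüße"                 # sample string with two 2-byte characters
raw = text.encode("utf-8")     # the underlying UTF-8 byte sequence

# Characterwise view: length and indexing count code points.
print(len(text))               # number of code points
print(text[2])                 # third character

# Bytewise view: length and indexing count bytes, so an index can
# land on the first byte of a multi-byte character.
print(len(raw))                # number of bytes
print(hex(raw[2]))             # third byte, lead byte of 'ü'

# Hex dump of the byte sequence, as one might inspect it when debugging.
print(raw.hex(" "))
```

In the bytewise view, positions and lengths no longer correspond to visible characters, so values computed in one mode should not be reused in the other.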

List of Operators

Read one or more environment variables.
Extract substrings using regular expressions.
Replace a substring using regular expressions.
Select tuple elements matching a regular expression.
Test if a string matches a regular expression.
Split strings into substrings using predefined separator symbol(s).
Cut the first characters up to position “n” out of a string tuple.
Cut all characters starting at position “n” out of a string tuple.
Forward search for characters within a string tuple.
Determine the length of every string within a tuple of strings.
Backward search for characters within a string tuple.
Backward search for strings within a string tuple.
Forward search for strings within a string tuple.
Cut characters from position “n1” through “n2” out of a string tuple.