This chapter contains operators for string operations.
The HALCON library encodes strings in UTF-8 by default.
UTF-8 is a Unicode character encoding that can encode every Unicode code point using one to four bytes. 'Unicode' refers to the character set that assigns a code point to each character (for example, 'U+0041' for 'A'). UTF-8 then translates these code points into binary data. Every ASCII character occupies exactly 1 byte. Certain characters, such as German umlauts or Greek and Cyrillic characters, require 2 bytes. Asian characters occupy up to 4 bytes per character.
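The byte counts described above can be checked with a short sketch. It uses plain Python rather than HALCON, and the helper names `utf8_byte_length` and `code_point_label` are made up for this illustration.

```python
def utf8_byte_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs to encode one character."""
    return len(ch.encode("utf-8"))


def code_point_label(ch: str) -> str:
    """Format the code point of a character in 'U+XXXX' notation."""
    return f"U+{ord(ch):04X}"


# ASCII letter, German umlaut, Greek, Cyrillic, and two CJK characters
# (the last one lies outside the Basic Multilingual Plane).
samples = ["A", "\u00e4", "\u03a9", "\u0416", "\u4e2d", "\U00020000"]

for ch in samples:
    print(code_point_label(ch), utf8_byte_length(ch))
```

The loop prints one code point per line together with its UTF-8 length, covering the 1-byte, 2-byte, 3-byte, and 4-byte cases.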
By default, the HALCON string operators work on the basis of Unicode code points. That is, accessing a character of a string always returns the corresponding Unicode code point of the character, regardless of how many bytes are needed to represent the code point in UTF-8. Thus, multi-byte characters, such as Asian characters or German umlauts, can be handled uniformly on all systems. Please note that the Unicode standard also allows printable characters to be assembled from multiple code points (using so-called 'Combining Diacritical Marks'). This is currently not fully supported by HALCON: the code points are processed separately, and when strings are compared, equivalent characters are not treated as equal if they are encoded with different code points.
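The effect of combining marks on a code-point-wise comparison can be illustrated outside HALCON. The following sketch uses Python's standard `unicodedata` module; it only demonstrates the general Unicode behavior and makes no claim about how HALCON implements its comparison. The helper `code_points` is a hypothetical name.

```python
import unicodedata


def code_points(s: str) -> list[str]:
    """List the code points of a string in 'U+XXXX' notation."""
    return [f"U+{ord(c):04X}" for c in s]


precomposed = "\u00e9"   # 'é' as a single code point
combined = "e\u0301"     # 'e' followed by COMBINING ACUTE ACCENT

print(code_points(precomposed), code_points(combined))

# Both strings render as the same character, but a comparison that works
# code point by code point sees two different sequences.
print(precomposed == combined)

# Normalizing to a common form (here NFC) before comparing makes them equal.
print(unicodedata.normalize("NFC", combined) == precomposed)
```

If strings from different sources have to be compared, normalizing them to one form before they are passed on is one possible workaround.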
If compatibility problems occur when running older programs, the
string encoding of the HALCON library can be changed from UTF-8 to
'locale' (legacy mode).
Strings are then stored according to the locale, and the string operators work
bytewise rather than characterwise, as in previous versions of HALCON.
If bytewise character processing is also required in UTF-8 mode, the operator
can be used to set the option
Afterwards, the string operators no longer work on the basis of code points.
The byte sequence of a string can be interesting for debugging, for example.
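The difference between characterwise and bytewise access, and the kind of byte dump that can help with debugging, can be sketched as follows. This is plain Python, not HALCON code, and `byte_dump` is a hypothetical helper.

```python
def byte_dump(s: str) -> str:
    """Return the UTF-8 bytes of a string as space-separated hex values."""
    return " ".join(f"{b:02X}" for b in s.encode("utf-8"))


name = "M\u00fcller"          # 'Müller'
raw = name.encode("utf-8")

# Characterwise view: 6 code points, index 1 is the umlaut.
print(len(name), name[1])

# Bytewise view: 7 bytes, index 1 is only the first byte of the umlaut.
print(len(raw), hex(raw[1]))

print(byte_dump(name))
```

In the bytewise view the umlaut occupies two positions, so indices and lengths differ from the characterwise view as soon as a string contains non-ASCII characters.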