This chapter contains operators for string operations.
HALCON internally stores and processes strings as sequences of bytes and, in general, does not interpret the characters they contain. Some results may therefore be surprising, especially when multi-byte characters are used, as is the case for most non-ASCII characters in Asian encoding tables and in UTF-8.
As an example of an unexpected result, consider the operator tuple_strlen, which returns the lengths of strings. At first sight, one might expect the length of a string to equal the number of characters it contains, but this holds only for single-byte characters. If a string contains multi-byte characters, its length is the number of bytes used and is thus larger than the number of characters. Another example is the operator tuple_ords, which converts a tuple of strings into a tuple of integer numbers. When called with multi-byte characters, it returns an output tuple with more elements than the input has characters, so mapping input characters to output numbers can be difficult.
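The difference between the byte-level and the character-level view can be illustrated outside of HALCON. The following minimal sketch is written in Python, not in HALCON code, and assumes the string is stored as UTF-8. It only mimics the kind of result a byte-oriented length or ordinal operation produces and makes no claim about how tuple_strlen or tuple_ords are implemented.

```python
# Illustration in Python (not HALCON code): byte view versus character view.
# Assumption: the string is stored in UTF-8.
text = "Gr\u00fc\u00dfe"  # "Grüße": 5 characters, two of them non-ASCII

utf8_bytes = text.encode("utf-8")

char_count = len(text)          # number of characters (code points)
byte_count = len(utf8_bytes)    # number of bytes a byte-oriented length reports
byte_values = list(utf8_bytes)  # one integer per byte, not one per character

print(char_count, byte_count)
print(byte_values)
```

Each of the two non-ASCII characters occupies two bytes in UTF-8, so the byte count exceeds the character count by two, and the list of byte values has no one-to-one correspondence with the characters.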
The character encoding in use (also referred to as the native encoding) depends on the regional settings (locale) of the operating system. Some commonly used 8-bit encodings are the code pages windows-1252 (CP1252, Western European) and windows-31j (CP932, Japanese) for Windows, and ISO 8859-1 (latin-1, Western European), ISO 8859-15 (latin-9, Western European), shift-jis (Japanese), or UTF-8 for other systems. Consequently, even with identical language settings, the encoding must be taken into account when data is exchanged between, e.g., Windows and Linux.
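To show why the encoding matters in data exchange, the following Python sketch (again an illustration, not HALCON code) encodes the euro sign with three of the encodings mentioned above and then decodes UTF-8 bytes as if they were CP1252. The helper name convert is hypothetical and simply decodes with the source encoding and re-encodes with the target encoding.

```python
# Illustration in Python (not HALCON code): one character, different bytes.
euro = "\u20ac"  # the euro sign

# Encode the same character with three encodings named in the text.
encoded = {name: euro.encode(name) for name in ("cp1252", "iso8859-15", "utf-8")}

# Reading UTF-8 bytes as CP1252 yields three unrelated characters.
misread = encoded["utf-8"].decode("cp1252")


def convert(data: bytes, source: str, target: str) -> bytes:
    """Hypothetical helper: re-encode bytes from one encoding to another."""
    return data.decode(source).encode(target)


# Converting explicitly preserves the character across encodings.
converted = convert(encoded["cp1252"], "cp1252", "utf-8")

for name, data in encoded.items():
    print(name, data.hex())
print(repr(misread))
```

The same character is stored as one byte in CP1252, as a different single byte in ISO 8859-15, and as three bytes in UTF-8, so bytes written on one system cannot be interpreted correctly on another without knowing the encoding that produced them.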
If data is to be exchanged between systems with different encodings, a conversion might be required. It is usually safe to use 7-bit ASCII characters in the range 32 to 127, but some encodings assign national variants to selected character codes. Therefore, only the following characters can truly be considered safe in data exchange:
A-Z a-z 0-9 ! " % & ' ( ) * + , - . / : ; < = > ?