String Operations

This chapter contains operators for string operations.

General Information

HALCON internally stores and processes strings as sequences of bytes. In general, it does not care about the semantics of the contained characters. Thus, some results may be surprising, especially if multi-byte characters are used, which are common for most non-ASCII characters in Asian encoding tables and in UTF-8.

As an example of unexpected results, consider the operator tuple_strlen, which returns the lengths of strings. At first sight, the length of a string might be expected to always equal the number of characters in the string, but this is the case only for single-byte characters. If a string contains multi-byte characters, its length is the number of bytes used and is thus larger than the number of characters. Another example is the operator tuple_ords, which converts a tuple of strings into a tuple of integer numbers. When called with multi-byte characters, the operator returns an output tuple that contains more elements than the input tuple contains characters, so mapping input characters to output numbers can be difficult.
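The difference between counting characters and counting bytes can be illustrated outside of HALCON. The following is a minimal sketch in plain Python, not HALCON code; it assumes UTF-8 as the encoding, and the sample word and variable names are chosen for illustration only.

```python
# Illustration only: plain Python, not a HALCON operator call.
# Assumption: the string is stored as UTF-8 bytes.
text = "Gr\u00fc\u00dfe"  # "Gruesse" written with u-umlaut and sharp s: 5 characters

utf8_bytes = text.encode("utf-8")

char_count = len(text)        # number of characters (code points)
byte_count = len(utf8_bytes)  # number of bytes, as a byte-oriented length reports

# One integer per byte, comparable to a byte-wise conversion to numbers.
byte_values = list(utf8_bytes)

print(char_count, byte_count)  # 5 7
print(byte_values)             # [71, 114, 195, 188, 195, 159, 101]
```

The two non-ASCII characters each occupy two bytes, so the byte-wise view yields seven numbers for five characters, and the pairs 195, 188 and 195, 159 no longer correspond to single characters.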

The character encoding in use (also referred to as the native encoding) depends on the regional settings (locale) of the operating system. Some commonly used 8-bit encodings are the code pages windows-1252 (CP1252, Western European) and windows-31j (CP932, Japanese) for Windows, and ISO 8859-1 (Latin-1, Western European), ISO 8859-15 (Latin-9, Western European), Shift-JIS (Japanese), or UTF-8 for other systems. Consequently, even with the same language settings, the encoding must be taken into account if data is exchanged between, e.g., Windows and Linux.
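Why the encoding matters in data exchange can be sketched with the same text encoded under several of the encodings named above. This is a plain Python illustration, not HALCON code; the sample strings and variable names are assumptions made for the example.

```python
# Illustration only: plain Python, not a HALCON operator call.
accented = "\u00e9"        # e with acute accent, one character
kanji = "\u65e5\u672c"     # two Japanese characters

cp1252_bytes = accented.encode("cp1252")  # one byte
utf8_bytes = accented.encode("utf-8")     # two bytes

# Bytes written as UTF-8 but read back as CP1252 turn into two wrong characters.
misread = utf8_bytes.decode("cp1252")

# The same Japanese text needs a different number of bytes per encoding.
cp932_length = len(kanji.encode("cp932"))
utf8_length = len(kanji.encode("utf-8"))

print(cp1252_bytes, utf8_bytes)   # b'\xe9' b'\xc3\xa9'
print(len(misread))               # 2
print(cp932_length, utf8_length)  # 4 6
```

A string written on one system and read on another with a different native encoding can therefore change both its content and its byte length, even though the language settings are the same.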

If data is to be exchanged between systems with different encodings, a conversion might be required. It is usually safe to use 7-bit ASCII characters in the range 32 to 127, but some encodings assign national variants to selected character codes. Therefore, only the following characters can truly be considered safe in data exchange:

   A-Z a-z 0-9 ! " % & ' ( ) * + , - . / : ; < = > ? 

List of Operators

Read one or more environment variables.
Extract substrings using regular expressions.
Replace a substring using regular expressions.
Select tuple elements matching a regular expression.
Test if a string matches a regular expression.
Split strings into substrings using predefined separator symbol(s).
Cut the first characters up to position “n” out of a string tuple.
Cut all characters starting at position “n” out of a string tuple.
Forward search for characters within a string tuple.
Determine the length of every string within a tuple of strings.
Backward search for characters within a string tuple.
Backward search for strings within a string tuple.
Forward search for strings within a string tuple.
Cut characters from position “n1” through “n2” out of a string tuple.
