Deep OCR


This chapter explains how to use deep-learning-based optical character recognition (Deep OCR).

With Deep OCR we want to detect and/or recognize text in an image. Deep OCR detects and recognizes connected characters, which will be referred to as 'words' (in contrast to OCR methods which are used to read single characters).

A possible example for deep-learning-based optical character recognition: Words in an image are detected and recognized.

A Deep OCR model can contain two components, each dedicated to a distinct task: the detection, i.e., the localization of words, and the recognition of words. By default, a model with both components is created, but the model can also be limited to one of the two tasks.

HALCON already provides pretrained components, which are suited for a multitude of applications without additional training, as the model is trained on a varied dataset and can therefore cope with many different fonts. Information on the available character set and model parameters can be retrieved using get_deep_ocr_param. To further adjust the reading to a specific task, it is possible to retrain the recognition component on a given application domain using deep learning operators.
The recognition can be finetuned for an application by retraining the Deep OCR model with custom data.
The general workflow as well as the retraining are described in the following paragraphs.

General Workflow for Deep OCR Inference

This paragraph describes how to localize and read words using a Deep OCR model. An application scenario can be seen in the HDevelop example deep_ocr_workflow.hdev.

Creation of the Deep OCR model

Create a Deep OCR model containing either one or both of the two model components using the operator create_deep_ocr.

To use the retrained model component instead of the provided one, adjust the created model by setting the retrained model component as 'recognition_model' using set_deep_ocr_param.


Model parameters regarding, e.g., the used devices, image dimensions, or minimum scores can be set using set_deep_ocr_param.

The Deep OCR model is applied to your acquired images using apply_deep_ocr. The inference results depend on the model components used. See the operator reference of apply_deep_ocr for details regarding which dictionary entries are computed for each combination of model components.

The inference results can be retrieved from the dictionary DeepOCRResult. Procedures are provided to visualize the results and score maps.

Training and Evaluation of the Recognition Component

This paragraph describes the retraining and evaluation of the recognition component of a Deep OCR model using custom data. See also the HDevelop example deep_ocr_recognition_training_workflow.hdev for an application scenario.

Preprocess the data

This part is about how to preprocess your data.

  1. The information about what is to be read in which image of your training dataset needs to be transferred. This is done by the procedure

    • read_dl_dataset_ocr_recognition.

    It creates a dictionary DLDataset which serves as a database and stores all necessary information about your data. For more information about the data and the way it is transferred, see the section “Data” below and the chapter Deep Learning / Model.

  2. Split the dataset represented by the dictionary DLDataset. This can be done using the procedure

    • split_dl_dataset.

  3. The network imposes several requirements on the images. These requirements (for example the image size and gray value range) can be retrieved with get_dl_model_param.

    For this you need to read the model first by using read_dl_model.

  4. Now you can preprocess your dataset. For this, you can use the procedure

    • preprocess_dl_dataset.

    To use this procedure, specify the preprocessing parameters, e.g., the image size. Store all the parameters with their values in a dictionary DLPreprocessParam, for which you can use the procedure

    • create_dl_preprocess_param_from_model.

    We recommend saving this dictionary DLPreprocessParam in order to have access to the preprocessing parameter values later during the inference phase.

Training of the model

This part explains how to train the recognition component of a Deep OCR model.

  1. Set the training parameters and store them in the dictionary TrainParam. This can be done using the procedure

    • create_dl_train_param.

  2. Train the model. This can be done using the procedure

    • train_dl_model.

    The procedure expects:

    • the model handle DLModelHandle

    • the dictionary DLDataset containing the data information

    • the dictionary TrainParam containing the training parameters

Evaluation of the retrained model

In this part, we evaluate the Deep OCR model.

  1. Set the model parameters which may influence the evaluation.

  2. The evaluation can be done conveniently using the procedure

    • evaluate_dl_model.

    This procedure expects a dictionary GenParamEval with the evaluation parameters.

  3. The dictionary EvaluationResult holds the accuracy measures. To get an idea of how the retrained model performs compared to the pretrained model, you can compare their accuracy values.
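The comparison in step 3 can be illustrated with a small Python sketch. It is not the HALCON evaluation procedure: it assumes, for illustration only, that accuracy means the fraction of words read exactly as labeled, and the function name and sample words are made up.

```python
def word_accuracy(predicted, expected):
    """Fraction of samples whose predicted word equals the expected word exactly.

    Assumption: exact-match accuracy. The measures stored in EvaluationResult
    may be defined differently.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Made-up reads of the same three test images by both models.
expected = ["lemon", "LOT42", "2024-05"]
pretrained_reads = ["lemon", "L0T42", "2024-05"]
retrained_reads = ["lemon", "LOT42", "2024-05"]

pretrained_acc = word_accuracy(pretrained_reads, expected)
retrained_acc = word_accuracy(retrained_reads, expected)
print(f"pretrained: {pretrained_acc:.3f}, retrained: {retrained_acc:.3f}")
```

Both models should be evaluated on the same test split; otherwise the two accuracy values are not comparable.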

After a successful evaluation the retrained model can be used for inference (see section “General Workflow for Deep OCR Inference” above).


Data

This section gives information on the data that needs to be provided in different stages of the Deep OCR workflow.

We distinguish between data used for training and evaluation, consisting of images with their information about the instances, and data for inference, which are bare images. How the data needs to be provided is explained in the corresponding sections below.

As a basic concept, the model handles data via dictionaries, meaning it receives the input data in a dictionary DLSample and returns a dictionary DLResult or DLTrainResult, respectively. More information on the data handling can be found in the chapter Deep Learning / Model.

Data for training and evaluation

The dataset consists of images and corresponding information. They have to be provided in a way the model can process them. Concerning the image requirements, find more information in the section “Images” below.

The training data is used to train and evaluate a network for your specific recognition scenario. With the aid of this data the network can learn to read text samples that resemble text that occurs during inference. The necessary information is given by providing the depicted word for each image.

How the data has to be formatted in HALCON for a DL model is explained in the chapter Deep Learning / Model. In short, a dictionary DLDataset serves as a database for the information needed by the training and evaluation procedures.

The data for DLDataset can be provided in two different ways. In both cases the dataset can be read using read_dl_dataset_ocr_recognition and will be converted as required.

Dataset based on images with word labels

In this case, images with words that are labeled with rotated bounding boxes need to be provided. You can label your data using the MVTec Deep Learning Tool, available from the MVTec website. The dataset must be built as follows:

  • 'class_ids': class IDs

  • 'class_names': class names (Needs to contain the class 'word'. All other classes are ignored.)

  • 'image_dir': path to the image directory

  • 'samples': tuple of dictionaries, one for each sample

    • 'image_file_name': name of the image file

    • 'image_id': image ID

    • 'bbox_col': bounding box column coordinate

    • 'bbox_row': bounding box row coordinate

    • 'bbox_phi': bounding box angle

    • 'bbox_length1': first half edge length of the bounding box

    • 'bbox_length2': second half edge length of the bounding box

    • 'label_custom_data': list of dictionaries containing custom label data for each bounding box

      • 'text': word to be read
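As a rough illustration of this layout, the following Python sketch builds such a dictionary and checks its consistency. It uses only the keys listed above. The file name, the values, and the helper check_word_label_dataset are hypothetical, and it assumes that the bounding box entries hold one value per labeled box.

```python
BOX_KEYS = ("bbox_row", "bbox_col", "bbox_phi", "bbox_length1", "bbox_length2")


def check_word_label_dataset(dataset):
    """Return the number of labeled boxes; raise ValueError on inconsistencies."""
    if "word" not in dataset["class_names"]:
        raise ValueError("'class_names' must contain the class 'word'")
    total_boxes = 0
    for sample in dataset["samples"]:
        labels = sample["label_custom_data"]
        # Assumption: every bbox entry holds one value per labeled box.
        for key in BOX_KEYS:
            if len(sample[key]) != len(labels):
                raise ValueError(f"'{key}' does not match the number of labels")
        for label in labels:
            if "text" not in label:
                raise ValueError("every label needs a 'text' entry")
        total_boxes += len(labels)
    return total_boxes


# Made-up example with one image containing two labeled words.
dataset = {
    "class_ids": [0],
    "class_names": ["word"],
    "image_dir": "images",
    "samples": [
        {
            "image_file_name": "label_0001.png",
            "image_id": 1,
            "bbox_row": [120.0, 310.5],
            "bbox_col": [200.0, 215.0],
            "bbox_phi": [0.0, 0.1],
            "bbox_length1": [80.0, 60.0],
            "bbox_length2": [20.0, 18.0],
            "label_custom_data": [{"text": "lemon"}, {"text": "LOT42"}],
        }
    ],
}

num_boxes = check_word_label_dataset(dataset)
print(num_boxes)
```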

Dataset based on word crop images

In this case, only images that are cropped to a single word each are included in the dataset. The dataset must be built as follows:

  • 'image_dir': path to the image directory

  • 'samples': tuple of dictionaries, one for each sample

    • 'image_file_name': name of the image file

    • 'image_id': image ID

    • 'word': word to be read in the image
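A minimal Python sketch of this simpler layout is given below. The helper build_word_crop_dataset, the file names, and the numbering of image IDs from 1 are assumptions made for illustration; only the dictionary keys are taken from the list above.

```python
def build_word_crop_dataset(image_dir, words_by_file):
    """Build a word-crop dataset dictionary from a {file name: word} mapping."""
    samples = []
    # Sort by file name so that the assigned image IDs are reproducible.
    for image_id, (file_name, word) in enumerate(sorted(words_by_file.items()), start=1):
        samples.append(
            {"image_file_name": file_name, "image_id": image_id, "word": word}
        )
    return {"image_dir": image_dir, "samples": samples}


# Made-up labels for three cropped word images.
word_crop_dataset = build_word_crop_dataset(
    "word_crops",
    {"crop_002.png": "LOT42", "crop_001.png": "lemon", "crop_003.png": "2024-05"},
)
for sample in word_crop_dataset["samples"]:
    print(sample["image_id"], sample["image_file_name"], sample["word"])
```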

The example program deep_ocr_prelabel_dataset.hdev can provide assistance by prelabeling your data.

Your training data should cover the full range of characters that might occur during inference. If a character is not or only very rarely contained in the training dataset the model might not properly learn to recognize that character. To keep track of the character distribution within the dataset the procedure gen_dl_dataset_ocr_recognition_statistics is provided, which generates statistics on how often every single character is contained in your dataset.
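The idea behind such statistics can be sketched in a few lines of Python. This is not the procedure gen_dl_dataset_ocr_recognition_statistics itself. It assumes samples in the word-crop layout with a 'word' entry, and the helper names and the threshold are hypothetical.

```python
from collections import Counter


def character_counts(samples):
    """Count how often each character occurs over all labeled words."""
    counts = Counter()
    for sample in samples:
        counts.update(sample["word"])
    return counts


def rare_characters(counts, required_chars, min_count):
    """Return the required characters that occur fewer than min_count times."""
    return sorted(c for c in required_chars if counts[c] < min_count)


# Made-up samples; 'x' is expected at inference time but never labeled.
samples = [{"word": "lemon"}, {"word": "melon"}, {"word": "LOT42"}]
counts = character_counts(samples)
rare = rare_characters(counts, "lemonLOT42x", min_count=2)
print(rare)
```

Characters reported as rare are candidates for collecting or labeling additional training images.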

You also want enough training data to split it into three subsets, used for training, validation and testing the network. These subsets are preferably independent and identically distributed, see the section “Data” in the chapter Deep Learning.
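What such a three-way split amounts to can be sketched in Python as follows. This is an illustration only, not the procedure split_dl_dataset. The helper name, the default percentages, and the choice to store the subset name under a 'split' key in each sample are assumptions.

```python
import random


def split_samples(samples, train_percent=70, validation_percent=15, seed=42):
    """Randomly assign each sample to 'train', 'validation' or 'test'."""
    if train_percent < 0 or validation_percent < 0:
        raise ValueError("percentages must not be negative")
    if train_percent + validation_percent > 100:
        raise ValueError("train and validation percentages exceed 100")
    indices = list(range(len(samples)))
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_train = len(samples) * train_percent // 100
    n_validation = len(samples) * validation_percent // 100
    for position, index in enumerate(indices):
        if position < n_train:
            subset = "train"
        elif position < n_train + n_validation:
            subset = "validation"
        else:
            subset = "test"
        samples[index]["split"] = subset
    return samples


# Made-up dataset with 20 samples.
samples = [{"image_id": i, "word": f"word{i}"} for i in range(1, 21)]
split_samples(samples)
sizes = {name: sum(s["split"] == name for s in samples)
         for name in ("train", "validation", "test")}
print(sizes)
```

Shuffling before assigning subsets is a simple way to keep the three subsets similarly distributed. It does not by itself guarantee that rare characters end up in every subset.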


Images

The model poses requirements on the images, such as the dimensions, the gray value range, and the type. See the documentation of read_dl_model for the specific values of the trainable Deep OCR model. Once a model has been read, these values can be queried with get_dl_model_param. In order to fulfill these requirements, you may have to preprocess your images. Standard preprocessing of an entire sample, including the image, is implemented in preprocess_dl_samples.

Requirements for images used for inference are described in apply_deep_ocr.

Model output

The network output depends on the task:


training

As output, the operator will return a dictionary DLTrainResult with the current value of the total loss as well as values for all other losses included in your model.

inference and evaluation

As output, the network will return a dictionary DLResult for every sample. This dictionary will include the recognized word as well as the candidates and their confidences for every character of the word.

List of Operators

apply_deep_ocr: Apply a Deep OCR model on a set of images for inference.
create_deep_ocr: Create a Deep OCR model.
get_deep_ocr_param: Return the parameters of a Deep OCR model.
read_deep_ocr: Read a Deep OCR model from a file.
set_deep_ocr_param: Set the parameters of a Deep OCR model.
write_deep_ocr: Write a Deep OCR model in a file.