3D Gripping Point Detection [HALCON Operator Reference / Version 22.11.4.0]

3D Gripping Point Detection

This chapter explains how to use 3D Gripping Point Detection.

3D Gripping Point Detection is used to find suitable gripping points on the surface of arbitrary objects in a 3D scene. The results can be used to target the gripping points with a robot arm and pick up the objects using vacuum grippers with suction cups.

A possible example for a 3D Gripping Point Detection application: A 3D scene (e.g., an RGB image and XYZ-images) is analyzed and possible gripping points are suggested.

HALCON provides a pretrained model which is ready for inference without an additional training step. 3D Gripping Point Detection also works on objects that were not seen in training. Thus, there is no need to provide a 3D model of the objects that are to be targeted. 3D Gripping Point Detection can also cope with scenes containing various different objects at once, scenes with partly occluded objects, and with scenes containing cluttered 3D data.

The inference workflow is described in the following section.

General Workflow

This paragraph describes how to determine a suitable gripping point on arbitrary object surfaces using a 3D Gripping Point Detection model. An application scenario can be seen in the HDevelop example 3d_gripping_point_detection_workflow.hdev.

Read the pretrained 3D Gripping Point Detection model by using
- read_dl_modelread_dl_modelReadDlModelReadDlModelReadDlModelread_dl_model.
Set the model parameter regarding, e.g., the used devices or image dimensions using
- set_dl_model_paramset_dl_model_paramSetDlModelParamSetDlModelParamSetDlModelParamset_dl_model_param.
Generate a data dictionary DLSample for each 3D scene. This can be done using the procedure
- gen_dl_samples_3d_gripping_point_detection,
which can cope with different kinds of 3D data. For further information on the data requirements see the section “Data” below.
Preprocessing of the data before the inference. For this, you can use the procedure
- preprocess_dl_samples.
The required preprocessing parameters can be generated from the model with
- create_dl_preprocess_param_from_model
or set manually using
- create_dl_preprocess_param.
Note that the preprocessing of the data has significant impact on the inference. See the section “3D scenes” below for further details.
Apply the model using the operator
- apply_dl_modelapply_dl_modelApplyDlModelApplyDlModelApplyDlModelapply_dl_model.
Perform a post-processing step on the resulting DLResult to retrieve gripping points for your scene using the procedure
- gen_dl_3d_gripping_points_and_poses.
Visualize the 2D and 3D results using the procedure
- dev_display_dl_data.

Data

This section gives information on the data that needs to be provided for the inference with a 3D Gripping Point Detection model.

As a basic concept, the model handles data by dictionaries, meaning it receives the input data from a dictionary DLSample and returns a dictionary DLResult. More information on the data handling can be found in the chapter Deep Learning / Model.

3D scenes

3D Gripping Point Detection processes 3D scenes, which consist of regular 2D images and depth information.

In order to adapt these 3D data to the network input requirements, a preprocessing step is necessary for the inference. See the section “Specific Preprocessing Parameters” below for information on certain preprocessing parameters. It is recommended to use a high resolution 3D sensor, in order to ensure the necessary data quality. The following data are needed:

2D image

RGB image, or
intensity (gray value) image

Intensity image.

Depth information

X-image (values need to increase from left to right)
Y-image (values need to increase from top to bottom)
Z-image (values need to increase from points close to the sensor to far points; this is for example the case if the data is given in the camera coordinate system)


(1)	(2)	(3)

(1) X-image, (2) Y-image, (3) Z-image.

Normals (optional)

2D mappings (3-channel image)

Normals image.

Providing normal images improves the runtime, as this avoids the need for their computation.

In order to restrict the search area, the domain of the RGB/intensity image can be reduced. For details, see the section “Specific Preprocessing Parameters” below. Note that the domain of the XYZ-images and the (optional) normals images need to be identical. Furthermore, for all input data, only valid pixels may be part of the used domain.

Model output

As inference output, the model will return a dictionary DLResult for every sample. This dictionary includes the following entries:

'gripping_map': Binary image, indicating for each pixel of the scene whether the model predicted a gripping point (pixel value = 1.0) or not (0.0).
'gripping_confidence': Image, containing raw, uncalibrated confidence values for every point in the scene.

Postprocessing

The model results DLResult can be postprocessed with gen_dl_3d_gripping_points_and_poses in order to generate gripping points. Furthermore, this procedure can be parameterized in order to reject small gripping regions using min_area_size, or serve as a template to define custom selection criteria.

The procedure adds the following entry to the dictionary DLResult:

'gripping_points': Tuple of dictionaries containing information on suitable gripping points in a scene:
- 'region': Connected region of potential gripping points. The determined gripping point lies inside this region.
- 'row': Row coordinate of the gripping point in the preprocessed RGB/intensity image.
- 'column': Column coordinate of the gripping point in the preprocessed RGB/intensity image.
- 'pose': 3D pose of the gripping point (relative to the coordinate system of the XYZ-images, i.e., of the camera) which can be used by the robot.

Specific Preprocessing Parameters

In the preprocessing step, along with the data, preprocessing parameters need to be passed to preprocess_dl_samples. Two pairs of those preprocessing parameters have particularly significant impact:

'image_width', 'image_height': Determine the image dimensions of the images to be inferred.

With larger image dimensions and thus a better resolution, smaller gripping surfaces can be detected. However, the runtime and memory consumption of the application increases.
'min_z', 'max_z': Determine the allowed distance from the camera for 3D points based on the Z-image.

These parameters can therefore help to reduce erroneous outliers and therefore increase the application robustness.

A restriction of the search area can be done by reducing the domain of the input images (using reduce_domainreduce_domainReduceDomainReduceDomainReduceDomainreduce_domain). The way preprocess_dl_samples handles the domain is set using the preprocessing parameter 'domain_handling'. The parameter 'domain_handling' should be used in a way that only essential information is passed on to the network for inference. The following images show how an input image with reduced domain is passed on after the preprocessing step depending on the set 'domain_handling'.


(1)	(2)	(3)	(4)

(1) Input image with reduced domain (red), (2) image for 'full_domain', (3) image for 'keep_domain', (4) image for 'crop_domain'.

List of Sections

Operators