Deep learning in the field of machine vision – a game changer or a supplement to classic methods?

Everyone is talking about artificial intelligence (AI). In the field of machine vision, deep learning as a form of AI is of particular interest. The technology enables robust detection rates and paves the way for completely new applications. But is deep learning really the game changer people say it is, or does its great potential actually lie in combining it with classic image processing methods?

Christian Eckstein, Product Manager and Business Developer at MVTec

From chatbots and translation software to digital assistants and even autonomous cars, artificial intelligence has become an integral part of many areas of life. It is also becoming increasingly widespread in industrial environments, for instance in robotics and in the digital networking of machines in the Industrial Internet of Things, also referred to as Industry 4.0. In the field of machine vision, one particular form of AI is becoming ever more prevalent: deep learning. In this field, the technology is typically based on convolutional neural networks (CNNs) and is regarded as giving machine vision a genuine boost.

Bottle necks, as shown here, can be checked for defects with anomaly detection

Outstanding object detection results

One thing is clear: deep learning is achieving previously unseen results in the field of object detection. How? Like other machine learning technologies, deep learning learns from example data rather than following hand-written rules. This means that there is no need to program individual algorithms for each and every use case. The background to this is as follows:

Deep learning is based on neural networks. These can be trained on large quantities of image data (big data), enabling the technology to detect certain patterns and correlations and apply them to new cases.

Machine vision takes advantage of this, for example by using deep learning to identify typical features that can be used to assign or classify the objects or defects to be detected with greater precision.

This results in a number of benefits. For example, the development work required for machine vision processes can be significantly reduced, as there is no need to manually define key features and object properties for the detection process. In addition, the technology opens the door to new applications that could not previously be implemented using traditional image processing methods.

A prime example of this is the new MVTec feature “Global Context Anomaly Detection”: Deep learning is used to understand the logical content of an image, making it possible to recognize completely new types of defects. For example, bottle labels that have slipped or been incorrectly printed, or missing components, for instance on circuit boards, can be detected as defects.

Deep learning has its limits

Although the benefits of deep learning for machine vision are clearly impressive, it also has its limits. The technology is ideal for its three classic fields of application, namely classification, object detection, and semantic segmentation, and this is where its benefits can be used to the full.

The “but” comes with the traceability of decision-making within the neural network. As a “black box”, the technology allows little insight into its internal processes. Such insight can be hugely important in the industrial environment, though, as the following example illustrates: An engineer who is responsible for the quality of a certain semiconductor component in electronics production requires detailed documentation of the entire inspection workflow. If this workflow cannot be seamlessly tracked inside the black box, the engineer has no way of explaining why a defect went undetected. Classic image processing methods offer far more transparency in this regard, as the image properties on which each decision is based are explicitly described.
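To make the contrast concrete, the following minimal Python sketch shows how a rule-based check can document which measured image property led to a rejection. The feature names, the tolerance values, and the `inspect_part` function are hypothetical and chosen purely for illustration; they are not taken from any MVTec product.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    feature: str  # name of the measured image property
    low: float    # smallest accepted value
    high: float   # largest accepted value


def inspect_part(measurements, rules):
    """Return (is_ok, log), where log holds one readable line per rule."""
    is_ok = True
    log = []
    for rule in rules:
        value = measurements[rule.feature]
        passed = rule.low <= value <= rule.high
        is_ok = is_ok and passed
        status = "OK" if passed else "REJECT"
        log.append(
            f"{rule.feature}={value} allowed=[{rule.low}, {rule.high}] {status}"
        )
    return is_ok, log


# Hypothetical tolerances for a semiconductor component.
RULES = [
    Rule("lead_width_um", 180.0, 220.0),
    Rule("mean_gray_value", 90.0, 140.0),
]

ok, log = inspect_part(
    {"lead_width_um": 231.5, "mean_gray_value": 120.0}, RULES
)
for line in log:
    print(line)
```

Every entry in the log names the property, the measured value, and the permitted range, so the inspection record can be reviewed line by line. A neural network does not provide this kind of record by itself.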

Illustration of the deep learning method of object detection using aligned rectangles

A high-performance hardware platform is required

A further constraint is that deep learning can involve a large amount of training, which requires a correspondingly high level of resources. The production line also needs a high-performance hardware platform, which is not available in all industrial applications. This is especially true when AI-based machine vision technologies are used on embedded devices. Furthermore, deep learning is simply more than the task requires in some areas of use, so the technology’s high performance and storage requirements, and thus its cost, are hard to justify. In such cases, tasks can often be solved more elegantly, simply, and cost-effectively using classic machine vision.

So as not to miss out on the benefits of AI technology, the ideal solution is to intelligently combine deep learning and traditional, rule-based machine vision methods.

With this kind of hybrid approach, both technologies can make optimal use of their particular strengths for the application at hand. For example, classic methods can be used to perform preprocessing steps such as correctly orienting objects. The decision as to whether the object is a reject can then be efficiently made by deep learning using a smaller image area. Combining the techniques makes it possible to trace the specific decision criteria used to classify objects or defects more transparently.
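A hybrid pipeline of this kind could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a description of how any particular product works: the classic step here is a simple threshold-based localization that stands in for the orientation step mentioned above, and `dummy_classifier` is a hypothetical placeholder for a trained network. All function names are invented for this example.

```python
def find_object_box(image, threshold):
    """Classic step: bounding box (top, left, bottom, right) of bright pixels."""
    rows = [r for r, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [
        c for c in range(len(image[0]))
        if any(row[c] > threshold for row in image)
    ]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image, box):
    """Cut the region of interest out of the full image."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def inspect(image, classifier, threshold=128):
    """Hybrid pipeline: rule-based localization, then a learned decision."""
    box = find_object_box(image, threshold)
    if box is None:
        return {"decision": "no_object", "box": None}
    roi = crop(image, box)  # the classifier only sees this smaller area
    return {"decision": classifier(roi), "box": box}


def dummy_classifier(roi):
    # Hypothetical stand-in for a trained network: flags saturated pixels.
    return "reject" if any(p == 255 for row in roi for p in row) else "ok"


image = [
    [0, 0, 0, 0, 0],
    [0, 200, 210, 0, 0],
    [0, 205, 255, 0, 0],
    [0, 0, 0, 0, 0],
]
result = inspect(image, dummy_classifier)
print(result)
```

Because the rule-based step returns the exact region that was passed to the classifier, the result records where the decision was made, and the network only has to process a small crop instead of the full image.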