How important is domain-specific data?

~ The data challenge for AI applications in quality inspection ~

AI tools like ChatGPT were trained largely on publicly available material. What happens if your AI application requires domain-specific data that is locked away behind a wall of secrecy? In this article, Miron Shtiglitz, VP Product and Delivery at visual inspection company QualiSense, explains the importance of domain-specific data in developing AI applications for manufacturing.

One of the fundamental driving forces behind the efficacy of deep learning models is the availability of labelled data in copious amounts. Such data has become increasingly easy to acquire for many applications, but a significant hurdle remains when developing AI-driven systems for production environments, such as quality inspection systems.

Pre-trained deep learning models have gained prominence as a way to speed up development across many applications. We even hear about models that need only minor tweaks to perform well on a new application.

These models have already learned general-purpose features, such as edges, textures and shapes, which allow them to recognise complex patterns in new images.
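The "minor tweaks" usually refer to fine-tuning: the pre-trained layers are kept fixed and only a small task-specific head is trained on new, labelled data. The plain-Python sketch below illustrates that idea under loudly stated assumptions. `frozen_features` is a hypothetical stand-in for a pre-trained backbone (it returns three hand-picked statistics rather than learned features), and the head is a logistic-regression classifier trained by full-batch gradient descent. It is a toy illustration, not a description of any production system.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def frozen_features(image: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a frozen, pre-trained backbone.

    Its behaviour never changes during training. Here it simply returns the
    mean intensity, the peak intensity and the contrast (max - min)."""
    mean = sum(image) / len(image)
    return [mean, max(image), max(image) - min(image)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(
    data: List[Tuple[Sequence[float], int]],
    backbone: Callable[[Sequence[float]], Vector],
    epochs: int = 2000,
    lr: float = 1.0,
) -> Tuple[Vector, float]:
    """Fit only a logistic-regression head on top of frozen features."""
    # The backbone is frozen, so features are computed once and reused.
    feats = [(backbone(x), y) for x, y in data]
    dim, n = len(feats[0][0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in feats:
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log-loss with respect to the logit
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        # Full-batch gradient descent step on the head parameters only.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(image: Sequence[float], backbone, w: Vector, b: float) -> int:
    """Return 1 for 'defect', 0 for 'good'."""
    z = sum(wi * fi for wi, fi in zip(w, backbone(image))) + b
    return int(sigmoid(z) >= 0.5)


if __name__ == "__main__":
    # Toy 16-pixel 'images': defective parts contain one bright spot.
    good = [[0.2] * 16, [0.3] * 16, [0.25] * 16]
    defective = [[0.2] * 15 + [0.9], [0.3] * 15 + [0.95], [0.25] * 8 + [0.85] + [0.25] * 7]
    data = [(x, 0) for x in good] + [(x, 1) for x in defective]
    w, b = train_head(data, frozen_features)
    print(predict([0.22] * 16, frozen_features, w, b))
    print(predict([0.25] * 15 + [0.9], frozen_features, w, b))
```

Only `w` and `b` are ever updated, which is why fine-tuning can work with relatively little data; the catch, as the article argues, is that the frozen features must actually be relevant to the new domain.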

The availability of vast image datasets online has been instrumental in fuelling the growth of such models. Datasets such as ImageNet, COCO, nuScenes and Google's Open Images cover a wide range of scenarios, from animals and nature to object detection and more. These datasets serve as invaluable starting points, especially for applications that fall within their domains. However, challenges arise when a domain lacks publicly available data, and this is especially true in the industrial space. Image datasets of industrial processes, for example, are scarce because manufacturers rarely release them, which makes developing a model for this domain more challenging.

Industrial quality inspection

Consider an application that falls outside the scope of conventional datasets, a situation often encountered in quality inspection. A pre-trained model can offer a slight advantage in some situations, but this benefit diminishes as the application drifts further from the dataset’s domain.

For instance, if your application deals with greyscale, hyperspectral or LWIR (long-wave infrared) images, a model pre-trained on colour images could hinder your efforts by extracting features that are irrelevant to your domain. These challenges highlight the need for domain-specific data in developing deep learning applications.
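The mismatch is concrete at the very first layer: a network pre-trained on colour images has first-layer filters with exactly three input channels, one per colour. The plain-Python sketch below shows one heuristic sometimes used to reuse such filters on inputs with a different number of channels (an assumption for illustration; libraries use several variants): average the three colour kernels and replicate the result across the new channels. The hypothetical `adapt_filter` helper makes the limitation visible, because a filter that responded to a difference between red and green collapses to zero and contributes nothing on greyscale or hyperspectral data.

```python
from typing import List

Kernel = List[List[float]]  # k x k spatial weights for one input channel
Filter = List[Kernel]       # one kernel per input channel


def adapt_filter(rgb_filter: Filter, n_channels: int) -> Filter:
    """Reuse a 3-channel (RGB) first-layer filter on an n-channel input.

    Assumed heuristic: average the three colour kernels, then replicate the
    mean kernel across n channels, scaled by 3/n. A uniform input then gives
    the same response as the original filter gives on a uniform grey image."""
    if len(rgb_filter) != 3:
        raise ValueError("expected a filter with 3 (RGB) input channels")
    if n_channels < 1:
        raise ValueError("n_channels must be at least 1")
    rows, cols = len(rgb_filter[0]), len(rgb_filter[0][0])
    scale = 3.0 / n_channels
    mean_kernel = [
        [sum(rgb_filter[c][i][j] for c in range(3)) / 3.0 for j in range(cols)]
        for i in range(rows)
    ]
    # Build independent copies so channels do not share list objects.
    return [[[w * scale for w in row] for row in mean_kernel] for _ in range(n_channels)]


def total_weight(f: Filter) -> float:
    """Response of the filter to an input that equals 1.0 everywhere."""
    return sum(w for kernel in f for row in kernel for w in row)


if __name__ == "__main__":
    # A colour-opponent filter: +1 on red, -1 on green, 0 on blue.
    opponent = [[[1.0] * 3 for _ in range(3)],
                [[-1.0] * 3 for _ in range(3)],
                [[0.0] * 3 for _ in range(3)]]
    # Averaging cancels the red/green difference: every weight becomes zero.
    print(adapt_filter(opponent, 1))
```

A luminance feature, such as an edge detector that is identical across the three colour channels, survives the same averaging intact, which is consistent with pre-trained weights still offering a slight advantage in some cases; the more a model's features depend on colour, the less of them carries over.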

In the relatively conservative world of manufacturing, companies are understandably reluctant to share proprietary data, because it may reveal details of the processes and inspection solutions that set them apart from their competitors.

At QualiSense, we’ve addressed this challenge through a strategic alliance with Johnson Electric, a global leader in the automotive industry. This partnership gives us access to a wide range of production lines and a wealth of images from the industrial inspection domain, covering diverse processes, materials and applications. This repository of unique data gives QualiSense a competitive edge. We have used it to build an AI model that has a generic understanding of the production domain but can be adapted to the intricacies of each specific production environment.

The rise of deep learning underscores the significance of data in its varied forms. In quality inspection and other manufacturing applications, the fact that most relevant data is proprietary, and therefore hard to access, poses a formidable challenge. Pre-trained models have their place, but the true key to success lies in domain-specific data that reflects the intricacies of the production environment.

For more opinion and insight on the technology challenges facing AI-driven quality inspection, visit the QualiSense blog at qualisense.ai/blog