~ Why model training will not work without relevant data ~
OpenAI, the company behind ChatGPT, has been hit by a series of lawsuits for having trained its models on materials available on the internet. The outcome of these cases will be a major legal landmark in how copyright law is applied to AI training. Access to copyrighted material will be essential to developing effective AI systems for manufacturing, argues Zohar Kantor, chief revenue and customer success officer at quality inspection pioneer QualiSense. In this article, Zohar explains how QualiSense has managed to access copyrighted production data and why the AI systems it has developed would not be effective without it.
In April 2024, a group of eight US newspapers, including The New York Daily News, Chicago Tribune and Denver Post, sued OpenAI and Microsoft for allegedly using their copyrighted articles without permission to train their AI models. The New York Times had already sued both OpenAI and Microsoft in December 2023, on similar grounds.
These legal challenges reflect broader issues in the field of generative AI, particularly concerning the ethical sourcing of training data and the accuracy and reliability of AI-generated content. But when training an AI model there is no escaping the need for vast quantities of data. For an application like ChatGPT, which is designed to provide information about any topic, the quantity of data required is extraordinary. Putting aside the legal rights and wrongs of these cases, you cannot train a model without this data.
In addition to the quantity of data, the relevance of that data is also crucial. If you want to build an application for a manufacturing environment, for example, you need manufacturing data. Unlike the data that is freely available on the internet, the relevant data here is closely guarded by manufacturing companies. They face a dilemma: they want to unlock the power of AI, but they won’t easily give away their data.
Model training for defect detection
If you want to build an AI model for a specific use case, for example quality inspection, you need data that is highly relevant to that specific use case. However, to achieve the end goal of a deployable model, there are different routes. If you are starting from zero every time, the process of building the model for your production line will take much longer, require more images and necessitate greater input from the quality manager.
The alternative route, which achieves the same outcome, but in less time and with less hassle, is to develop an AI backbone, essentially pre-training a model with relevant data. In the same way that a human being would recognise a new car they had never seen before as belonging to the category of “car” based on their prior knowledge of cars, so too an AI model, trained on vast quantities of relevant manufacturing data, can recognise a “crack” or a “watermark” on a metal surface, based on its pre-training data.
Pre-training alone will not get you to the end goal, but it will give you a significant head start. Whereas ChatGPT can afford to make the occasional mistake, the KPIs for defect detection typically allow an error rate close to zero. It’s a different game, with a much higher standard. The only way you can achieve a deployable model that will meet this KPI is to tailor its training to the specific use case, by giving it data from your production line and feedback from the quality manager.
However, if the pre-training data is voluminous and relevant, you have a powerful backbone in place. With this backbone, you have already completed half the job of training a fully deployable AI model. The problem with building such a backbone is that, as the case of ChatGPT shows, you need a lot of data.
At QualiSense, our solution has been to partner with Johnson Electric. We’ve secured an agreement with them that provides access to vast troves of manufacturing data from eighteen manufacturers across the world. This gives our model a powerful backbone, allowing it to recognise different types of defects on metal surfaces.
QualiSense is a fast-growing start-up developing AI software for manufacturing use-cases. Find out more at https://qualisense.ai/