SAIMPLE: Making an AI system for quality control in industry reliable

20 April 2023

The transition to Industry 4.0 is a crucial issue for the manufacturing sector, and AI represents a powerful lever to accelerate this revolution.

Maintaining technological superiority, or catching up technologically, is either a guarantee of competitive advantage or an imperative for survival.

Artificial intelligence is the "new" technology that has convinced some manufacturers and is increasingly being implemented in factories. In industry, the contributions of artificial intelligence can take many forms and extend to all of a company's departments [1]:

  • Decision-making support: artificial intelligence is used to carry out data processing, to assist in monitoring the company's activities and management (inventory monitoring, purchase forecasts, production suggestions, etc.).

  • Maintenance: AI is used to perform predictive maintenance on machines, i.e. to anticipate the occurrence of a breakdown by monitoring indicators that may reveal a possible malfunction.

  • Quality control: neural networks are used to detect flaws by means of computer vision. In concrete terms, cameras are used and algorithms analyze the images to detect potential defects in the production line in near real time.

  • And much more: machine learning and deep learning algorithms can assist industry in many other cases, for example by sorting and ordering sets of parts, by powering chatbots to automate customer relations, or simply by automating long, repetitive and low value-added tasks (such as report writing) to make employees' work more pleasant and relevant.

Despite the magnitude of the potential gains, there are risks associated with the use of AI that need to be managed in order to maximise its benefits and to comply with the standardisation of AI technologies in the near future. Ensuring the reliability of an AI system is therefore an imperative that is not so easy to accomplish.

This use case presents methods to evaluate the reliability of an AI system used for quality control in a production line.

OBJECTIVE: to make the detection of flaws on industrial parts more reliable by means of artificial intelligence

Since the industrial revolution, production machines have been progressively automated and quality controls are frequently carried out for regulatory reasons, to guarantee production performance or to preserve the brand image [2].

Today, quality control is often carried out manually and requires a lot of manpower to be effective. AI can help address labour shortages while improving the rate of flaw detection, performing inspections around the clock and increasing the productivity of workers, who can then concentrate on other tasks [3].

As previously discussed, in order to implement an AI for defect detection with confidence, it is important to be able to guarantee the reliability of the system. To evaluate this reliability, we will rely on different metrics offered by Saimple: relevance (which relates to the explainability of the AI) and dominance (which relates to the robustness of the system). We will also analyse the accuracy of the model and add specific noise to strengthen the robustness evaluation.


The dataset used in this case study comes from Kaggle and relates to moulding products.

Moulding or casting is the process of making raw parts by pouring molten metal into a mould of sand or metal; the cast metal then solidifies and produces the desired part. However, defects in the mould or defects in the solidification of the casting can lead to defective parts.

The dataset consists of 7348 images of size 300x300 pixels containing two classes: 

  • ok: the part is not defective

  • defect: the part is defective

It is divided into two sub-folders: 

  • train: consisting of 3758 images of defective impellers and 2875 non-defective impellers

  • test: consisting of 453 images of defective impellers and 262 non-defective ones

All the images are top views of submersible pump impellers.
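As a quick sanity check, the counts stated above can be tallied in a short Python sketch (the "def"/"ok" keys are just illustrative labels, not necessarily the dataset's folder names):

```python
# Image counts as stated above for the Kaggle casting dataset.
counts = {
    "train": {"def": 3758, "ok": 2875},
    "test":  {"def": 453,  "ok": 262},
}

# Total number of images across both splits.
total = sum(sum(split.values()) for split in counts.values())

# Fraction of defective parts per split, to gauge class balance.
def_fraction = {
    name: split["def"] / sum(split.values())
    for name, split in counts.items()
}
```

The totals add up to the 7348 images announced, and both splits contain a majority of defective images (about 57% in train, 63% in test), so the classes are moderately imbalanced.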


Comparison between a non-defective and a defective part

It should also be noted that these images are all in grey scale and that a data augmentation has already been performed to produce the dataset.

Example of data augmentation performed

The original images, i.e. without data augmentation, are available in the casting_512x512 folder on Kaggle.


Moulding techniques are widely used in industry because they allow complex-shaped parts to be produced identically and in massive quantities.

However, the mass production of castings sometimes presents flow defects that introduce undesirable irregularity into a metal moulding process.

The aim is to build a convolution-based model to automate the detection of casting defects.
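The full model is not reproduced here, but the core building block of any such network is the 2-D convolution itself. The minimal NumPy sketch below (toy data and a hand-picked Laplacian kernel, both our own assumptions) shows how a convolution responds to a local irregularity on an otherwise flat surface, which is the kind of feature a trained CNN learns to pick up on castings:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A Laplacian-style kernel responds strongly to edges and irregularities.
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

# Toy grey-scale "part": a flat surface with a single bright blemish.
part = np.zeros((8, 8))
part[4, 4] = 1.0
response = conv2d(part, laplacian)  # strongest response around the blemish
```

In a real CNN these kernels are not hand-picked but learned during training, stacked over several layers with non-linearities and pooling in between.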


The test images are classified by the model according to its decision on the defective nature of each part. To begin evaluating the model's performance, let us visualise the results as a confusion matrix, which shows the percentage of images that are correctly classified and the percentage that are not.

Confusion matrix

According to the confusion matrix, out of all the test images, 1.9% of the parts were classified as defective when in fact they were not; this case is called a false positive. Conversely, 12% were classified as non-defective when in fact they were defective; this case is called a false negative.
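The cells of such a confusion matrix can be counted directly from labels and predictions. The sketch below uses a tiny hypothetical set of labels, with 1 standing for the "def" class and 0 for "ok":

```python
import numpy as np

# Hypothetical ground truth and model decisions: 1 = "def", 0 = "ok".
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 1, 0, 1])

# Confusion-matrix cells, counted directly.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # defect correctly flagged
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # "ok" part correctly passed
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # "ok" part flagged as defective
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # defect missed: the costly case
accuracy = (tp + tn) / len(y_true)
```

Dividing each cell by the number of test images gives the percentages reported above.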

Misclassification can be a very significant problem in industry. A false negative is the case where the model classifies a defective part as conforming; this risks damaging the company's image regarding the quality of the delivered products. Indeed, each defect implies a reduction in the quality of the part, which can lead to safety problems during its use.

A false positive is the case where the model classifies a conforming part as defective, raising a false alarm that may increase maintenance and production costs. These extra costs can stem from the time lost to an unjustified analysis or maintenance operation on the production line, or from the time a worker spends manually checking whether a part flagged by the AI is really defective, so as to avoid automatically discarding a good part and generating an additional deadweight loss.

The challenge of having the most reliable AI models possible is therefore very important for manufacturers implementing this advanced technology in their factories both in terms of image and time/money saving.


The number of false negatives and false positives is one of the performance indicators when developing a classification model. It is therefore important to understand the origin of these misclassifications by analysing them, so that corrective measures can be taken to reduce them.

Let us first focus on the false negatives. To do this, four images containing defects but not classified as defective are selected. The red circles have been added here to make the defects more visible; they are obviously not present in the images of the dataset.

Visualisation of defects

Note that the label "ok" corresponds to a defect-free part and the label "def" to a defective part. Among the misclassified images, we can notice that the defects are not located on the edge of the part. An analysis of the relevance may help explain the misclassification. Let us use the relevance provided by Saimple to visualise the behaviour of the model on the images above and compare the results with those obtained for a correctly classified image.

Saimple results on defective parts: relevance and dominance

The results above allow us to compare a well-classified image with a misclassified one. For the well-classified image, the relevance is concentrated on the external and internal contours of the part (blue and red pixels), and its dominance (green and blue bars) indicates that the model classifies the image with certainty, with a score higher than 0.9. For the misclassified image, the dominance score is lower.

By studying the relevance, it is possible to understand which elements of the image influenced the misclassification. The relevance result for the misclassified image is clearly different: the model focuses more on the dark internal elements of the part and only slightly on the defects. Even though it seems to detect the defects, it does not classify the part as defective, as it does when the defects are located on the edges of the parts. One explanation could be that images with defects on the edges of the parts are over-represented in the dataset, so the algorithm bases its decision on the imperfection of the outer contour. Defects inside the parts are then not considered a defective feature, since those parts have no defects on their edges. There could therefore be a bias in the training of the algorithm, causing classification errors for defects located on the inner surfaces of the parts.

This potential bias in the dataset may be related to the fact that defects on the contours of parts are much more common in the metal industry than defects on their inner surfaces. This is why it is important to balance datasets properly: not in proportion to the probabilities of occurrence of the events, but simply in numbers. This is a classic pitfall that Saimple can point out.

Since the dataset must be balanced across all classes and defect types to improve the algorithm, one solution is to use data augmentation again, this time to balance the types of defects. Data augmentation is indeed effective for balancing a dataset when some features are under-represented because data is missing for them. However, as demonstrated in other use cases, depending on how it is carried out, data augmentation must be applied to all classes to be effective. It can also be difficult to determine the right time to stop the augmentation process, so as to improve performance while limiting development costs. The goal is to have raw and augmented data for each class, in equivalent numbers. Saimple can help here as well, by allowing the algorithm's performance to be monitored throughout the process, so that it is known when a result is satisfactory and the process can be stopped.
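A minimal balancing loop of this kind can be sketched as follows, using simple label-preserving transforms (flips and 90-degree rotations) on toy arrays; the class names and data are purely illustrative:

```python
import numpy as np

def augment(image, rng):
    """Apply one random, label-preserving transform: a flip or a rotation."""
    op = rng.integers(3)
    if op == 0:
        return np.fliplr(image)
    if op == 1:
        return np.flipud(image)
    return np.rot90(image)

def balance(images_by_class, rng):
    """Augment minority classes until each class matches the largest one."""
    target = max(len(imgs) for imgs in images_by_class.values())
    result = {}
    for label, imgs in images_by_class.items():
        imgs = list(imgs)
        while len(imgs) < target:
            source = imgs[rng.integers(len(imgs))]  # pick a random original
            imgs.append(augment(source, rng))
        result[label] = imgs
    return result

rng = np.random.default_rng(0)
# Toy imbalanced dataset: 6 "def" images vs 4 "ok" images (4x4 grey-scale).
data = {"def": [np.ones((4, 4)) for _ in range(6)],
        "ok":  [np.zeros((4, 4)) for _ in range(4)]}
balanced = balance(data, rng)
```

In practice the same idea applies per defect *type* rather than per class, and the stopping criterion would be driven by the monitored performance rather than raw counts alone.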


The Saimple tool models homogeneous noises (disturbances) that are possible under certain conditions through an abstract object covering an entire image. All possible variations of the original image within a defined delta amplitude are captured at once. However, the abstract analysis can also be performed on a specific area, to assess more accurately the robustness of the network to very specific noises applied to areas of interest.

The addition of local noise to the images could thus help solve the classification problem raised above, by contributing to the process of strengthening the robustness of the network. Beyond serving as an evaluation tool, the abstract analysis can be used to guide data augmentation by measuring its impact on the trained network throughout the process. Concrete images sampled from the abstract analysis can also feed the data augmentation, completing the dataset with images that are only locally noisy: for example, an image with a luminous halo on only a small region.

A mask can be created to apply the noise locally. Here it is the outline of the part, in red.
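Outside of Saimple, the effect of such a local perturbation can be sketched in NumPy: uniform noise of amplitude delta is added only where a binary mask is set (here a hypothetical ring standing in for the part's outer contour):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 300x300 grey-scale image with values in [0, 1].
image = rng.random((300, 300))

# Hypothetical ring mask around the part's outer contour; in practice the
# mask would be drawn from the actual outline of the part.
yy, xx = np.mgrid[:300, :300]
radius = np.hypot(yy - 150, xx - 150)
mask = (radius > 120) & (radius < 140)

# Uniform noise of amplitude delta, applied only inside the mask.
delta = 0.05
noise = rng.uniform(-delta, delta, image.shape)
noisy = np.clip(image + mask * noise, 0.0, 1.0)
```

Pixels outside the mask are left untouched, while every pixel inside it moves by at most delta, mirroring the bounded local perturbations explored by the abstract analysis.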