The Saimple tool developed by Numalis allows to improve the training of neural networks. Biases are common in the training of artificial intelligence algorithms. Although such potential problems are known, it is not easy to detect them.
The objective is to create and train a classification model with an association bias, i.e. a model that gives more or less correct results but whose decision making is not based solely on the characteristic representing the class.
For this purpose, a dataset containing numerous images of wasps and bees was used. Two classes will therefore be represented: "bee" and "wasp". The images in the dataset are divided into two categories for each class:
Photos taken in nature (on flowers, leaves, etc)
Photos taken in a laboratory or in a non-natural environment (jar, plate, etc)
The first experiment will be carried out by training the model on "biased" data. The images of bees and wasps being classified in two different categories, the aim will be to train the model on only one of the categories of each class. Thus, the bee images used will be in a natural environment and the wasp images will be in a laboratory environment. The objective is to make the model base its decision on the environment rather than on the insect.
The biased dataset therefore contains images of bees in a natural environment and images of wasps in a laboratory-type environment, divided into 2469 images of bees and 2816 images of wasps.
The performance of the model is tested on 4 images: :
A bee on a natural background
A bee on a laboratory background
A wasp on a natural background
A wasp on a laboratory background
We observe that the model classifies well the images on which it has been trained (bee in natural environment and wasp in laboratory environment) but it does not classify correctly the others. Indeed, if the model takes into account the environment during its training rather than the insect, then when it evaluates an image of a wasp in a natural environment, it will identify it as a bee.
Predicted class and actual class as a function of the network input image
Here are the relevance data obtained by SAIMPLE, with the 4 images presented above :
Relevance of the biased model on the 4 reference images
The use of the model on Saimple is indicative of poor learning and the presence of bias. The areas taken into account by the neural network to decide on the class of the insect are visually represented thanks to the Saimple relevance function. We observe that on an evaluation that is performed with an image of a wasp in a natural environment, the dominance will be very high for the class "bee" and the relevance shows that the model is not only based on the insect, but also on what surrounds it (e.g. a flower or a background). There are no relevant elements that are taken into account. The entire body of the insects is selected, as well as parts of the landscape such as flowers. The association bias is clearly visible.
The aim here is to train the model without bias. All data available will therefore be used and the model will be saved at each training epoch, in order to be able to follow the evolution of the training and the performance of the model. The unbiased dataset has been balanced and now contains 3183 images for each class: bees and wasps in laboratory and natural environments.
Firstly, at the end of the training phase, we obtain a percentage of correct classifications of 80%, for the 30th epoch. This second model shows less accurate results compared to the previous one on the validation data. It is normal that the model has lost in accuracy since it is now based on the real characteristics of the insect and not simply on the environment in which it is found. This new model does however make the right classifications for the 4 test images presented above.
Relevance of the unbiased model on the 4 reference images
We can see thanks to Saimple that the model focuses more on the body of the insect, we do not notice anymore signs of the bias. Another interesting point can be observed in the image of the bee in the laboratory. Indeed, we can see that many points of relevance are placed on the petiole of the bee. Knowing that the wasp petiole is much thinner than the bee one, the fact that the model focuses on this point proves that the model has learned fundamental characteristics to distinguish these two insects.
Thanks to Saimple, it is possible to compare the results of the model according to the epochs. Three different states of the model are evaluated on the same image, corresponding to weak, medium and strong training (respectively 1 epoch, 15 epochs, 30 epochs). The image used is an image of a wasp on a natural background, which the biased model therefore confuses with a bee.
Evolution of relevance over time
The relevance results obtained with Saimple ensure that the model has learned correctly by checking which elements are important for the classification. The results show that the model is not sufficiently trained at epochs 1 and 15 because parts of the flower are still taken into account and the dominant class is in conflict with other classes, dominance cannot be ensured. Indeed, a class is dominant if its dominance score is always higher than the dominance score of all other classes.
Saimple therefore makes it easy to detect an association bias, which is very difficult to detect without the use of such a tool. Indeed, one cannot know in advance what the model will be based on. In the case of an association bias, it is the training data that introduces the bias and it is impossible to imagine that a human operator could check the data of an entire dataset.
In this use case, Saimple allowed not only to detect the bias, but also to correct it, and finally to ensure that the corrected model was no longer biased.
Finally, Saimple makes it possible to follow the evolution of the training of the model over time. This makes it possible to know after how long the model has been sufficiently trained, thus avoiding problems of under- or over-fitting.
If you are interested in Saimple, want to know more about the use case or if you want to have access to a demo environment of Saimple:
Contact us : support@numalis.com
picture credit : Hans-Peter Gauster