The autonomous vehicle market is estimated to exceed 500 billion euros by 2035. However, the growth of this market depends on overcoming technological, material, ethical and societal constraints, and many questions about safety therefore arise.
A 2019 study of perceptions of autonomous cars across 51 countries showed that about 44% of respondents believe self-driving vehicles are safe, and 39% think they will be safe enough to use within 5 years {1}.
One of the challenges is to demonstrate the safety of the on-board AI systems. But how can the reliability of the models be guaranteed? On what criteria did the AI base its decisions?
This article aims to answer these questions by presenting, through concrete cases, how the Saimple tool can help build confidence in the models.
An autonomous vehicle is a vehicle designed to drive without the intervention of a driver. There are five levels of autonomy, ranging from driver assistance to full autonomy {2}. Today, most recent vehicles have some degree of driving automation, such as parking and emergency braking assistance. These systems rely on sensors embedded in the vehicle that collect data, largely in the form of images.
But before producing a fully autonomous vehicle, it is necessary to have a trusted, robust system that meets current safety standards. As noted above, several challenges regarding the safe operation of such vehicles remain.
For example, how can we guarantee that the performance of the driving system will be maintained in the event of a heavy snowfall, which could disrupt the vehicle's sensors? The number of cases to consider is almost infinite and it is impossible to test them all exhaustively. It is therefore necessary to design AI systems that are robust by construction.
To help with this process, Numalis has developed Saimple, a tool that helps explain why a model (in this case, a neural network) has made each of its decisions. It also makes it possible to apply different types of noise to the images in order to measure the model's robustness to the perturbations it may face. Saimple thus helps validate the system and increases the confidence users can place in it.
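To illustrate the principle (this is not Saimple's interface, only a minimal Python sketch in which `model`, `image` and the noise parameters are assumptions), one can probe a classifier's robustness by adding small random perturbations to an input and checking whether its decision changes:

```python
import numpy as np

def prediction_is_stable(model, image, noise_level=0.05, trials=20, seed=0):
    """Return True if the predicted class survives small Gaussian perturbations.

    `model` (a Keras-style classifier), `image` (normalized to [0, 1]) and the
    parameter values are illustrative assumptions, not Saimple's interface.
    """
    rng = np.random.default_rng(seed)
    reference = np.argmax(model.predict(image[np.newaxis], verbose=0))
    for _ in range(trials):
        # Add small Gaussian noise and keep pixel values in the valid range.
        noisy = np.clip(image + rng.normal(0.0, noise_level, image.shape), 0.0, 1.0)
        if np.argmax(model.predict(noisy[np.newaxis], verbose=0)) != reference:
            return False  # at least one perturbation changed the decision
    return True
```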
The use case deals with traffic sign recognition, a classical task in the development of driverless vehicles.
The dataset contains more than 50,000 images. It consists of 43 distinct classes representing different types of German traffic signs:
These sign images are quite varied. Within the same class, the same sign may appear several times with different parameters: luminosity, contrast, colorization and resolution. Moreover, the images come in various sizes, from 30x30 to 140x140 pixels. Most of them are 30x30, a size considered small, because they are intended to be analyzed by the networks embedded in vehicles.
Despite the resolution of current cameras, this choice of resolution is explained by the fact that signs must be detected from as far away as possible, so that the vehicle can adapt its driving as early as possible.
It is therefore necessary to be able to classify road sign images even at small resolutions, in order to better anticipate the vehicle's decisions. For example, in the case of a "stop" sign, the vehicle must be able to stop at the line, and thus start braking well in advance rather than braking abruptly just in front of it.
The GTSRB dataset used is referenced below: https://benchmark.ini.rub.de/?section=gtsrb&subsection=datas
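For readers who want to reproduce the setup, here is a minimal loading sketch in Python. It assumes the archive has been extracted into one sub-folder per class containing .ppm images (the layout of the original GTSRB training set) and resizes everything to 30x30; the function name `load_gtsrb` and the folder layout are illustrative assumptions.

```python
from pathlib import Path
import numpy as np
from PIL import Image

def load_gtsrb(root, size=(30, 30)):
    """Load GTSRB images, assuming one sub-folder per class (e.g. root/00000/*.ppm).

    The folder layout and the 30x30 target size are assumptions taken from the
    article, not a guarantee about every release of the dataset.
    """
    images, labels = [], []
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        label = int(class_dir.name)  # folder name encodes the class id
        for img_path in class_dir.glob("*.ppm"):
            img = Image.open(img_path).convert("RGB").resize(size)
            images.append(np.asarray(img, dtype=np.float32) / 255.0)
            labels.append(label)
    return np.stack(images), np.array(labels)
```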
Here are four images from class 0
Here are four images from class 21
As mentioned before, the GTSRB dataset is very rich: it contains about 50,000 images spread over more than 40 classes. Nevertheless, the dataset is not perfectly balanced. Since some traffic signs are rarer than others, it is normal for classes not to contain the same number of images. However, a network trained on an unbalanced dataset may have weaknesses in recognizing classes that are too sparsely represented compared to the others.
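To make this imbalance visible, one can simply count the images per class. The snippet below is a sketch that assumes the `labels` array produced by the loading sketch above:

```python
import numpy as np

# Count how many images each of the 43 classes contains,
# to spot under-represented traffic signs before training.
classes, counts = np.unique(labels, return_counts=True)
for c, n in zip(classes, counts):
    print(f"class {int(c):2d}: {n} images")
print("smallest class:", counts.min(), "| largest class:", counts.max())
```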
Data augmentation can be used to rebalance the classes. It is a technique for improving the dataset: it consists of using a sample of images from the under-represented classes to generate new images and thus balance the dataset.
After performing the data augmentation and re-training the network, we can re-examine it with Saimple to assess whether the augmentation was effective, to show its impact on the model, and to determine whether it made the network more robust. To perform data augmentation, many input transformation methods can be used, such as geometric transformations (zoom, rotation, perspective) or photometric adjustments (brightness, contrast).
The graph below shows the distribution of classes after data augmentation. The orange color corresponds to the addition of images to reach 500 images per class.
For our example, we chose to balance the dataset at 500 images per class by changing the zoom, brightness and contrast of some images.
Some geometrical transformations, such as rotation, are not included for this use case.
Indeed, the model must not take into account signs that appear reflected in a window (mirror image) or in a puddle (upside-down signs). Hence the interest of having another model to ensure that the detected signs are not in this kind of situation.
It should be noted that in this case study, the perspective transformation is deliberately left out of the augmentation; it will be used later to test the network and verify its classification behavior, i.e. its ability to generalize.
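A possible implementation of the augmentation described above (zoom, brightness and contrast only, deliberately without rotation or perspective) is sketched below with Pillow; the function name and parameter ranges are illustrative assumptions, not the exact settings used for this study.

```python
import random
from PIL import Image, ImageEnhance

def augment_sign(img):
    """Generate one augmented variant of a PIL image using only zoom,
    brightness and contrast (no rotation, no perspective).

    The parameter ranges below are illustrative assumptions.
    """
    w, h = img.size
    # Zoom: crop a random region and resize it back to the original size.
    factor = random.uniform(0.8, 1.0)
    cw, ch = int(w * factor), int(h * factor)
    left, top = random.randint(0, w - cw), random.randint(0, h - ch)
    img = img.crop((left, top, left + cw, top + ch)).resize((w, h))
    # Brightness and contrast jitter.
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
    img = ImageEnhance.Contrast(img).enhance(random.uniform(0.7, 1.3))
    return img
```

Applying `augment_sign` repeatedly to images of an under-represented class until it reaches 500 samples reproduces the rebalancing strategy described above.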
For this use case, a convolutional neural network is used, the type of model best suited to image classification. The network processes an input image and assigns a classification score (probability) to each class; the highest score corresponds to the class recognized by the model.
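The article does not specify the exact architecture, so the following Keras sketch is only a plausible stand-in: a small convolutional classifier for 30x30 RGB images with a 43-way softmax output.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A small convolutional classifier for 30x30 RGB sign images and 43 classes.
# This architecture is an assumption, not the one actually used in the study.
model = keras.Sequential([
    layers.Input(shape=(30, 30, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(43, activation="softmax"),  # one probability per traffic-sign class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```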
The chosen image is 'standard' for the network: it resembles the images of the training dataset and is classified with a score of 96%. Indeed, the image is a close-up and centered.
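Classifying a single image then amounts to reading the softmax output. In the sketch below, `model` is the network defined above and `image` a normalized 30x30 test image (both assumptions):

```python
import numpy as np

# Classify one normalized 30x30 RGB image and read its softmax score.
probs = model.predict(image[np.newaxis], verbose=0)[0]
predicted_class = int(np.argmax(probs))
print(f"predicted class {predicted_class} with score {probs[predicted_class]:.2%}")
```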
Now that our model is created and functional, we can challenge it.
To do this, we will modify the test dataset and create new images that the model has never seen and that are more difficult to classify.
Perspective transformation is a method of changing the geometry of an image so that it appears to have been taken from another point of view. The same image can thus yield many variations, which is very useful for testing our network or enriching the dataset.
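A simple way to generate such viewpoint changes is a random perspective warp, for instance with OpenCV. The function below is a sketch; the `max_shift` parameter, which controls how far each corner may move, is an illustrative assumption.

```python
import cv2
import numpy as np

def random_perspective(image, max_shift=0.15, seed=None):
    """Warp an image as if it were seen from a slightly different viewpoint.

    `max_shift` is the fraction of the image size by which each corner may move
    (an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    # Original corners and randomly displaced target corners.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = rng.uniform(-max_shift, max_shift, size=(4, 2)) * [w, h]
    dst = (src + jitter).astype(np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (w, h))
```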