Neural network validation

Among the current challenges facing the automotive industry, demonstrating the reliability of embedded AI systems in autonomous cars is paramount.

Saimple: Ensuring safety in sign classification

The autonomous vehicle market is estimated to be worth more than 500 billion euros by 2035. However, the development of this market depends on overcoming technological, material, ethical and societal constraints, and many questions around safety therefore arise.

A 2019 study of public perception across 51 countries showed that about 44% of respondents believe that self-driving vehicles are safe, and 39% expect them to be safe enough to use within 5 years {1}.

One of the challenges is to demonstrate the safety of the on-board AI systems. But how can the reliability of the models be guaranteed? And on what criteria does the AI base its decisions?

This article aims to answer these questions by presenting, through concrete cases, the features of the Saimple tool for building confidence in models.

1 Traffic sign detection

1.1 Objective

An autonomous vehicle is a vehicle designed to drive without the intervention of a driver. There are five levels of driving automation, ranging from driver assistance to full autonomy {2}. Today, most recent vehicles offer some degree of automation, such as parking and emergency braking assistance. These features rely on sensors in the vehicle that collect data, largely in the form of images.

But before producing a fully autonomous vehicle, it is necessary to have a trusted, robust system that meets current safety standards. As noted above, several challenges regarding the safe operation of such vehicles remain.

For example, how can we guarantee that the performance of the driving system will be maintained during a heavy snowfall that disrupts the vehicle's sensors? The number of cases to consider is almost infinite, and it is impossible to test them all exhaustively. It is therefore necessary to design AI systems that are robust by construction.

To help with this process, Numalis has developed Saimple, a tool that helps explain why a model (in this case, a neural network) makes each of its decisions. It also makes it possible to apply different types of noise to the input images and measure the model's robustness to the perturbations it may face. Saimple thus helps validate the system and increase the confidence a user can place in it.

1.2 Description of the dataset

This use case deals with traffic sign recognition, a classical task in the development of driverless vehicles.

The dataset contains more than 50,000 images, divided into 43 distinct classes representing different types of German traffic signs:

  • The training set contains 34,799 images;
  • The validation set contains 4,410 images;
  • The test set contains 12,630 images.

The sign images are highly varied. Within the same class, the same sign can appear with different luminosity, contrast, colorization and resolution. The images also come in various sizes, ranging from 30x30 to 140x140 pixels. The predominance of small 30x30 images reflects the fact that they are intended to be analyzed by on-board networks in vehicles.

Despite the high resolution of current cameras, this choice is explained by the fact that a sign must be detected from as far away as possible, so that the vehicle can adapt its driving as early as possible; at long range, a sign occupies only a few pixels of the frame.

It is therefore necessary to classify road sign images correctly, even at small resolutions, so the vehicle can anticipate its decisions. For example, on seeing a stop sign, the vehicle must stop at the line, and must therefore begin braking well in advance rather than braking abruptly just in front of it.

The GTSRB dataset used is referenced here: https://benchmark.ini.rub.de/?section=gtsrb&subsection=datas

1.2.1 Visualization of the dataset

Here are four images from class 0:

Here are four images from class 21:

1.2.2 Class proportions

As mentioned before, the GTSRB dataset is rich: it contains about 50,000 images and more than 40 classes. Nevertheless, it is not perfectly balanced. Since some traffic signs are rarer than others, it is normal for classes to have different numbers of images. However, a network trained on an unbalanced dataset may be weak at recognizing classes that are too sparsely represented compared to the others.
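
As a minimal sketch of how to spot such classes, assuming the training images are stored in one sub-folder per class (the "train" directory name is an assumption about the local layout):

    import os
    from collections import Counter

    # Count images per class, assuming one sub-folder per class label.
    counts = Counter()
    for class_dir in os.listdir("train"):
        class_path = os.path.join("train", class_dir)
        if os.path.isdir(class_path):
            counts[class_dir] = len(os.listdir(class_path))

    # Flag classes that fall well below the mean as candidates for augmentation.
    mean_count = sum(counts.values()) / len(counts)
    for label, n in sorted(counts.items(), key=lambda kv: kv[1]):
        marker = "  <-- under-represented" if n < 0.5 * mean_count else ""
        print(f"class {label}: {n} images{marker}")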

1.2.3 Data augmentation

Data augmentation can be used to rebalance the classes. It is a technique for enriching the dataset: images from under-represented classes are used to generate new, altered copies, thereby balancing the class counts.

After performing the data augmentation and re-training the network, we can re-examine the network with Saimple to observe whether the augmentation was effective, to show its impact on the model, and to determine whether it made the network more robust. Data augmentation can rely on many input transformation methods (a sketch follows the list), such as:

  • Resize the image: enlarge or shrink the image;
  • Fill in pixels (i.e. fill in missing spaces in the resized image);
  • Change the perspective of the image.
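
As a minimal illustration of such a pipeline, here is a sketch using torchvision (the library choice, parameter values and file name are assumptions; the article only names the transformation types):

    import torchvision.transforms as T
    from PIL import Image

    # Illustrative augmentation pipeline; rotation and perspective changes
    # are deliberately left out, as explained below.
    augment = T.Compose([
        T.RandomResizedCrop(size=30, scale=(0.8, 1.0)),  # zoom, then resize to 30x30
        T.ColorJitter(brightness=0.4, contrast=0.4),     # vary luminosity and contrast
        T.Pad(padding=2, padding_mode="edge"),           # fill in border pixels
        T.CenterCrop(30),                                # restore the 30x30 target size
    ])

    # Generate new samples for an under-represented class (file name is hypothetical).
    image = Image.open("class_21_example.png").convert("RGB")
    new_samples = [augment(image) for _ in range(10)]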

For our example, we chose to balance the dataset at 500 images per class by changing the zoom, brightness and contrast of some images.

The graph below shows the class distribution after data augmentation; the orange portions correspond to the images added to reach 500 images per class.

Some geometrical transformations, such as rotation, are not included for this use case.

Indeed, the model must not learn to accept signs reflected in a window (mirrored) or in a puddle (upside-down). Hence the interest of having another model to ensure that detected signs are not in such situations.

Note that in this case study the perspective transformation is deliberately excluded from the augmentation; it will be used later to test the network and verify its classification behavior, i.e. its ability to generalize.

1.3 Explanation of the model

For this use case, a convolutional neural network (CNN) is used, as it is the type of model best suited to image classification. The network processes an input image and assigns a classification score (probability) to each class; the highest score corresponds to the class recognized by the model.
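
The article does not disclose the exact architecture, so the following is only an illustrative sketch: a small PyTorch CNN taking 30x30 RGB images and producing a probability per class, with all layer sizes assumed:

    import torch
    import torch.nn as nn

    # Illustrative 43-class convolutional classifier for 30x30 RGB sign images.
    class SignClassifier(nn.Module):
        def __init__(self, num_classes: int = 43):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                             # 30x30 -> 15x15
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                             # 15x15 -> 7x7
            )
            self.classifier = nn.Linear(32 * 7 * 7, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.features(x)
            scores = self.classifier(x.flatten(1))  # one score per class
            return scores.softmax(dim=1)            # probabilities; argmax = prediction

    model = SignClassifier()
    probs = model(torch.randn(1, 3, 30, 30))        # probs.argmax(dim=1) is the class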

1.4 Model result

The chosen image is 'standard' for the network: it resembles the images of the training dataset, on which the model achieved a score of 96%. Indeed, the image is a close-up and centered.

Now that our model is created and functional, we can challenge it.

To do this, we will modify the test dataset and create new, harder-to-classify images that the model has never seen.

1.5 Challenge the network

Perspective transformation is a method of changing an image so that it appears to have been taken from another point of view. The same image can thus yield many variations, which is very useful for testing our network or enriching the dataset.
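
As a hedged sketch of such a transformation (the file name and corner coordinates are illustrative assumptions), torchvision's functional API can remap the four corners of an image:

    import torchvision.transforms.functional as F
    from PIL import Image

    img = Image.open("speed_limit_30.png").convert("RGB")  # hypothetical file name
    w, h = img.size

    # Map the original corners [top-left, top-right, bottom-right, bottom-left]
    # to new positions; pinching the top edge mimics a change of viewpoint.
    startpoints = [[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]]
    endpoints = [[w // 5, 0], [w - 1 - w // 5, 0], [w - 1, h - 1], [0, h - 1]]

    warped = F.perspective(img, startpoints, endpoints)
    # "warped" can now be fed to the classifier to see whether the prediction holds.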

In the following two examples, a perspective change is applied to the image to differentiate it from our training dataset. Since the network has not been trained on such images, it is more likely to make mistakes and show signs of weakness.

Example of an image transformation used to enrich the dataset or, as here, to test the network starting from correctly classified images.

The model correctly classified the image of a 30km/h speed limit sign. So let's present it with another image, taken from a different perspective.

For this "bottom view" image, the network made a classification error: it confused the sign with the 50km/h speed limit sign. Such a decision can have real impacts on road safety.

Comparison of the misclassified image and the image of a 50km/h sign in the dataset:

But why did the model get it wrong?

We notice that the image seen from below is stretched upwards, so the digits are less centered than in the correctly classified images.

It seems that the model bases its decision mainly on the pixels located in the center of the image. If so, a simple upward or downward shift could cause a misclassification. This is the hypothesis we will try to verify in what follows.
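
A quick empirical probe of this hypothesis, before turning to Saimple, is to shift the image vertically and watch whether the prediction changes (reusing the hypothetical "model" and "img" from the earlier sketches):

    import torchvision.transforms.functional as F

    # Shift the sign up or down a few pixels and compare the predicted classes.
    for dy in (-6, -3, 0, 3, 6):
        shifted = F.affine(img, angle=0, translate=[0, dy], scale=1.0, shear=0)
        x = F.to_tensor(F.resize(shifted, [30, 30])).unsqueeze(0)  # 1x3x30x30 tensor
        pred = model(x).argmax(dim=1).item()
        print(f"vertical shift {dy:+d}px -> predicted class {pred}")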

And now, how can we verify this hypothesis?

1.6 Use of Saimple

Our trained network provides classification results, but it is difficult to interpret them or to support a hypothesis about why a decision was made. This is precisely where Saimple comes in. The Saimple analysis extracts elements of explainability and interpretability, making it possible to validate such assumptions. Vulnerabilities can thus be identified, the robustness of models improved, and the training phases of networks accelerated.

1.7 Relevance Analysis

Saimple also identifies the important pixels that led the model to classify the image. Each pixel can be given a relevance score that measures its impact on the output decision; a pixel is considered important when its relevance is higher than average. The higher this value, the more strongly the pixel is colored red or blue, depending on whether the relevance is positive or negative (that is, whether the pixel increases the score of the class or, on the contrary, decreases it). This relevance score is computed using formal methods that consider the behavior of the network not only on the original image but also on "neighboring" images (copies with more or less altered pixels).
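
Saimple's relevance rests on formal methods that the tool does not expose as code; as a loose stand-in to make the idea concrete, plain gradient saliency also produces a signed per-pixel score for one class (reusing the hypothetical "model" and "img" from the earlier sketches; class index 1 is assumed to be "Speed limit (30km/h)"):

    import torchvision.transforms.functional as F

    # Signed per-pixel influence on one class score via gradient saliency.
    # Note: this is a simpler technique than Saimple's formal analysis.
    x = F.to_tensor(F.resize(img, [30, 30])).unsqueeze(0).requires_grad_(True)
    score = model(x)[0, 1]             # score of the assumed 30km/h class
    score.backward()

    relevance = x.grad[0].sum(dim=0)   # sum over RGB channels -> signed HxW map
    # Positive values (plotted red) support the class; negative (blue) oppose it.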

In the example below, the red relevance pixels correspond to the pixels of the input image that contribute positively to classifying the input as "Speed limit (30km/h)". The blue relevance pixels are those that contribute negatively to that classification.


Correct classification

The image below was transformed by changing the perspective and was correctly classified. The zones the network favored for its decision can be observed: the red pixels can be interpreted as the search for a digit pattern.

Incorrect classification

For the image below, we had hypothesized the origin of the misclassification. The only difference from the correctly classified images was the position of the digits in the image, so we assumed that the model based its decisions on the pixels located in the center. Without any means of validating this hypothesis, however, it would have remained a guess.

Thanks to the result produced by Saimple, this hypothesis appears to be confirmed: the relevance shows that the model focuses primarily on the pixels in the center of the image, where it is accustomed to finding the sign's features.

This bias, identified by Saimple, can subsequently lead the data scientist to adapt the training dataset.

Conclusion

Image classification is now omnipresent in the world of artificial intelligence and is a key step in the design of some autonomous embedded systems, such as the driverless car. This type of vehicle bases its decisions on the input data provided by its numerous sensors (camera, lidar...). To interpret these inputs, the system uses several types of AI that complement each other (detectors, classifiers...), each with its own specialty (classification of vehicle images, roads, traffic signs...) and each interacting with the other systems (detection of signs, vehicles...).

As mentioned in the introduction of this use case, the main objective is to verify that the classification made by our model remains identical regardless of the quality of the information it processes: a low-resolution image of a sign in shadow should be classified as well as a high-resolution image of a clearly visible sign. This is where robustness to certain noises comes into play. These noises can be changes in brightness, perspective, image quality, or the size of the element in the image (the distance to the sign, for example). It is therefore essential to ensure that each AI interprets the information properly and is robust enough to be put into service.

After selecting and preparing a dataset, the data scientist creates or retrieves a model adapted to the need and trains it. When confronted with the results of the network, it is difficult to identify which elements of the input data were relevant. The classical statistical metrics of accuracy and loss are insufficient to evaluate the robustness of a network in a more general way: some elements relevant to humans will be discarded by the AI during classification, and vice versa. Saimple makes it possible to visualize these elements, to understand how the network classifies, and to analyze the robustness of the obtained classification.

Numalis

We are an innovative French software company providing tools and services to make your neural networks reliable and explainable.
