Training improvement

Improving the training set saves considerable time and effort in reaching the reliability your AI business cases require.

Improve your training process

Training a neural network relies on a training database, and the network's performance depends on the quality of that database. Checking its quality by means of statistical methods is not enough. Now, you can also understand how the neural network will learn from this database.

Saimple helps you check what a network has learnt and how to improve its learning. Right from the start you have better control over the learning that is done, and you save time in building a solid training set.

1- Visualize what your AI has actually learned

Testing is about measuring performance, but knowing what your neural network has actually learnt tells you why it performs well or poorly. The earlier you have this information, the more time you save downstream.

Training can be time-consuming, and the only way to find out whether it was successful is to measure accuracy on a test dataset. But nothing prevents the AI from having good accuracy while being entirely biased. To prevent bias, you need to know more precisely what the neural network has actually learnt.

With Saimple you can visualize the impact of every input on the decision made by the network, and see how each input affects every step of the inference. You can discover unintended behavior and learn how to improve both your training dataset and your neural network architecture.

2- Prevent the risk of reliability failures

Classical performance measures are often not enough to understand clearly how the system will operate in real conditions. You need to go beyond the data you know and discover how robust your system will be. Systems have in their specification an intended domain of use, on which the product is supposed to operate reliably. But to cover that domain, you can only rely on the data you have for testing. Therefore, if your data does not cover enough ground, you might miss something within your domain of use.

To cover more ground and ensure better reliability, you need to test not only isolated inputs but whole input domains. You do not need to go through each straw in every haystack to check whether there is one example that does not work. With Saimple you can verify that each haystack is correct right away.
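The idea of checking a whole input domain at once, rather than sampling individual points, can be illustrated with interval arithmetic. The sketch below is not Saimple's method or API; it is a minimal, hypothetical example that propagates an input box through a single affine layer and returns guaranteed output bounds over the entire box:

```python
def interval_affine(lo, hi, w, b):
    """Propagate a per-coordinate input interval [lo, hi] through an
    affine layer y = w @ x + b using interval arithmetic, yielding
    guaranteed output bounds over the WHOLE input box -- no sampling."""
    out_lo, out_hi = [], []
    for row, bias in zip(w, b):
        # For the lower bound, pick the interval end that minimizes each term.
        lo_sum = bias + sum(c * (l if c >= 0 else h) for c, l, h in zip(row, lo, hi))
        hi_sum = bias + sum(c * (h if c >= 0 else l) for c, l, h in zip(row, lo, hi))
        out_lo.append(lo_sum)
        out_hi.append(hi_sum)
    return out_lo, out_hi

# Toy two-class score layer (weights and box are made up): if the lower
# bound of class 0 stays above the upper bound of class 1 over the whole
# box, the decision is provably stable on that domain.
w = [[1.0, -0.5], [-1.0, 0.5]]
b = [0.2, -0.2]
lo_out, hi_out = interval_affine([0.9, 0.0], [1.1, 0.1], w, b)
print(lo_out, hi_out)
```

Here a single bound computation certifies every point in the box, which is what makes domain-level testing tractable compared with enumerating examples.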

3- Improve your training dataset

Constituting a good training dataset is crucial for the performance of your neural network. "The more data the better" is not sufficient: to get good results you need good data! Improving your training dataset will improve the quality of your product and save you a lot of trouble during the training process.

Knowing how bias or unbalanced data diversity impacts your training is a first step. But then you need to take corrective actions on the dataset. For that, your data scientists need guidance on how to adapt your large-scale database efficiently.

Saimple does not only discover issues; it can also help you correct them by identifying:

  • What part of the data is causing bias;
  • Which data labels are being confused;
  • Which data labels are not robust enough and require more data;
  • And more.

Knowing which direction to take helps your data scientists improve the training dataset and avoid unnecessary correction steps later in the process.
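A basic way to spot confused labels, independently of any particular tool, is to count misclassification pairs from model predictions. The sketch below (names and toy data are invented for illustration) ranks which pairs of classes a model mixes up most often:

```python
from collections import Counter

def confusion_pairs(y_true, y_pred):
    """Count (true_label, predicted_label) pairs to reveal which
    classes the model confuses most often."""
    pairs = Counter(zip(y_true, y_pred))
    # Keep only misclassifications, most frequent first.
    return sorted(
        ((t, p, n) for (t, p), n in pairs.items() if t != p),
        key=lambda x: -x[2],
    )

# Toy example: "cat" is frequently mistaken for "dog".
y_true = ["cat", "cat", "cat", "dog", "bird", "cat"]
y_pred = ["dog", "dog", "cat", "dog", "bird", "dog"]
print(confusion_pairs(y_true, y_pred))  # [('cat', 'dog', 3)]
```

A pair that dominates this ranking is a natural candidate for collecting more data or reviewing the labels of the two classes involved.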

4- Bias in machine learning

According to Telus International, there are seven distinct biases in machine learning:

- Sample bias

Sample bias can occur when the training dataset does not reflect the reality or the environment in which the model will operate. A good example of this kind of bias is the following: consider a facial recognition model trained only on images of white men; this model will be biased because the dataset does not represent women or other ethnic groups.
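A first screening for sample bias can be as simple as measuring how each group is represented in the dataset's metadata. The sketch below is a hypothetical illustration (the group labels and the 10% threshold are invented, not from any standard):

```python
from collections import Counter

def representation_report(attributes, threshold=0.10):
    """Flag attribute values (e.g. demographic groups) whose share of
    the training set falls below a threshold -- a simple proxy for
    sample bias."""
    counts = Counter(attributes)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items() if n / total < threshold}

# Toy metadata for a face dataset (hypothetical group labels).
groups = ["white_man"] * 90 + ["woman"] * 6 + ["other"] * 4
print(representation_report(groups))  # {'woman': 0.06, 'other': 0.04}
```

Under-represented groups flagged this way are where the model is most likely to fail in deployment, as in the facial recognition example above.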

- Exclusion bias

Exclusion bias can appear when we delete information from the dataset, thinking it is useless.

- Measurement bias

During data acquisition (when the dataset is built), using the same equipment or a particular type of storage system can introduce measurement bias.
For instance, if the same camera is used to take every picture, and this camera inserts a watermark on each picture, a measurement bias can occur.

- Recall bias

Recall bias can occur when data in a dataset is not consistently labelled. It most often arises during the labeling stage, when two classes are very similar or when labeling is carried out by different people.
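When several annotators label the same items, inconsistent labeling can be detected mechanically. The following sketch (the annotation format and names are hypothetical) lists every item that received conflicting labels:

```python
from collections import defaultdict

def inconsistent_labels(annotations):
    """Given (item_id, annotator, label) triples, return the items that
    received conflicting labels -- a symptom of recall bias."""
    labels = defaultdict(set)
    for item, _annotator, label in annotations:
        labels[item].add(label)
    return {item: sorted(ls) for item, ls in labels.items() if len(ls) > 1}

annotations = [
    ("img_01", "alice", "sedan"),
    ("img_01", "bob", "coupe"),   # near-identical classes get confused
    ("img_02", "alice", "truck"),
    ("img_02", "bob", "truck"),
]
print(inconsistent_labels(annotations))  # {'img_01': ['coupe', 'sedan']}
```

Items flagged this way can be sent back for adjudication before training, so the inconsistency never reaches the model.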

- Observer bias

This bias can appear when the scientist (observer) dealing with the data has preconceptions about it. The observer then sees what they want to see in the data, or over-interprets it.

- Racial bias

Excluding a demographic group from a class can induce racial bias. Taking the facial recognition example again, a model that has learned only from images of white men will, for instance, fail to recognize black men.

- Association bias

This bias occurs when the model relies on features that are common to all the samples of a class but do not actually characterize that class. For example, a model that should recognize men and women, and whose training dataset contains only images of men with short hair and women with long hair, would have a hard time classifying images of men with long hair (and conversely, images of women with short hair).
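One way to screen for such shortcuts is to look for features whose value never varies within a class, like hair length in the example above. The sketch below uses an invented (label, features) sample format purely for illustration:

```python
from collections import defaultdict

def spurious_features(samples):
    """Flag features whose value is identical across every sample of a
    class -- candidates for spurious shortcuts (association bias)."""
    values = defaultdict(set)  # (class, feature) -> observed values
    for label, features in samples:
        for feat, val in features.items():
            values[(label, feat)].add(val)
    return sorted((c, f) for (c, f), vs in values.items() if len(vs) == 1)

# Toy dataset: hair length is constant within each class, so the model
# could learn "hair" instead of the intended concept.
samples = [
    ("man", {"hair": "short", "height_cm": 180}),
    ("man", {"hair": "short", "height_cm": 165}),
    ("woman", {"hair": "long", "height_cm": 170}),
    ("woman", {"hair": "long", "height_cm": 160}),
]
print(spurious_features(samples))  # [('man', 'hair'), ('woman', 'hair')]
```

A feature flagged for every class, as "hair" is here, is a strong hint that the dataset needs counter-examples (men with long hair, women with short hair) before training.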

Numalis

We are an innovative French software company providing tools and services to make your neural networks reliable and explainable.
