InFakt has embedded artificial intelligence into arbitrue to help decide which category each accounting document should be assigned to. I often get questions about how AI-based systems work, how they learn and how they know which decisions to make. In this post I want to briefly illustrate the process with a simple example.

The task we set: To determine the probability that a given car will break down in the next year. We want to have a system that will tell us what the chances are after some information about the car is provided.

To build such a system we need a database with thousands of records describing various cars and whether they broke down.

First stage

Let’s say that we have a database containing only information about when the car was made and if it broke down in the last year. We feed this information into the program like this:

We input information like this about large numbers of cars, measured in the thousands.

The data given to the system is analyzed by the algorithm, which is an AI-based process. It will be easy to see whether there is a relationship between the production date of a car (its age) and its failure rate. So in theory our program works, since it returns a result once we enter the date of production. However, a prediction of how likely a given car is to break down in the coming year that is based only on its production year is unlikely to be very precise.
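To make the first stage concrete, here is a minimal sketch. It assumes the simplest possible "learning": counting how often cars from each production year broke down. The records are made up for illustration, and a real system would use thousands of them and a more capable algorithm.

```python
from collections import defaultdict

# Hypothetical records: (production_year, broke_down_last_year)
records = [
    (2005, True), (2005, True), (2005, False),
    (2012, True), (2012, False), (2012, False), (2012, False),
    (2019, False), (2019, False), (2019, False), (2019, True),
]

def failure_rate_by_year(rows):
    """Return {year: share of cars from that year that broke down}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for year, broke_down in rows:
        totals[year] += 1
        failures[year] += int(broke_down)
    return {year: failures[year] / totals[year] for year in totals}

rates = failure_rate_by_year(records)
for year in sorted(rates):
    print(year, round(rates[year], 2))
```

The share for a given year serves as the estimated probability that a car from that year breaks down. With so little information per car, the estimate is the same for every car made in the same year.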

For better results, we need to supply the system with more data.

Second stage — adding car manufacturer

We input information like this about large numbers of cars, measured in the thousands.

Now, the algorithm begins to analyze not only the date of production but the brand of the car and what influence it has on the car’s dependability. The analysis will result in different probabilities assigned to each brand to represent how likely their cars are to malfunction. Adding another layer of information increases the precision of predictions about a given car’s reliability.
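A small sketch of what the extra variable changes, again assuming a plain counting approach rather than any particular AI algorithm. The brands "A" and "B" and all records are placeholders invented for the example.

```python
from collections import defaultdict

# Hypothetical records; brands "A" and "B" are placeholders.
cars = [
    {"year": 2012, "brand": "A", "broke_down": True},
    {"year": 2012, "brand": "A", "broke_down": True},
    {"year": 2012, "brand": "A", "broke_down": False},
    {"year": 2012, "brand": "A", "broke_down": False},
    {"year": 2012, "brand": "B", "broke_down": False},
    {"year": 2012, "brand": "B", "broke_down": False},
    {"year": 2012, "brand": "B", "broke_down": False},
    {"year": 2012, "brand": "B", "broke_down": True},
]

def failure_rate_by(rows, features):
    """Group rows by the given features and return the failure share per group."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in features)
        totals[key] += 1
        failures[key] += int(row["broke_down"])
    return {key: failures[key] / totals[key] for key in totals}

by_year = failure_rate_by(cars, ["year"])
by_year_and_brand = failure_rate_by(cars, ["year", "brand"])
print(by_year)            # one estimate for all 2012 cars
print(by_year_and_brand)  # a separate estimate per brand
```

In this toy data, the single estimate for 2012 cars hides the fact that brand A fails twice as often as brand B. That difference only becomes visible once the brand is part of the input.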

We could just stop here and leave the model like this, operating on the basis of information about the age and manufacturer of each car. But let’s keep going and add yet another variable to the mix to see if we can get even better results.

The next factor we’ll include is the color of the car.

Third stage — color

We input information like this about large numbers of cars, measured in the thousands.

The system now evaluates the role of three variables in determining the failure rate of the cars in the database: year produced, brand and color. Our task is to verify that adding the third parameter increased the quality of the results. In this case, the data we get from the system won’t be better and could even be worse because obviously a car’s color has nothing to do with its mechanical performance.

If the quality of the results produced by the system is not enhanced by the addition of another parameter, it’s better to delete it and replace it with something else.
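One way to check whether a parameter helps is to score the model with and without it on cars it has not seen. The sketch below does this on synthetic data generated so that color has no effect by construction. The data generator, the counting model and the Brier score (the mean squared difference between the predicted probability and what actually happened, where lower is better) are my assumptions for illustration, not a description of any production system.

```python
import random
from collections import defaultdict

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def make_car():
    """Synthetic car: failure depends on age and brand, never on color."""
    age = rng.randint(0, 20)
    brand = rng.choice(["A", "B", "C"])
    color = rng.choice(["red", "blue", "white", "black", "silver"])
    p_fail = 0.05 + 0.03 * age + {"A": 0.0, "B": 0.1, "C": 0.2}[brand]
    return {"age": age, "brand": brand, "color": color,
            "broke_down": rng.random() < p_fail}

cars = [make_car() for _ in range(5000)]
train, test = cars[:4500], cars[4500:]  # learn on 90%, check on 10%

def fit(rows, features):
    """Learn a failure share per feature combination, plus an overall fallback."""
    totals, failures = defaultdict(int), defaultdict(int)
    for row in rows:
        key = tuple(row[f] for f in features)
        totals[key] += 1
        failures[key] += int(row["broke_down"])
    base_rate = sum(failures.values()) / sum(totals.values())
    table = {key: failures[key] / totals[key] for key in totals}
    return table, base_rate

def brier_score(rows, features, model):
    """Mean squared error of the predicted probabilities (lower is better)."""
    table, base_rate = model
    total = 0.0
    for row in rows:
        key = tuple(row[f] for f in features)
        predicted = table.get(key, base_rate)  # unseen combination: fall back
        total += (predicted - int(row["broke_down"])) ** 2
    return total / len(rows)

scores = {}
for features in (["age", "brand"], ["age", "brand", "color"]):
    model = fit(train, features)
    scores[tuple(features)] = brier_score(test, features, model)
    print(features, round(scores[tuple(features)], 4))
```

Because color only splits the same cars into smaller groups, I would not expect the second score to be meaningfully better, and it may well come out worse. The exact numbers depend on the random data.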

Fourth stage — exchanging color for distance driven

We input information like this about large numbers of cars, measured in the thousands.

It’s highly likely that the system will create better, more accurate results on the basis of these variables.

Verifying results

It’s important to verify the effects produced by adding information at each stage in the process described above. What’s the best way to do this?

First, you must have credible data. For example, if I have information about 100,000 cars, including whether each one broke down in the last year, I will give 90% of it to the AI system to help it learn and keep the remaining 10% to verify the accuracy of the decisions it makes.
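The 90/10 split can be sketched in a few lines. The function name `split_train_test` and the use of plain integers as stand-ins for car records are assumptions made for the example. Shuffling before splitting is a common precaution so that the held-back cars are not, say, simply the newest entries in the database.

```python
import random

def split_train_test(rows, test_share=0.1, seed=0):
    """Shuffle a copy of the rows and hold back a share of them for verification."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]

car_ids = list(range(100_000))  # stand-ins for 100,000 car records
train, test = split_train_test(car_ids)
print(len(train), len(test))
```

The system learns only from `train`. Its predictions are then compared with the known outcomes in `test`, which it has never seen.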

Quality and quantity of data

Larger data samples generally give better results. The amount of data you need to maintain the accuracy of the results increases with the number of parameters you apply. It’s also important that the data you use be accurate and diverse. In the example above, we would get poor quality results if we only used, for example, cars of the same brand and of similar ages.
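A rough bit of arithmetic shows why more parameters call for more data. The counts of distinct values below are invented for illustration, and the calculation assumes cars are spread evenly across all combinations, which real data never is.

```python
from math import prod

n_cars = 100_000
# Hypothetical number of distinct values for each parameter.
distinct_values = {"production year": 20, "brand": 30, "distance band": 10}

used = []
for name, count in distinct_values.items():
    used.append(count)
    combinations = prod(used)
    print(f"up to {name}: {combinations} combinations, "
          f"about {n_cars / combinations:.0f} cars each")

total_combinations = prod(distinct_values.values())
```

Each added parameter multiplies the number of combinations, so the same 100,000 cars are spread ever more thinly and each individual estimate rests on fewer examples.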

Well-chosen data

As the example above shows, it’s important to select data on the basis of how it can help Artificial Intelligence to make better decisions and produce more accurate results. In my illustration, it’s clear that color has no influence on a car’s reliability. It often happens, though, that we don’t know in advance which kind of data is the most useful and will improve the results the most. For example, it’s hard to say what effect the age of the driver has on the proper functioning of the car in isolation from other factors. That’s why it’s important to test multiple solutions.

Using AI

The most common current uses of AI are to categorize data (as in the example where we categorized cars according to their failure rate) and to recognize photographs, images and writing.

Artificial Intelligence in arbitrue

In arbitrue, we put Artificial Intelligence to work in the field of accounting. We recognize that many accounting tasks can be automated with a degree of accuracy that is at least as high as that of humans. We use collected data to teach our system whether or not a given document should be processed and, if so, which category it belongs in.