Data is oxygen for ML

We’re used to putting computers to work to help us complete tasks more efficiently. For most of their existence, they’ve been thought of as static machines, mechanical assistants that simply respond to inputs rather than responsive, adaptive devices. After all, they can’t think like we can, right? Well, maybe. It all depends on how we define and interpret what it means to “think”. Computers certainly lack human analytical skills, but the rise of machine learning has forced us to reconsider what “thinking” means and take another look at two almost rhetorical questions:

  • What can computers do better than humans?
  • What can humans do better than computers?

Computers outperform humans at executing complex, well-defined operations, while critical thinking remains the domain of human beings.

Combining these two realities, Arthur Samuel defined Machine Learning in 1959 as “the field of study that gives computers the ability to learn without being explicitly programmed”.

There’s a lot of buzz around Machine Learning (ML) these days, and much of it is optimism about its future applications, but the truth is that it’s already here. Different forms of Machine Learning have appeared in familiar contexts over the last two decades. Just think about autopilot functions and self-driving cars; recent advances by Tesla in this field come to mind. But if you do some research on this topic, you will find that a working prototype of an autonomous vehicle was presented as early as 1992. Project ALVINN at Carnegie Mellon University used a simple neural network to learn how to follow the road by observing a human driver and learning from the actions taken.
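To make the idea of learning from a driver’s actions concrete, here is a minimal sketch of a single classic perceptron trained on demonstrations. It is not the project’s actual code: the real network took camera images as input and was considerably larger. The two input features (`lane_offset`, `heading_error`), the synthetic “human” decisions and the `max_epochs`/`lr` settings are all hypothetical, chosen only for illustration.

```python
import random

def train_perceptron(samples, max_epochs=1000, lr=0.1):
    """Classic perceptron rule: adjust the weights only when a prediction is wrong."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:  # y: +1 = steer right, -1 = steer left
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if activation >= 0 else -1
            if pred != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every demonstration is reproduced, so stop early
            break
    return w, b

def predict(w, b, x):
    """Steering decision of the trained perceptron for one situation."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical demonstrations: (lane_offset, heading_error) -> human action.
# The made-up "driver" steers right (+1) whenever the two values sum to a negative number.
random.seed(0)
demos = []
while len(demos) < 200:
    lane_offset, heading_error = random.uniform(-1, 1), random.uniform(-1, 1)
    total = lane_offset + heading_error
    if abs(total) < 0.1:
        continue  # skip borderline situations so the toy data is cleanly separable
    demos.append(((lane_offset, heading_error), 1 if total < 0 else -1))

w, b = train_perceptron(demos)
accuracy = sum(predict(w, b, x) == y for x, y in demos) / len(demos)
print(f"agreement with the human driver: {accuracy:.0%}")  # → agreement with the human driver: 100%
```

Because the toy demonstrations are linearly separable (borderline cases are skipped), the perceptron rule is guaranteed to stop making mistakes on them after a finite number of updates. Real driving data is far noisier than this.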

In addition, many contemporary products use machine learning in more advanced forms. Think of translation apps like Google Translate, advanced medical technologies that help diagnose diseases, like IBM Watson, or the smart algorithms that show you people you may know on Facebook.

Companies and products using ML

Machine learning is used in many software projects. According to Google product executive Aparna Chennapragada, there are two main ways of using it: ML can give a turbo boost to existing products through new features and opportunities for building solutions, but it can also unlock new possibilities and spark ideas for new products, fields and markets.


Every technology has its drawbacks. One of the most challenging in ML is that we don’t always know what is going on inside our system. Machine learning modules are very often implemented as black boxes: we feed data into them and get some information in return. Sometimes it is exactly what we want, but sometimes it doesn’t work out, and we rarely have the opportunity to look inside the black box to find out what the problem is.

Usage Model

There is a common model for using machine learning.

ML usage model

Each part of this chain is crucially important: without the right algorithm we would not get any useful results, and for any of it to work, we need data. Proper use of data can be very helpful, as the following examples illustrate.
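The chain can be sketched in a few lines of code. This is a minimal illustration, assuming the chain runs from data, through an algorithm, to a model that produces results; the data points are invented, and ordinary least squares stands in for “the algorithm” only because it is short enough to write out in full.

```python
from statistics import mean

def collect_data():
    """Data: invented (input, observed outcome) pairs standing in for real records."""
    return [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

def train(data):
    """Algorithm: ordinary least squares for a straight line y = a*x + b."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    a = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum((x - x_bar) ** 2 for x in xs)
    b = y_bar - a * x_bar

    def model(x):
        """Model: the learned function, ready to be applied to new inputs."""
        return a * x + b

    return model

model = train(collect_data())
prediction = model(6.0)  # Results: an estimate for an input the model has never seen
print(f"predicted outcome for x = 6: {prediction:.2f}")  # → predicted outcome for x = 6: 11.99
```

Each stage depends on the one before it: the model can only be as good as the data it was trained on and the algorithm that produced it.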

  1. Red Roof Inn
    During periods of bad weather, business at Red Roof Inn was down. Then the company came up with the idea of putting data to use. When flights were cancelled due to extreme weather, up to 90,000 passengers were suddenly stranded. Analysts crunched the numbers and identified the right times and places to promote the company and offer discounts. Put the right data into the right model and the machine can do the rest.
    The proper use of weather forecasts and airport data allowed Red Roof Inn to target its offers, resulting in a 10% rise in revenue. In other words, data not only helped the company avoid losses during a tough part of the year for the industry but actually increased its business.
  2. Los Angeles Police Department
    The LAPD is another example of the benefits of using data smartly. Its story is a perfect illustration of the classic model: collect data, plug it into an algorithm and let the machine do the work. Since the LAPD had lots of historical data about crimes and the details involved, it hired a company to analyse the data. The analysis identified times and places with an increased risk of criminal activity, which made it possible to assign extra resources to help prevent crimes. The results were impressive: a 33% decrease in thefts and 21% fewer victims of various crimes.
  3. UPS cargo delivery
    UPS is a global leader in the delivery and transport industry, operating in about 195 countries and delivering approximately 17 million packages daily. UPS uses an analytical system called ORION (On-Road Integrated Optimization and Navigation), which applies operations research to optimise delivery routes. The company invested heavily in real-time processing power and is now enjoying the results. Thanks to the efficiencies brought by the system, UPS has been able to use 6 million fewer liters of fuel a year, speed up its deliveries and cut exhaust emissions by 13,000 tons.
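The hot-spot idea from the LAPD example can be illustrated with a deliberately naive sketch: count past incidents per district and time block, then rank the cells. This is not the system the LAPD’s contractor used; the records, district names and four-hour blocks are hypothetical.

```python
from collections import Counter

# Hypothetical historical records: (district, hour_of_day) for each reported theft.
incidents = [
    ("Downtown", 22), ("Downtown", 23), ("Downtown", 21), ("Harbor", 14),
    ("Downtown", 22), ("Harbor", 15), ("Valley", 2), ("Harbor", 14),
    ("Valley", 3), ("Downtown", 20),
]

def time_block(hour, size=4):
    """Group hours into blocks of `size` hours, e.g. 22 -> '20-23'."""
    start = hour - hour % size
    return f"{start:02d}-{start + size - 1:02d}"

def hot_spots(records, top=2):
    """Rank (district, time block) cells by how many past incidents fell into them."""
    counts = Counter((district, time_block(hour)) for district, hour in records)
    return counts.most_common(top)

print(hot_spots(incidents))
# → [(('Downtown', '20-23'), 5), (('Harbor', '12-15'), 3)]
```

Real predictive-policing models are far more sophisticated; the sketch only shows the principle of letting historical data point to where extra resources are most likely to be needed.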

These examples show different approaches to collecting and using data. They also illustrate how data is all around us and often easy to access: it can be publicly available data, historical data or data produced in real time. Not only is it widely available, it can also be the key to solving a number of business challenges.

This is the approach we take at InFakt. Our software is used to process over 50,000 invoices a month. This produces a lot of data, and the decisions the program has learned to make from it have been applied to Automated Accounting. Our goal is to help accountants focus on the important issues that demand their attention rather than wasting time on monotonous tasks that can be automated. This is what automated booking is all about: allowing human accountants to apply their expertise to more demanding work and letting machines do the rest.

Automated Accounting makes decisions on tax and insurance documents based on extensive experience, having dealt with identical situations many, many times before. In fact, the program processes documents more efficiently than human accountants ever could. Over time, we have refined the program and rebuilt its algorithms as needed, based on analyses of the data it constantly processes. We have also enhanced the infrastructure built around the algorithm, to the point where more than 55% of invoices are now processed automatically with 97% accuracy.

When we started looking into the remaining 3%, we quickly discovered that much of this gap was due not to wrong decisions but to decisions that were inconsistent with those taken by human accountants. Among the “inaccurate” decisions, eight out of ten were actually correct, and the errors were on the human side of the equation. This drive for constant improvement and dedication to the highest possible accuracy has produced an Automated Accounting program that can process more than half of all invoices automatically, saving our users more than 600 hours of work a month.
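One way a system could make decisions “based on dealing with identical situations many times before” is sketched below: look up how accountants booked invoices with the same signature in the past, book automatically only when that history is large and consistent, and otherwise leave the invoice to a human. This is a hypothetical illustration, not InFakt’s actual algorithm; the invoice signatures, categories and the `min_cases`/`min_share` thresholds are assumptions. The last function works through the accuracy figures above: assuming the eight-in-ten ratio holds across the whole 3%, the share of correct decisions rises from 97% to about 99.4%.

```python
from collections import Counter, defaultdict

def build_history(past_bookings):
    """Group past accountant decisions by a hypothetical invoice signature."""
    history = defaultdict(Counter)
    for signature, category in past_bookings:
        history[signature][category] += 1
    return history

def decide(history, signature, min_cases=20, min_share=0.95):
    """Return a category to book automatically, or None to hand the invoice to a human."""
    seen = history.get(signature)
    if not seen:
        return None  # this situation has never been seen before
    category, count = seen.most_common(1)[0]
    total = sum(seen.values())
    if total >= min_cases and count / total >= min_share:
        return category  # accountants almost always booked it this way
    return None  # too little or too inconsistent history

def corrected_accuracy(measured, share_actually_correct):
    """Measured accuracy counts every disagreement with a human as an error;
    add back the disagreements that turned out to be correct."""
    return measured + (1 - measured) * share_actually_correct

# Hypothetical history: (supplier, item) signature -> category chosen by accountants.
past = ([(("FuelCo", "diesel"), "vehicle costs")] * 40
        + [(("FuelCo", "coffee"), "representation")] * 3
        + [(("FuelCo", "coffee"), "office supplies")] * 3)
history = build_history(past)

print(decide(history, ("FuelCo", "diesel")))    # → vehicle costs
print(decide(history, ("FuelCo", "coffee")))    # → None (only 6 mixed cases)
print(decide(history, ("PaperCo", "paper")))    # → None (no history at all)
print(round(corrected_accuracy(0.97, 0.8), 3))  # → 0.994
```

Raising `min_share` or `min_cases` would make the system book fewer invoices automatically but more conservatively; where to set such thresholds is a trade-off between coverage and accuracy.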