Machine Learning: realistic expectations

A practical way to approach a Machine Learning project

Anxo Rey Masid

Published in SecuritasDirect · 7 min read · Jun 18, 2021

Introduction

We’ve all heard of Machine Learning, and we have a rough idea of the advantages it provides and the problems it can help us solve. But do we use this “tool” properly, or do we sometimes misuse it?

Most of the time, these projects emerge as the obligatory solution to an existing problem or proposal, but we don’t always check whether this is really the answer that best fits the question. That is harmless if we have time to spare, but it becomes a problem if we are part of a company’s Data Science team and this is just one of the many projects we are involved in (time, always time…).

On the other hand, we often build models that, once finished, do not meet the expectations set before their development, or do not fully answer the question that motivated them. When this happens, they end up in the bin without ever having been deployed, wasting time and discouraging their creators.

These situations are more common than we think and, in many cases, could have been avoided with a preliminary prospecting phase that allows us to explore the problem and properly manage customer expectations.

In essence, the goal is simple and very important: to give the most appropriate answer to the question being asked. A basic example makes this clear:

Photo by LinkedIn Sales Solutions on Unsplash

If someone asked us how tall we are, we would probably give them the last figure we remember:
“1.78 m,” my friend Juan would say.
But if we were a little more of a purist, we could ask Juan to measure himself again, since our height can change over the years, although by an amount practically imperceptible to the human eye:
“You’re right, I’m 1.75 m tall,” Juan would say after measuring himself with a measuring tape.
Bordering on the absurd, we could ask Juan to repeat the measurement with a high-precision laser to make it a little more precise:
“1.774679 m,” an exhausted Juan would reply.

The three answers in this example serve to illustrate that, when faced with the same question, we can carry out different exercises, with their corresponding methodologies and associated costs, to answer the initial question. If we do not know the purpose and intent of the question, we could be measuring Juan with a high-precision laser when the answer that he himself could give us would be quite sufficient and efficient for all intents and purposes.
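One way to make this trade-off concrete is to pick the cheapest method whose precision is good enough for the purpose of the question. The sketch below is only an illustration: the precision and cost figures, and the helper name `cheapest_adequate_method`, are assumptions invented for the example, not real measurements.

```python
# Each method: (name, precision in metres, relative cost).
# All numbers are hypothetical, chosen only to illustrate the trade-off.
METHODS = [
    ("ask Juan", 0.05, 1),
    ("measuring tape", 0.01, 5),
    ("high-precision laser", 0.000001, 100),
]


def cheapest_adequate_method(required_precision_m, methods=METHODS):
    """Return the name of the cheapest method that is precise enough, or None."""
    adequate = [m for m in methods if m[1] <= required_precision_m]
    if not adequate:
        return None
    return min(adequate, key=lambda m: m[2])[0]


casual_chat = cheapest_adequate_method(0.05)   # an informal conversation
tailor_visit = cheapest_adequate_method(0.01)  # ordering a fitted suit
```

With these made-up figures, an informal conversation needs nothing more than Juan’s own answer, and the laser only wins when the question truly requires sub-millimetre precision.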

It is therefore especially important to do some prior “thinking” and begin the project with an exploration and prospecting phase. With it, we seek to understand what solutions we can provide, which one best suits our budget and context, and how it should be approached if the project goes ahead. To do this, we must take the following points into account before and during the development of our projects:

the problem
the resources
the solutions
the expectations

The problem

The first vital step is to understand in depth the problem that has been presented to us. Immediately afterwards, we should start the abstraction process to understand, in practical terms, what exactly the person presenting the problem needs. It is relatively easy to lose sight of this point and focus too much on the technical answer: we might seek to innovate and achieve a high-quality solution that does not actually fit the original question (something like worrying about getting a laser to measure Juan when the question “how tall are you?” is asked in an informal conversation).

Giving the person who asked the initial question a tangible idea of the solution can help us shape our approach: answering a specific question about their clients is not the same as optimizing a company process that tries to identify which clients are more likely to make a certain purchase.

The resources

The most important limiting factor when developing the solution is the resources we have: time, budget, technologies, etc. We will highlight the most relevant ones, which are also the most common in projects of this type.

On the one hand, we have the non-technical resources, such as the people involved in the project and the budget available (which is closely linked to people and their time commitment). It is important to make a rough assessment of the cost of each candidate solution, so that we can tell whether any of them is beyond our limits and would risk leaving the project unfinished. Making the chosen solution and the project path as concrete as possible helps us arrive at more accurate numbers and understand whether we are within our margins. If we are not, we will have to rebuild the solution, adapt it or, in extreme cases, cancel the project.

On the other hand, we have all kinds of technical restrictions that may arise, such as:

Work environment: certain developments, queries, or processes require a suitable work environment.
Tools: certain technologies may not be available within our company, or may require internal licensing or approval that has not yet been granted (we cannot use Spark if there is no cluster on which to run the processes).
Data access: both in terms of availability and privacy, we need to guarantee access to the data required to develop the project.
Integration with other systems: it is especially important to understand how the new “part” we are going to create fits within the company so that it can be properly exploited (a model with great results is useless if the rest of the company cannot use it).

The solutions

Leaving aside the cost of each solution we assess, it is very important to keep in mind what the opening example illustrated: the question must be adequately answered, and the answer must serve its purpose. The range of possibilities goes from extracting a table or calculating aggregated data with a database query, to creating a report with a BI tool (Power BI, Tableau, etc.), to doing a more exhaustive analysis to draw some conclusions, or, finally, to building a Machine Learning model. I do not mean that we should avoid building a model at all costs, but rather that if we build one, it should be because it is the solution our problem demands and the simpler, cheaper alternatives do not meet our needs.

It makes sense to iterate on the same problem, always going from the simplest solution to the most complex. This applies even among models: we can start with a relatively simple one (for example, a linear regression) and move to a much more refined one that may give better results (for example, a neural network with multiple hidden layers). The important point is that each iteration should be driven by the ambition of our client.
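As a minimal sketch of this simplest-first iteration, the Python below tries candidates in order of complexity and stops at the first one that meets the error the client asked for. Everything in it is an assumption made for illustration: the toy data, the target error of 1.0, and the function names. It uses only the standard library, so the "models" are a mean baseline and a one-variable least-squares regression.

```python
from statistics import mean


def fit_mean_baseline(xs, ys):
    """Simplest candidate: always predict the average of the training targets."""
    average = mean(ys)
    return lambda x: average


def fit_linear_regression(xs, ys):
    """One-variable ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_absolute_error(model, xs, ys):
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))


def simplest_sufficient_model(candidates, train, test, target_error):
    """Try candidates from simplest to most complex; stop at the first that is good enough."""
    results = []
    for name, fit in candidates:
        model = fit(*train)
        error = mean_absolute_error(model, *test)
        results.append((name, error))
        if error <= target_error:
            return name, results
    return None, results  # nothing met the target: revisit expectations


# Toy data (hypothetical): a linear trend plus a small deterministic wobble.
xs = list(range(20))
ys = [2 * x + 1 + ((x * 7) % 5 - 2) * 0.2 for x in xs]
train = (xs[0::2], ys[0::2])
test = (xs[1::2], ys[1::2])

# Candidates ordered from simplest to most complex.
candidates = [
    ("mean baseline", fit_mean_baseline),
    ("linear regression", fit_linear_regression),
]
chosen, results = simplest_sufficient_model(candidates, train, test, target_error=1.0)
```

With this toy data the mean baseline misses the target and the linear regression meets it, so the iteration stops there without reaching for anything more complex. If no candidate met the target, the function would return `None`, which is the cue to go back to the client rather than keep adding complexity.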

The expectations

Given that Machine Learning is currently at the center of many conversations, expectations for this type of project tend to fall into two opposite camps: those who do not know the technology and its ins and outs tend to be either suspicious (low expectations) or overconfident (high expectations), while those who have looked into it, are familiar with the concepts and are well aware of its limits have realistic expectations.

Although it is not entirely simple, it is possible to investigate in depth the limits on the model’s performance (that is, to estimate how close we can get to the Bayes error, the lowest error achievable for that problem). In other words, it is particularly important to understand the problem well and determine the possible solutions, as well as their scope and precision. Although it may seem obvious, the person who brought us the problem does not necessarily know whether there is a technical limit to how well it can be solved. If the solution does not meet expectations, the situation will always be easier to handle if we made these limits clear before developing the model.
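To illustrate what such a limit looks like, here is a small Python sketch on a made-up problem where the true purchase probabilities are known. The segments, shares and probabilities are assumptions invented for the example; in a real project they are unknown and the Bayes error can only be estimated.

```python
# Hypothetical customer segments: (share of customers, probability of purchase).
# The figures are invented for illustration only.
SEGMENTS = {
    "frequent buyers": (0.5, 0.9),
    "occasional buyers": (0.3, 0.5),
    "window shoppers": (0.2, 0.2),
}


def bayes_error(segments):
    """Error of the best possible classifier when the segment is the only input."""
    # Within a segment, the best prediction is the more likely outcome, so the
    # unavoidable error there is the probability of the less likely outcome.
    return sum(share * min(p, 1 - p) for share, p in segments.values())


best_possible_accuracy = 1 - bayes_error(SEGMENTS)
```

With these assumed numbers the Bayes error is 0.05 + 0.15 + 0.04 = 0.24, so no model that sees only the segment can exceed 76% accuracy, however sophisticated it is. A client expecting 95% needs to hear that before development starts, not after.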

Our job in solving the problem is to achieve the best possible result, but if we know the limits of that result, we should inform the person who made the request, since it is this person who ultimately fires the starting gun for development.

Conclusion

With all this, I only want to offer some guidelines, based on my subjective personal experience, that may help prevent awkward situations and fit solutions as closely as possible to the problems that arise in the exciting world of Machine Learning.

Many thanks and see ya in the pit!