The importance of Humans in Machine Learning
Machine Learning is a great tool to improve user experience and automate complex processes and decision-making. But how do we keep humans involved in the process?
At theFactor.e we strive to “make online personal”, and data is one of the most powerful tools available to achieve this goal. Introducing Machine Learning into our products allows us to create more personalized and engaging experiences for all our end-users. But this comes with a catch. While exploring how to design these solutions, we found that most design methods seem to be rooted in a single belief: “Data is king.”
Besides treating data, rather than people, as the primary driver of Machine Learning decisions, every methodology we looked at also presumed a deep understanding of the business. As a digital agency working for multiple customers at any given time, this adds complexity: we have to understand each customer’s business fully, and we become more error-prone if we misunderstand some of its intricacies. So while we may have the technical expertise to implement Machine Learning solutions on our end, we don’t have the deep understanding of our customers’ businesses.
Looking at this issue, we recognized that we had encountered this problem before. When we take on a new project, we start almost all of them with a Design Sprint. Here we work together with the most important stakeholders from our customer to figure out exactly what it is the customer wants and whether those wishes align with what their end-users want. So could we apply this idea to Machine Learning and Data Science as well?
This is where our idea for a Data Science Sprint came into play. Our goal was to create a process where in a couple of days we, together with a customer, could figure out:
- What data is available, spread across an organization;
- What problems the customer is facing; and
- How that data can support our customers’ goals.
The important difference here is that we don’t look at the data to see whether there is a problem, but try to solve an existing problem through data. This way we make sure that we aren’t building complex (and expensive) Machine Learning solutions to solve non-existent problems, while still maintaining a data-driven decision process.
Where’s the user in all this?
We mentioned that we want to solve a human problem, but so far we haven’t involved any humans in the process. Since we don’t want to base our solution on data alone, it is important to involve the end-user. But how do we integrate the end-user into such an abstract process?
Having a lot of data to look at is helpful, but it is still the users who create that data and trust us to do the right thing with it. Therefore we need to understand our users just as well as we understand the data if we want the algorithm to make the right decisions. To do this, we take an empirical approach to Data Collection during our Data Science Sprints.
This empirical approach means that all the insights derived by our Data Scientists during a Data Science Sprint will always have to be evaluated by domain experts and/or end-users before we accept them as true. This evaluation of insights is the last phase of our Data Science Sprint. Once we understand the problem the customer is facing, and we have collected the data we think we need, our Data Scientists set out to analyze all the data and discover the patterns they believe to be relevant.
On the last day of the Data Science Sprint we then sit down with business stakeholders, domain experts and possibly even end-users to present our insights and discuss the validity and ethical impact of our findings. Based on the outcome of this discussion, the decision can be made whether or not the current dataset is sufficient to create a reliable Machine Learning model.
When actually designing and implementing this analytical model we use a similar approach. As the technology partner we can never have the same business understanding as our customer does. While our algorithms will learn from the feedback end-users give them, it is important to remove flaws before the initial deployment. This is another phase where we directly involve our customers and end-users in making decisions. Before we decide to deploy an algorithm, we sit down with domain experts at the customer, or with actual test users, to evaluate the correctness of the algorithm.
Why not rely on your test-set?
Those of you with experience in creating Machine Learning solutions know the importance of reserving a held-out set of data. This reserved set allows you to test how well your algorithm is performing. But why not simply rely on that dataset to tell you whether or not your software is working as intended? Because datasets do not feel emotions. People do.
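For readers less familiar with this practice, a minimal sketch of the idea in plain Python (the dataset and the “model” here are illustrative placeholders, not anything we deploy):

```python
import random

# Illustrative toy dataset: (feature, label) pairs.
data = [(i, i % 2) for i in range(100)]

random.seed(42)
random.shuffle(data)

# Reserve 20% of the data; the model never sees it during training.
split = int(len(data) * 0.8)
train, test = data[:split], data[split:]

# Placeholder "model": always predict the majority label from training.
train_labels = [label for _, label in train]
majority = max(set(train_labels), key=train_labels.count)

# The reserved set gives an honest estimate of performance on unseen data,
# which training-set accuracy alone cannot provide.
accuracy = sum(1 for _, label in test if label == majority) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The key point is the separation: the reserved rows never influence training, so the accuracy they yield is a fair statistical estimate. What it is not, as the next paragraphs argue, is a measure of how people feel about the mistakes.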
Actual in-person User Testing is very valuable to us; we even have a mobile UX Research lab named Billy. This type of user testing helps us determine the social and emotional impact of the solutions we create, and we will also be applying this type of research to our Machine Learning solutions. In fact, collecting immediate feedback on the outcome of your algorithm is very helpful.
We always have the possibility to tell Siri that she misunderstood us, or to correct Autocorrect when it insists our company name is The Factor.E rather than theFactor.e. But how do these types of errors affect the user’s overall opinion of the tool, and how do they react to the mistake? Did they laugh or get frustrated? Maybe they got offended or felt discriminated against.
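One lightweight way to capture such explicit corrections, so a human can review them before anything is retrained, is to log each prediction alongside the user’s fix. This is our own illustrative sketch (the class and field names are invented for this example, not part of any framework):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FeedbackEntry:
    user_input: str                    # what the user said or typed
    prediction: str                    # what the model produced
    correction: Optional[str] = None   # the user's fix, if they gave one

@dataclass
class FeedbackLog:
    entries: list = field(default_factory=list)

    def record(self, user_input: str, prediction: str,
               correction: Optional[str] = None) -> None:
        self.entries.append(FeedbackEntry(user_input, prediction, correction))

    def pending_review(self) -> list:
        # Only corrected entries are candidates for retraining,
        # and only after a human reviewer has accepted them.
        return [e for e in self.entries if e.correction is not None]

log = FeedbackLog()
log.record("the factor e", "The Factor.E", correction="theFactor.e")
log.record("hello", "hello")
print(len(log.pending_review()))  # 1 correction awaiting human review
```

The design choice worth noting is that a correction does not flow straight back into the model: it lands in a review queue, keeping a human in the loop exactly as the sprint process above prescribes.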
For now, purely based on data, there is no reliable way to tell why someone left a website. We can’t ask them why they left, because they’re no longer on our website, and analytics only tell us so much. So after we deploy our Machine Learning solutions and they adapt to our users’ input, sitting down with actual humans remains just as important as, if not more important than, the algorithms themselves.