UX Design for Data Science with Wine
This article is meant to share knowledge from my experience as a UX Designer working on a Data Science platform. A year ago, this complex topic was all French to me, so I will try to teach my former self the savoir-faire I have gathered since. No previous background in engineering or computing is required, as I will try to democratise this technical topic in a non-intimidating and pleasant way: with wine. As they say, no great story ever started with someone eating a salad.
Introduction to Data Science
Data, like wine, is filled with stories. Data Science is the art of extracting meaning from data: a way to make sense of massive amounts of raw information. We’re talking about quantities so huge that no individual could go through them and recognise patterns. The product I had the opportunity to work on is a platform for gaining understanding of a set of data, building and deploying models, and making predictions. This is a highly relevant topic because it means that, in a business context for example, decisions no longer have to be made on a whim or out of gut feeling, but can be based on accurate forecasts.
Personas
The end users of such a product are mostly people familiar with Machine Learning processes: Data Scientists, Machine Learning Engineers or Consultants. In addition, we can also include Citizen Data Scientists, who have less of a technical background in the field but can make predictions by using ready-made templates.
Data Science Workflow
Let’s put ourselves in the shoes of an end user and go through a typical end-to-end process in Data Science.
Preparing the Data
First, we need to understand the use case to solve and define the end goal. Let’s say a wine retailer would like to forecast how much the price of Bordeaux will fluctuate over the next years, to know when best to stock up their supplies. The end goal is therefore to establish precisely when the best quality will be sold at the lowest price.
We receive a dataset related to the wine domain, or are given access to one. Data can be anything: text, images, tables of numbers, etc. In our case it could come from numerous sources, for example weather forecasts in the region, competition from other wine producers around the world, changes in taxes and tariffs, reviews from food critics and the history of vine diseases.
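To make the rest of this walkthrough concrete, here is a minimal sketch in Python (using the pandas library) of loading such a dataset. The file name `bordeaux_prices.csv` and its columns are invented for illustration; a real project would pull from the many sources above.

```python
import pandas as pd

# Hypothetical dataset: one row per vintage, with invented columns such as
# rainfall, critic_score, competitor_popularity and the price we want to forecast.
df = pd.read_csv("bordeaux_prices.csv")
print(df.head())  # peek at the first few rows
```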
To understand this dataset, we perform various statistical explorations: finding patterns, visualising minimum and maximum values, looking for an underlying logic and spotting inconsistencies. To get a tangible feel for the data and make sure we are still on track, we can take a small sample and check that the idea holds in practice. The question we need to answer is: with the information I have, can I predict the wine’s quality and price over time with sufficient accuracy?
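Continuing the hypothetical pandas sketch above, such an exploration could look roughly like this; the `price` column is an assumption carried over from the invented dataset.

```python
# Statistical exploration: ranges, averages and gaps in the data.
print(df.describe())    # min, max and mean of each numeric column
print(df.isna().sum())  # how many values are missing per column

# Which signals tend to move together with the price?
print(df.corr(numeric_only=True)["price"])
```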
The last data step, which can be relatively time-consuming, is called pre-processing. This means cleaning the data, by removing duplicated elements and replacing missing values, in order to make it ready for use. When we have a satisfactory version of the dataset, we save it as a new version. Here, we pay attention to what in the data is important (our input) to answer the question of our wine use case (the output).
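A sketch of what this pre-processing step could look like, still assuming the invented dataset from above:

```python
# Pre-processing: drop duplicated rows and fill in missing numeric values.
clean = df.drop_duplicates()
clean = clean.fillna(clean.median(numeric_only=True))

# Save the cleaned data as a new version of the dataset.
clean.to_csv("bordeaux_prices_v2.csv", index=False)
```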
Building the Model
Depending on our level of expertise in Machine Learning, we either create a model from scratch that works for the data version we have created, or adapt an existing one to fit the use case.
We split this version of the dataset in two: training and testing. We feed the model the training data, then use the unseen test data to measure whether the model can find the right output by itself.
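As a sketch of this split-train-test loop with scikit-learn, reusing the cleaned dataset from the pre-processing step: the `price` column, the assumption that the remaining columns are numeric, and the choice of a random forest model are all purely illustrative.

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# The inputs are all the signals; the output is the price we want to predict.
X = clean.drop(columns=["price"])
y = clean["price"]

# Keep 20% of the data hidden from the model so we can test it honestly.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)          # learn from the training data

predictions = model.predict(X_test)  # predict prices the model has never seen
print("average error:", mean_absolute_error(y_test, predictions))
```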
For example, we tell our model that wine from Australia gaining popularity (input) means prices of Bordeaux are decreasing (output). Then we ask our model: if wine from South Africa gains popularity, what does it mean for our Bordeaux?
If the answer is wrong, we might do error analysis to find out why it isn’t accurate, or try several models to see which one gives the best accuracy. We iterate until we reach a satisfactory score or KPI.
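Trying several models could look like the following, reusing the train/test split from the previous sketch; the three candidate models are arbitrary examples.

```python
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Train a few candidate models and compare their errors on the test data.
candidates = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(random_state=42),
    "gradient boosting": GradientBoostingRegressor(random_state=42),
}
for name, candidate in candidates.items():
    candidate.fit(X_train, y_train)
    error = mean_absolute_error(y_test, candidate.predict(X_test))
    print(f"{name}: average error {error:.2f}")
```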
Deploying the Model
Once we have a satisfactory model, it is delivered by being introduced into a production system, where its predictions will be consumed by the people who need the forecasts: for example, as a graph on the wine retailer’s internal dashboard, so that the sales team can follow the variations of the market.
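One simple way to hand a model over to a production system is to serialise it to a file, here sketched with joblib; the file name and the `forecast` helper are hypothetical.

```python
import joblib

# On the data science side: package the trained model into a file.
joblib.dump(model, "bordeaux_model.joblib")

# On the production side: load it and serve forecasts, e.g. for the dashboard.
def forecast(new_data):
    loaded = joblib.load("bordeaux_model.joblib")
    return loaded.predict(new_data)
```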
After being deployed, the model is confronted with real data (neither training nor testing) that is “alive”, or ever incoming. We might be asked to track the performance of the model over time. If our model predicted that the lowest price for high-quality Bordeaux would come at the end of the quarter, then at the end of the quarter we would know whether it was low enough or not. When the model’s metrics decline, we take the new data collected over time and retrain the model.
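Monitoring and retraining could be sketched like this, assuming a hypothetical file of live data whose actual prices are now known, and an error threshold agreed with the business.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Live data for which the real prices are now known (hypothetical file).
live = pd.read_csv("live_data_with_actual_prices.csv")
live_X = live.drop(columns=["price"])
live_error = mean_absolute_error(live["price"], model.predict(live_X))

# If performance has declined past the agreed threshold, retrain on the new data.
ACCEPTABLE_ERROR = 5.0  # hypothetical: five euros of average error per bottle
if live_error > ACCEPTABLE_ERROR:
    model.fit(live_X, live["price"])
```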
How Design Can Improve Data Science
Designers can help by creating holistic experiences so that both experts and non-experts can work together. This collaboration is crucial: Data Scientists, Machine Learning Engineers and Consultants have the technical knowledge, whereas people on the business side have the domain understanding required to target the right question to answer. The first group also needs sufficient access to quality data, free of security issues and long authorisation processes, provided by the second. Additional value is infused through features that make data science run faster, such as automating findings in the data, comparing models running in parallel, and providing ample sharing and storage space to keep track of data versions and models.
Conclusion
As Stephen Fry puts it, “wine can be a better teacher than ink”. I hope this article helps fellow designers understand the field of Data Science better, and I can only encourage them to get involved with this hard-to-grasp but fascinating topic.