
Data Exploration and the Agile Mindset

How do we deliver meaningful insights without clear goals in mind?

Photo by Daria Nepriakhina on Unsplash

Picture this scenario: you, as a Data Scientist, receive a task that can be summarized on a single post-it. It has no clear end, no validation criteria, and a short deadline:

Explore this dataset and figure out what we can extract from it. Due next sprint.

As a programmer, my first instinct is to grab some torches and burn this heretical nincompoop of a project. Agile methods usually require tasks in bite-sized portions, well-defined definitions of done, and clear-cut metrics for objective measurement. Exploration tasks rarely have any of those.

Photo by Zab Consulting on Unsplash

The degree of freedom that this kind of task gives you can be a little threatening. You are dumped into a sea of information and need to find a single fish in the depths of the data lake.

And, unlike most Data Science tasks, you are left with no comparable metrics or feedback. How do you know what a fish even looks like, and whether it is tasty compared to others, if you have never had one?

This is completely different from your run-of-the-mill stories that involve training a model to achieve an increase in a certain KPI or a better result than prior classifiers. You can adapt Scrum sprints to fit your training/validation/deployment process and change deliverables to experiment reports. There are many great articles on this exact matter, and I will defer to them here:

You can approach it with the heart of a researcher: manually inspecting your instances, creating visualizations, and finding correlations between variables. This is the most fundamental building block of a successful Data Analyst, and a great step toward understanding your data.

But that's just the first step. The tricky parts are verifying the correctness and impact of your analysis, and subsequently proving its usefulness.

When you have prior metrics for comparison, your task is as simple as creating a bigger/smaller number with your models and calculations: increase classification accuracy; increase recall for this specific class; decrease MAE for that object detection method.
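
In practice, that comparison often boils down to a handful of lines. Below is a minimal sketch using scikit-learn, with made-up labels and predictions, just to illustrate the "did the number move in the right direction?" workflow:

```python
# Hypothetical labels/predictions, only to show the baseline-vs-candidate comparison.
from sklearn.metrics import accuracy_score, mean_absolute_error, recall_score

y_true = [1, 0, 1, 1, 0, 1]
baseline_pred = [1, 0, 0, 1, 0, 0]
candidate_pred = [1, 0, 1, 1, 0, 0]

# Classification: did accuracy and recall for class 1 go up?
print("accuracy :", accuracy_score(y_true, baseline_pred),
      "->", accuracy_score(y_true, candidate_pred))
print("recall(1):", recall_score(y_true, baseline_pred, pos_label=1),
      "->", recall_score(y_true, candidate_pred, pos_label=1))

# Regression-style output (e.g. a predicted measurement): did MAE go down?
y_reg = [10.0, 12.5, 7.3]
baseline_reg = [11.0, 14.0, 6.0]
candidate_reg = [10.2, 12.9, 7.0]
print("MAE      :", mean_absolute_error(y_reg, baseline_reg),
      "->", mean_absolute_error(y_reg, candidate_reg))
```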

But what happens when you don't have that gold standard? Either:

  • You create the gold standard yourself with manual labeling, or
  • You propose a standard based on real-life observations and expert opinions.

Creating a gold standard is relatively easy when the problem has a parallel task in academia: a domain-specific sentiment analysis model just needs a dataset from that domain to validate it, replicating the metrics used in other works.

When you do something that has no parallel, that validation must come from expert sources, internal or external. Let's illustrate this with a live example here in our company: washing machine reliability.

When you think about product reliability, there are a bunch of factors that can influence customer opinion: overall satisfaction with the product performance and features, customer support quality, maintenance availability and costs…

We proposed scores that represent customer opinion on each of those factors, based on user reviews of products from each major brand on the market. The process involves a series of NLP models that extract, analyze, and aggregate meaningful information from the textual data.
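
The pipeline itself is beyond the scope of this post, but the aggregation idea can be sketched in a few lines. The snippet below is a deliberately simplified, hypothetical version: it assumes upstream models have already tagged each review with a brand, an aspect, and a sentiment, and rolls those tags up into a per-brand, per-aspect score. The aspect names and the 0–100 scaling are illustrative, not our actual methodology.

```python
from collections import defaultdict

# Assumed output of upstream NLP models: one record per (review, aspect) mention.
annotated_reviews = [
    {"brand": "BrandA", "aspect": "reliability", "sentiment": +1},
    {"brand": "BrandA", "aspect": "reliability", "sentiment": -1},
    {"brand": "BrandA", "aspect": "customer_support", "sentiment": +1},
    {"brand": "BrandB", "aspect": "reliability", "sentiment": +1},
]

def aggregate_scores(reviews):
    """Average sentiment per (brand, aspect), rescaled from [-1, +1] to [0, 100]."""
    buckets = defaultdict(list)
    for review in reviews:
        buckets[(review["brand"], review["aspect"])].append(review["sentiment"])
    return {
        key: round(50 * (1 + sum(vals) / len(vals)), 1)
        for key, vals in buckets.items()
    }

print(aggregate_scores(annotated_reviews))
# {('BrandA', 'reliability'): 50.0, ('BrandA', 'customer_support'): 100.0,
#  ('BrandB', 'reliability'): 100.0}
```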

Unfortunately, there wasn't a prior metric we could directly compare our results to. This is not a single model being trained; it's the result of an extensive analysis that needed large amounts of data to be statistically meaningful. So no real internal objective metric was on the table.

That leaves us with a subjective problem on our hands. We needed to prove our hypothesis (this metric measures something interesting) and compare it to reality (this metric is in line with these past events).

Proving the hypothesis is an example of Data Science storytelling: grab a small portion of your dataset, show detailed results and scores for each instance and compare classes. In our case, we analyzed the scores obtained from selected reviews that represented good and bad viewpoints about a specific feature. That checked out!
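
A toy version of that sanity check looks something like the snippet below. Here, score_reviews is just a stand-in for whatever scoring pipeline you are validating, and the hand-picked reviews are invented for illustration; the point is that clearly positive reviews must end up with higher scores than clearly negative ones.

```python
def score_reviews(reviews):
    # Placeholder scorer: in practice this would call the real NLP pipeline.
    positive_words = {"great", "quiet", "reliable"}
    return [
        sum(word in positive_words for word in text.lower().split()) / len(text.split())
        for text in reviews
    ]

clearly_positive = ["Great machine, quiet and reliable so far."]
clearly_negative = ["Broke twice in a month and support never answered."]

pos_scores = score_reviews(clearly_positive)
neg_scores = score_reviews(clearly_negative)

pos_avg = sum(pos_scores) / len(pos_scores)
neg_avg = sum(neg_scores) / len(neg_scores)
assert pos_avg > neg_avg, "scores do not separate the two classes"
print("positive avg:", pos_avg, "| negative avg:", neg_avg)
```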

Comparing it to reality demands a little bit of digging and expert knowledge. In our case, there are plenty of agency reports about the reliability of washing machines and the satisfaction of their owners. Sadly, most of those reports show varied results: Samsung, LG, Whirlpool, and Maytag fluctuate wildly across the analyzed rankings. Our result was grounded in actual consumer reviews instead of paid surveys, and we couldn't compare it to anything else on the market. So, to validate our scores, we asked actual experts in washing machine servicing to tell us whether what we found was in line with reality. That also checked out wonderfully!

Now that we have a number, how do we sell it?

You dashboard it, of course. We’re data sciencing here.

This specific metric can be compared to other social listening tools, but it is not quite the same, since the data we collect is specific to consumer reviews (and can even be filtered down to verified reviewers' opinions). Most platforms do some kind of analysis internally and only take reviews from their own sources into account. Amazon and Best Buy are great examples: product aspects are mined and displayed over the reviews to simplify the user journey.

Source. Check the “Read reviews that mention” section.
Source. Check the “Reviews” section.

Now, with our data, we could emulate these features for a consumer-centric approach, or we could go straight to the brands and connect them with what people are saying about their products. We chose the latter, as we can answer industry-related questions such as the ones below (a rough sketch of how the first one might be computed follows the list):

  • Did my customer support improve in the last 3 months?
  • Are people satisfied with my prices? Is there a retailer that excels in this field?
  • Is customer support dissatisfaction related to delivery services or repair services?
  • Are features recently introduced in our products being cited positively or negatively?
  • What are my scores compared to my direct rivals?
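
As promised above, here is a rough, hypothetical sketch of how the first question could be answered once the reviews have been scored: compare the average customer-support sentiment from the last 3 months against the 3 months before that. The field names and toy records are made up for illustration.

```python
from datetime import date

# Assumed output of the upstream aspect/sentiment pipeline.
scored_reviews = [
    {"date": date(2021, 1, 15), "aspect": "customer_support", "sentiment": -1},
    {"date": date(2021, 2, 20), "aspect": "customer_support", "sentiment": -1},
    {"date": date(2021, 4, 10), "aspect": "customer_support", "sentiment": +1},
    {"date": date(2021, 5, 30), "aspect": "customer_support", "sentiment": +1},
]

def window_score(reviews, start, end, aspect):
    """Average sentiment for one aspect within [start, end)."""
    values = [r["sentiment"] for r in reviews
              if r["aspect"] == aspect and start <= r["date"] < end]
    return sum(values) / len(values) if values else None

previous = window_score(scored_reviews, date(2021, 1, 1), date(2021, 4, 1), "customer_support")
current = window_score(scored_reviews, date(2021, 4, 1), date(2021, 7, 1), "customer_support")
print("customer support trend:", previous, "->", current)  # -1.0 -> 1.0
```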

But how do we prove that the answers we provide are interesting to anyone? The answer is something that pretty much anyone working in Agile knows by heart: demos. An inordinate amount of demos. We exposed early versions of the metric dashboards to many different brands and marketing agencies, absorbing every piece of feedback to sculpt the numbers and graphics. Step by step, we proved the worthiness of our data fish, and people think it is delicious.

We are currently developing an extensive dashboard with this metric in mind, along with many others that have arisen from our data collection and analysis pipelines. This is just a small part of the massive framework we maintain to absorb and digest review information. We hope to share more and more of our development practices and methodology, and if you have any further questions about what exactly we can offer you, feel free to contact us!

Have fun in our Data-driven world!
