How to use data engineering to process the results of a data maturity assessment

Wouter Trappers
Plumbers Of Data Science
5 min read · Nov 30, 2021

Please follow me on LinkedIn. This article was first published on the website of my data boutique: Xudo.

Data projects are change projects. Facilitate communication around data in your organization by designing a data maturity assessment.

On the one hand, the assessment serves as a prototypical example of how data projects drive value in an organization. On the other, I explain how I set this up for my clients using self-hosted open source software.

This is not the most technical data engineering post. I lay out the high-level data flow and the way the different tools in the data pipeline work together. Actually setting up these tools in a performant and secure way is not always straightforward, but I don’t cover that here. I hope you will still get something out of it. The links in the text point to the installation guidelines for self-hosting these tools, if you want to go down that road.

Aligning on the meaning of data

Defining what you mean when you are talking about data can be tricky. Especially when there is a philosopher in the room. That’s why I use the following working definition of data in a business context:

Data comes to fruition when a software application captures and stores the signals that the execution of a business process generates.

Now we can use this data to analyze what is going on in the business. We can define which numbers are relevant for following up on our business strategy. We can define measuring points in the process to get these numbers. We can plot the evolution of these well-chosen numbers in intuitive dashboards. Based on these dashboards, we can take action to refine the business process and the underlying software application, and then see the impact of these changes on the numbers.

And so on. Until we have a high performing business that hits its targets, realizes its strategy and fulfills its mission.

Getting people on the same page

But first, we need to get people on board with this ambitious vision. We need a way to talk about this. That’s why I developed a data maturity assessment. When I talk to people, it’s clear they have very different ideas, aspirations and fears about data projects.

Stereotypically, you could say the IT guys focus on tooling and security. The sales guy wants something flashy. The HR people want something simple. The CFO cares about total cost of ownership. And the CEO fears missing out on the opportunities that more advanced data analytics bring.

The assessment gauges the maturity around 3 topics: data culture, data strategy and data architecture. If you like you can take it here.

Tooling and data engineering steps

In this article I focus on the data engineering that comes into play when processing the results. All software applications and databases that I mention below are open source. They are all self-hosted on Hetzner cloud servers running the Ubuntu 20.04 operating system.

Figure: Hetzner cloud servers provisioning

1. The business process in this case is the assessment of data maturity. The main goal is to start a discussion around data. I wanted to keep it short and simple, not necessarily to design a statistically sound questionnaire. As the software application supporting this process I used LimeSurvey.

2. The measuring points are the answer values, from 0 to 4, that I configured behind the scenes in LimeSurvey. To generate insights later, I had to stick to the LimeSurvey question types that support assessment values. In our case that is an array.

Figure: LimeSurvey question configuration

3. The database of LimeSurvey is very well suited for surveys, but not for analytics. That is why I designed a couple of views in the MySQL database of LimeSurvey to denormalize the data. The tool I used for this is phpMyAdmin. I needed to build a couple of intermediate views before I could get all the data into one table in the format I wanted. This step took me the most time in the whole project.

Figure: SQL for the view in LimeSurvey
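
The actual view depends on LimeSurvey's own table layout, which is not reproduced here. As a rough illustration of what denormalizing means, here is a minimal Python sketch that uses the standard library's SQLite as a stand-in for MySQL. The tables `participants`, `questions` and `answers`, their columns, the sample rows and the view name `survey_flat` are all hypothetical and are not LimeSurvey's real schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny, hypothetical normalized survey schema in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical normalized tables (NOT the real LimeSurvey schema).
        CREATE TABLE participants (id INTEGER PRIMARY KEY, name TEXT, department TEXT);
        CREATE TABLE questions (id INTEGER PRIMARY KEY, topic TEXT, title TEXT);
        CREATE TABLE answers (participant_id INTEGER, question_id INTEGER, answer_value TEXT);

        INSERT INTO participants VALUES (1, 'Alice', 'Finance'), (2, 'Bob', 'IT');
        INSERT INTO questions VALUES
            (1, 'data culture', 'We base decisions on data'),
            (2, 'data strategy', 'We have a documented data strategy');
        INSERT INTO answers VALUES (1, 1, '3'), (1, 2, '1'), (2, 1, '4'), (2, 2, '2');

        -- Denormalized view: one wide row per answer, with the participant
        -- and question attributes repeated on every row.
        CREATE VIEW survey_flat AS
        SELECT p.name, p.department, q.topic, q.title AS question, a.answer_value
        FROM answers AS a
        JOIN participants AS p ON p.id = a.participant_id
        JOIN questions AS q ON q.id = a.question_id;
        """
    )
    return conn


def flat_rows(conn: sqlite3.Connection) -> list:
    """Read the denormalized view in a stable order."""
    return conn.execute(
        "SELECT name, department, topic, question, answer_value "
        "FROM survey_flat ORDER BY name, topic"
    ).fetchall()


if __name__ == "__main__":
    for row in flat_rows(build_demo_db()):
        print(row)
```

The point of the view is that a visualization tool only needs to read one flat relation instead of knowing how the survey tables relate to each other.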

4. Now we can connect Metabase to the database and use the view we created there to visualize the data. While doing this, we noticed that answer_value was a string, so we adjusted our view to cast it to an integer. After that we could use answer_value in aggregations, such as the average answer per topic.

Figure: Metabase dashboard
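
The cast described in step 4 can be sketched as follows. This is an illustration under assumptions, not the view used in the project: it runs on SQLite through Python's standard library, and the table `survey_flat_raw`, the view `survey_flat` and the sample rows are hypothetical. Only the column name `answer_value` comes from the text. In MySQL the cast target is written `SIGNED` or `UNSIGNED` rather than `INTEGER`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical flat table in which the answer is stored as text.
    CREATE TABLE survey_flat_raw (name TEXT, topic TEXT, answer_value TEXT);
    INSERT INTO survey_flat_raw VALUES
        ('Alice', 'data culture', '3'),
        ('Bob', 'data culture', '4'),
        ('Alice', 'data strategy', '1'),
        ('Bob', 'data strategy', '2');

    -- The view exposes answer_value as an integer, so downstream tools
    -- treat it as a number. (MySQL spelling: CAST(answer_value AS SIGNED).)
    CREATE VIEW survey_flat AS
    SELECT name, topic, CAST(answer_value AS INTEGER) AS answer_value
    FROM survey_flat_raw;
    """
)


def average_per_topic(connection: sqlite3.Connection) -> dict:
    """Return the average answer value for each topic."""
    rows = connection.execute(
        "SELECT topic, AVG(answer_value) FROM survey_flat GROUP BY topic"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(average_per_topic(conn))
```

Doing the cast once in the view means every question built on it in the dashboard tool sees a numeric column, instead of each chart having to convert the value itself.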

5. Metabase lets us filter the different visualizations on one of the fields in the data. For example, we can see how the CEOs or CFOs of organizations with 2 to 10 employees think about data maturity. It also lets us quickly provide feedback to the participants of the survey by filtering on their name.

If several colleagues from the same organization take the survey, we can filter on their departments. This facilitates a discussion about possible differences in maturity between departments.
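
Conceptually, a dashboard filter like this narrows the rows before the aggregation runs. The sketch below shows that idea as a parameterized query in Python with SQLite. It is not how Metabase implements filters, and the table `survey_flat`, its columns and the sample data are hypothetical.

```python
import sqlite3
from typing import Optional


def make_conn() -> sqlite3.Connection:
    """Build a small, hypothetical flat survey table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE survey_flat (
            name TEXT, department TEXT, topic TEXT, answer_value INTEGER
        );
        INSERT INTO survey_flat VALUES
            ('Alice', 'Finance', 'data culture', 3),
            ('Alice', 'Finance', 'data strategy', 1),
            ('Bob', 'IT', 'data culture', 4),
            ('Bob', 'IT', 'data strategy', 2),
            ('Carol', 'IT', 'data culture', 2),
            ('Carol', 'IT', 'data strategy', 2);
        """
    )
    return conn


def topic_averages(conn: sqlite3.Connection, department: Optional[str] = None) -> list:
    """Average answer per topic, optionally restricted to one department."""
    sql = "SELECT topic, AVG(answer_value) FROM survey_flat"
    params = []
    if department is not None:
        # The filter value is passed as a bound parameter, never
        # concatenated into the SQL string.
        sql += " WHERE department = ?"
        params.append(department)
    sql += " GROUP BY topic ORDER BY topic"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = make_conn()
    print("IT:", topic_averages(conn, "IT"))
    print("Finance:", topic_averages(conn, "Finance"))
```

Comparing the per-department results side by side is the kind of input that can start the discussion about maturity differences between departments.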

Personal experience and learnings

In summary, I deployed LimeSurvey, developed a data maturity assessment in LimeSurvey, denormalized the data to prepare it for visualization, with phpMyAdmin as an interface to the MySQL database, and built a dynamic dashboard in Metabase. The steps that took the most time were developing the actual survey and denormalizing the data.

I had some previous knowledge of hosting open source tools in the cloud. This is not always as straightforward as it could be, but the tools I use are rather widespread, so for most issues that I encountered I found documentation on how to solve them. From an installation standpoint, the complexity was in adjusting the config files to allow access to the MySQL database of LimeSurvey from phpMyAdmin and Metabase, which run on different servers.

The maturity assessment is freely available to everyone who is interested in filling it in. And now that I have this survey and data pipeline in place, I can very quickly personalize and distribute it to specific organizations and teams to get a more granular understanding of their data maturity.

Wouter Trappers

Data consultant with more than 15 years of experience in data and analytics with a specialization in business intelligence. Background in philosophy.