How we implemented a BI platform and developed self-service analytics capabilities

Business Intelligence solutions have been an important part of developing self-service analytics at the company since I joined inDriver as a BI Engineering Manager in 2020. We have made significant progress, but there is still much to overcome. Read on to learn how we tackled the existing issues and what our plans are going forward.

Dimitriy Sherbenko
inDrive.Tech
7 min read · Apr 5, 2022


How it all began

Our first step was to understand what the company was already using to build dashboards and automate reporting. At the time, this was a custom-coded service that let users select and download reports by setting parameters in its interface, while Grafana was used to visualize data and build dashboards.

After careful consideration and discussions with users, we decided to keep both services for the time being and roll out the BI platform in “competition” with them. There were several reasons for this:

  1. Dozens of reports and dashboards were already in use.
  2. Users’ growing need for new reporting solutions exceeded what Grafana’s features could deliver.
  3. There was an almost complete lack of documentation on how data was collected for reports and dashboards and how the related metrics were calculated.

Today, with an entire team of BI engineers behind me and the ability to ask data engineers to develop and maintain our data marts, I would be all for moving every dashboard from Grafana to Tableau and forgetting about Grafana once and for all.

I’ll be honest: in my opinion, competing with a tool or service is not as difficult as competing with a deeply ingrained human habit. At the time, however, the team consisted of just me. So I had to travel down this thorny path, which, despite the pain, gave me valuable insight into how users choose their tools. More on that later.

Choosing a tool

As I mentioned earlier, I chose Tableau as our BI tool. It is one of the market leaders for several reasons; most importantly, it is convenient, flexible, and rich in useful features. It was a perfect fit both for building multifunctional dashboards for multiple business teams and for driving a self-service analytics strategy across the company.

The alternative was to embark on a fruitless “holy war”, debating for hours why Tableau was the better solution. I have been using it for most of my working life, and it has proven reliable. I felt confident it would be up to the task, and I wasn’t wrong.

At that time, the company’s data volume was already, to put it mildly, not inconsiderable. Millions of users generated a comparable number of trips, and that data had to be stored and analyzed somewhere. ClickHouse was chosen as the optimal solution for reading and inserting data, given that we lacked the resources to deploy, administer, and maintain a Hadoop cluster.

We ran it on a single fat server. Data from the backend and a couple of third-party sources was loaded onto it daily, and analysts ran their queries against the same database. Such shared use was not a problem at the time, but that was then.

Apache Airflow was chosen to orchestrate the ETL processes, most of which were updates to the reporting data marts. PostgreSQL was kept there as a fallback in case the Tableau + ClickHouse combination proved inadequate, for example when working with spatial data and reading it from Tableau.

So we ended up with the following set of inputs:

  1. BI — Tableau.
  2. DWH — ClickHouse (fallback — PostgreSQL).
  3. The orchestrator — Apache Airflow.
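To make the daily data mart updates concrete, here is a minimal sketch of the ClickHouse statements a daily Airflow task might issue: drop the partition for the processed day, then re-insert it from the raw data. The table names `marts.daily_city_metrics` and `raw.trips` are hypothetical; the actual marts are not named in this article.

```python
from datetime import date

# Hypothetical names: the real mart and source tables are not named in the article.
MART = "marts.daily_city_metrics"
SOURCE = "raw.trips"

def build_refresh_queries(ds: date) -> list[str]:
    """Build the statements for an idempotent daily refresh:
    drop the day's partition, then recompute it from raw data."""
    partition = ds.strftime("%Y-%m-%d")
    return [
        f"ALTER TABLE {MART} DROP PARTITION '{partition}'",
        f"""INSERT INTO {MART}
            SELECT toDate(created_at) AS day, city_id, count() AS trips
            FROM {SOURCE}
            WHERE toDate(created_at) = '{partition}'
            GROUP BY day, city_id""",
    ]

queries = build_refresh_queries(date(2022, 4, 5))
print(queries[0])  # ALTER TABLE marts.daily_city_metrics DROP PARTITION '2022-04-05'
```

The drop-then-insert pattern makes reruns of a failed Airflow task safe: re-executing the same day never duplicates rows in the mart.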

Equipped with this arsenal, we set out on a campaign to demonstrate the advantages of Tableau over Grafana and win the hearts of users.

The competitive struggle

At this point, I identified three aspects to be given special emphasis during the presentation of the new tool.

  • Tableau is superior to Grafana in terms of functionality.

To prove this point, I collected dashboard requests from potential clients and implemented them in Tableau. The outcome was positive: the clients were happy with the new features.

However, as I mentioned earlier, we did run into difficulties. Users had grown accustomed to Grafana and its features, and Tableau’s functionality seemed complicated by comparison. This meant we needed to teach them how to use it.

  • Tableau training.

I spent several days teaching users Tableau’s basic functionality. The purpose of these demonstrations was not only to show users how to use the dashboards, but also to prove that Tableau was easy to learn and that they could build the same dashboards on their own (hello, self-service!).

While the first task produced an immediate positive effect, the second required far more time and resources, so we decided to put it off until later. Even so, it bore fruit in the form of a market research team that independently took up designing and implementing dashboards as part of its work.

  • Documentation, documentation, and more documentation.

Everything should be documented, from data marts and dashboards right up to video tutorials, so that anyone can understand how the metrics are calculated, how the data marts are assembled and updated, and how the related processes work.

All this produced the intended effect: the number of users working in Tableau and ordering new dashboards grew, and gradually the entire flow of new requests moved over to Tableau.

Unfortunately, we still kept coming up against resistance from people who found Grafana simpler and easier to use. Interestingly, their use of Grafana ended once they found one or two insights useful to their line of work (the proverbial aha moment).

On the bright side, onboarding users who had never dealt with Grafana was very easy. New product verticals launched with Tableau-based reporting solutions from the get-go, and since it was a new experience for them, they chose the more functional and convenient option from the start.

What now and where to next

Currently, all of our product verticals and every team whose work involves data analysis have Tableau dashboards. To version our code, we use Git with configured CI/CD variables, which lets us conveniently deploy our scripts to production. And to monitor our services we use (surprise, surprise!) Grafana.

Gradually, we are moving our data marts to a new DWH: an HDFS cluster with the Presto SQL engine on top. As the teams grew, we started to feel cramped on a single machine, so it is a good thing we now have a separate cluster. We still use Airflow for orchestration, but we now write our code in PySpark. The Grafana dashboards are still alive and kicking, but we finally have enough resources to abandon this legacy and rebuild everything in Tableau.

We work closely with the data engineering team, combining their expertise with our knowledge of business needs, analytics, and calculation methodologies to build the data marts that best meet current reporting and analysis needs. BI engineers are embedded in the relevant product verticals, which helps them better understand what is going on in a particular area of the business, fosters a data-driven culture, and lets them share the expertise they gain with colleagues and other teams.

We are actively developing a Tableau training program for our users. Our goal is to make the transition from training on test data to building dashboards on actual data as seamless as possible, so our training data is identical in structure to the actual data; only the values the tables are filled with differ.
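As an illustration of that structural-parity idea, here is a minimal stdlib-only sketch that generates training rows with the same columns as a production table but purely synthetic values. The trips schema shown is hypothetical; the real tables are not described in this article.

```python
import random

# Hypothetical schema mirroring a plausible production trips table (structure only).
TRIPS_SCHEMA = ["trip_id", "city_id", "created_at", "price", "status"]

def make_training_rows(n: int, seed: int = 42) -> list[dict]:
    """Generate n rows with production-shaped columns but synthetic values.

    A fixed seed keeps the training dataset reproducible between sessions."""
    rng = random.Random(seed)
    statuses = ["done", "cancelled"]
    rows = []
    for i in range(n):
        rows.append({
            "trip_id": i + 1,
            "city_id": rng.randint(1, 50),
            "created_at": f"2022-04-{rng.randint(1, 28):02d}",
            "price": round(rng.uniform(3.0, 40.0), 2),
            "status": rng.choice(statuses),
        })
    return rows

rows = make_training_rows(1000)
# Every row carries exactly the production columns, so a dashboard built on
# this data can later be repointed at the real table without rework.
assert all(set(r) == set(TRIPS_SCHEMA) for r in rows)
```

Because only the values are fake, a data source built on such tables can be switched to the production server without touching the dashboard itself.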

The training program will also be divided into levels, since an analyst and a business developer need functionality of different complexity. We expect this Tableau community to keep evolving, supported by shared expertise, events, and educational programs.

We are also developing our own style guide for dashboards. It too will be based on training data whose structure matches the actual data, making the transfer of a data source to the production server as seamless as possible. This will help our users quickly build quality dashboards without wasting time laying out and arranging widgets or, worse, styling them from scratch.

When building and developing our solutions, we focus on ensuring that our dashboards deliver maximum value in the minimum development time. Hopefully, we will continue to serve our users well by growing our BI platform into a single access point for all reporting, data viewing, and development needs. For me, it is very cool when your product is used and trusted.

Finally, don’t forget that we are actively growing and looking for people to join our team! In a few months’ time, I expect to write another article sharing more about what we have built, with more details on the processes, the technologies used, and the experience we have gained along the way.
