Qlik Sense under control

Benjamin Berriot
Dec 18, 2020 · 6 min read

At Maisons du Monde, we are using Qlik Sense as our data vizualisation tool. We provide dashbaords for all the company. We also send hundreds of static reports every day, generated by Qlik NPrinting. This make Qlik one of our main tools by being the link between the data team and the business teams in the company. Hence it’s very important to keep Qlik under control.

Qlik infrastructure

Google Data Center (source)

At Maisons du Monde, we have the following infrastruture for our Qlik environments:

  • 3 VM for Qlik Sense, oneVM is for development environment, onefor production environment.The last one, called master, is in charge of dashboard reloading, QVD (Qlik file format) creation and other Qlik administration under the hood.
  • 1 VM for Qlik NPrinting. Qlik NPrinting is an application provided by Qlik, that can create static reporting and send report by email for example.

All of these virtual machines are running Windows Server and are host on Compute Engine in Google Cloud Platform.

Data pipelines V1: time based orchestration

source: https://kinsta.com/fr/base-de-connaissances/tache-cron-wordpress/

As we already explained in this article, at the begining we were using Rundeck as the orchestrator of our data platform. Rundeck is a time based orchestrator. Qlik provide the possibility to schedule application reloading based on time. Our pipeline between our data platform (orchestrated by Rundeck) and Qlik (orchestrated by itself) was disconnected. This means that if a Rundeck tasks was failing or running wrongly, Qlik would still reload the dashboards with no up-to-date data or worse, wrong data. At Maisons du Monde, our pipelines are mainly excecuted on a daily basis, this is why having the data late or incorrect in the dashboards is a big issue for our business teams.

It’s even worse with complex pipelines (source: author)

Moreover, using time based scheduling is not optimized in terms of landing times. Optimising these pipelines can be complicated. For users, it’s important to have their dashboards up-to-date in the morning, to take better and faster decision based on data.

Data pipelines V1.5: hybrid orchestration

In mid 2019, we switched from Rundeck to Airflow, but we focused our move on the data pipelines running inside Rundeck. During this period we kept dashbaord reloading scheduled in Qlik, still based on time. We were not experiencing any issues with this, because most of our pipelines running on Apache Airflow were ending earlier than with Rundeck. In fact most of our dashboard were reloaded every day without any issues or data out-dated.

However, this orchestration didn’t solve two main issues:

  • Landing time: we had pipelines finishing early on Airflow but dashboards were reload later in the day (remember the security margin for Rundeck).
  • Observability: using two orchestration, we had to monitore two different pipelines to check if something went wrong.

Data pipelines V2: Direct Acyclic Graph (DAG)

Example of DAG (source: author)

In late 2019, we planned to trigger Qlik application reloading directly from Airflow. Luckily Qlik Sense provides an API to trigger various Qlik operations, such as reloading a dashboard. You can find the documentation to create Qlik Sense credentials here and a getting started here.

To trigger a task in Qlik Sense with the API we are using the following route: post /task/{id}/start/synchronous (doc). This route returns a session_id and you can check the task status with the route: GET executionresult/ful?filter=ExecutionId eq {session_id} (doc). All these requests are send to http://<your_server_ip>:4242/qrs .

Bonus, this is the list of task result id and the corresponding status:

all_status = {0: "NeverStarted",1: "Triggered",2: "Started",3: "Queued",4: "AbortInitiated",5: "Aborting",6: "Aborted",7: "FinishedSuccess",8: "FinishedFail",9: "Skipped",10: "Retry",11: "Error",12: "Reset",}

As we discussed in our article about Apache Airflow, we are using KubernetesPodOperator as operator. The stack in the data team is Python, so we started to develop a Python code that interacts with the Qlik Sense API. Nothing really complicated right here, we are using requests to make API calls.

At the begining of 2020, we extended the code to start Qlik NPrinting tasks. You can find the getting started tutorial here. Authentification with this API is a little bit tricky, because this API uses NTLM Auth. To achieve the authentification you need to create a user in Qlik NPrinting.

Our Python code is still using Requests, but we added requests-ntml for authentification. First, login with this route GET /login/ntml (doc). To start a NPrinting task we are using this route POST /tasks/{task_id}/executions (doc), you can extract the execution id in the response under data/id. Then to check the task status, we are usingGET /tasks/{task_id}/executions/{execution_id} (doc).

With Qlik Sense and Qlik Nprinting triggered by Airflow, we improved landing time, data dependencies and observability. As an example, in our pipeline V1, we had a dashboard which was reloading at 3pm. After the full switch to Airflow, this same dashboard is reloaded at 11am just by changing the orchestration. That is a huge improvement for user experience.

Data lineage

One of the big issue we are experiencing with Qlik Sense, is data lineage. To add data into Qlik Sense dashboard, we need to write a loading script, in which we define connection to the datasource, load data and make some transformation. For example this is a basic script that loads data from a database into Qlik Sense, with SQL:

LIB CONNECT TO 'DataSource';LOAD *;SQL SELECT * FROM `Sales_data`;

This scripts is written by the developer of the dashbaord, directly inside the Qlik Sense application. When the dashboard is published, the script can’t be edited or viewed. You need to duplicate the app in your development environment to edit/view and republish the app. One of the problem with this process is when you want to deprecate or update a datasource, and you want to check all dashboards using this datasource.

In october 2020, I found this repository on Github (https://github.com/ptarmiganlabs/butler-spyglass) where someone developped a code in Javascript to get the Qlik loading script of an app through an API. I search in Qlik documentation, and finally found it, the Qlik Engine JSON API . With this websocket you can dump the loading script of the application into a file. We implemented a Python version of this code (inspired by this code: https://gist.github.com/clintcarr/d0a8bb6e15cb6c9a3472d424c344ce73).

To make the results open for everybody in our team, we are uploading all loading scripts generated by our code to Gitlab every night. We can now find easily which data source is used by which dashboards, and have versioning of every scripts. As a git repository you can clone it on your computer and with this VSCode extension (https://marketplace.visualstudio.com/items?itemName=Gimly81.qlik) you can now open all scripts files, have syntax colors (save files as .qvs) and perform a “search in all files: XXX”.

Performance monitoring

Our Qlik infrastructure is running on 4 VM as described above, so we have a performance monitoring of each machine. We are monitoring basic informations such as CPU, RAM, I/O with a Grafana dashboard. This monitoring is really useful to know if all VM’s are ok. One monday morning we had Qlik down/really slow, when we saw on the monitoring that one of our machine had no more ram we understood the problem. We simply rebooted all machines and the problem was gone. We investigated the root cause and found that every monday morning the ram usage was really high. So we looked for a specific dashboard loading every monday morning, we found it, optimised it and this problem never occurred again.

Conclusion

We have developed some tools to improve data pipelines and data lineage with Qlik Sense. There is still some issues to address such as alerting when dashboard reload will be delay, data quality and dashboard versioning. With automated extraction of Qlik loading scripts, Airflow and Google Cloud Logs we are close to perform a full data lineage from Google Storage to Qlik Sense.

We are looking forward for your comments, hoping that this feedback was useful and feel free to reach us for further discussions and experience sharing.

Maisons du Monde

A collection of articles published by Maisons du monde.