Data Sourcing is The Ground Zero of Business Intelligence

Albhikautsar Dharma Kesuma
Published in Life at Telkomsel
5 min read · Mar 24, 2021

We all understand the importance of Business Intelligence (BI) and how much it can help business operations. But to produce any useful output, BI needs data, and the data must come from somewhere. It could be from many data sources in the organization, such as transaction systems, customer management systems, or external data suppliers.

One of the trickiest jobs for us at the Telkomsel Business Intelligence and Analytics Group is deciding where we should obtain the data from, a process we refer to as data sourcing. Data sourcing is the first step in building a solid BI platform. It can be defined as the process of identifying the data sources needed for BI, understanding their content, designing extraction and transport methods for multiple sources, and creating data preprocessing as required. At Telkomsel, this is performed by the Data Sourcing team, which is an integral part of the BI & Analytics Group.

A recent report from Gartner shows that 87% of organizations have low BI and analytics maturity 1). There are many reasons behind this low maturity; one of them is failure to meet data quality requirements, which is mostly caused by a poor assessment of the data sources intended to be used. This highlights the need for the Data Sourcing team to put proper data source analysis processes in place.

It is common for users to believe that the more data we gather from the source, the better. While this is true in some cases, it is a costly approach if we don't know what we are actually looking for. Because the input data may come from multiple sources, we need a preprocessing step to get the correct information that serves our business goals. This is a crucial step in a successful Business Intelligence implementation. The Data Sourcing team is responsible for the analysis and design of the data preprocessing. Its work is an interesting mix of data analysis and data engineering, which requires skills in both areas.

There are numerous aspects we need to get right to implement a proper data sourcing process. But every time I work on it, I remember one of the habits from The 7 Habits of Highly Effective People: begin with the end in mind.

1. The outcome should drive what data is required

To source data well, we need to know the desired output from a business perspective. With this mindset, we understand what data needs to be sourced, how to implement the data quality checks, and what rules and controls are needed to manage the supply of data. This also prevents us from maintaining data that is underutilized in our BI platform.

For example, Telkomsel builds machine learning models for personalized campaigns of its product packages, and these models need many good-quality data sources to make accurate predictions.

2. Profile the data with the data suppliers

Strong collaboration with the data suppliers is needed to achieve the goal of the data sourcing process, because they are the ones who understand the nitty-gritty of the raw data. We call this profiling the data. In this process, the Data Sourcing team tries to understand the data structure, the granularity of the data elements, the frequency of occurrence, and the availability of the data. It is also important for the data suppliers to understand what we intend to do with their data.
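As an illustration only, the profiling step can be sketched in a few lines of Python. This is not the tooling we run at Telkomsel; the function `profile_records`, the field names (`subscriber_id`, `event_date`, `usage_mb`), and the sample rows are all hypothetical and chosen just to show what a first profile of structure, completeness, and granularity might look like.

```python
from collections import Counter


def profile_records(records, key_fields):
    """Summarize structure, completeness, and granularity of a list of dict rows.

    Assumes every value is hashable (strings, numbers, None).
    """
    fields = sorted({name for row in records for name in row})
    profile = {}
    for name in fields:
        values = [row.get(name) for row in records]
        present = [v for v in values if v not in (None, "")]
        profile[name] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    # Granularity check: count the keys that appear on more than one row.
    key_counts = Counter(tuple(row.get(k) for k in key_fields) for row in records)
    duplicate_keys = sum(1 for count in key_counts.values() if count > 1)
    return {
        "row_count": len(records),
        "fields": profile,
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical sample delivered by a data supplier.
sample = [
    {"subscriber_id": "A1", "event_date": "2021-03-01", "usage_mb": 120},
    {"subscriber_id": "A1", "event_date": "2021-03-02", "usage_mb": None},
    {"subscriber_id": "B2", "event_date": "2021-03-01", "usage_mb": 75},
    {"subscriber_id": "B2", "event_date": "2021-03-01", "usage_mb": 75},
]
report = profile_records(sample, key_fields=["subscriber_id", "event_date"])
print(report["duplicate_keys"], report["fields"]["usage_mb"])
```

A report like this gives both sides something concrete to discuss: the supplier can confirm whether a repeated key is expected, and whether a missing value means "zero" or "unknown".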

3. Get data as raw as possible

It may be easier for us to have well-structured data that can be used directly as a report. But from a data sourcing point of view, it is better to have the data as raw as possible from the data supplier. Having data in its raw format increases the flexibility with which we can use it for many purposes rather than for one specific business use case.

4. Implement an early check of Data Quality

During the data sourcing process, we may find that some data is more important than others, so we need to rank each data source we are going to manage. This helps us determine the business impact if there is a data quality issue.

The cost of data issues increases if we are not able to track them early on. Identifying metrics and implementing data quality controls as early as possible will save us a lot of cost and time. Alerts on data quality issues, based on those metrics, help us prevent the issues from growing into a bigger mess. An alert can notify the appropriate team members.
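To make the idea of metrics, alerts, and source ranking more concrete, here is a minimal sketch. The metric choices (row count and null rate), the thresholds, the `revenue` field, and the routing targets are assumptions made for the example, not a description of our production checks.

```python
def null_rate(rows, field):
    """Share of rows where the field is missing; an empty batch counts as fully missing."""
    if not rows:
        return 1.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


def evaluate_batch(rows, field, expected_min_rows, max_null_rate):
    """Return a list of readable alerts; an empty list means the batch passed."""
    alerts = []
    if len(rows) < expected_min_rows:
        alerts.append(f"row count {len(rows)} below minimum {expected_min_rows}")
    rate = null_rate(rows, field)
    if rate > max_null_rate:
        alerts.append(f"null rate of '{field}' is {rate:.0%}, above {max_null_rate:.0%}")
    return alerts


def route_alerts(alerts, source_rank):
    """Pick who gets notified, assuming rank 1 marks the most business-critical source."""
    if not alerts:
        return None
    return "on-call engineer" if source_rank == 1 else "daily digest"


# Hypothetical batch: two of the four rows are missing the revenue value.
batch = [
    {"subscriber_id": "A1", "revenue": 10.0},
    {"subscriber_id": "B2", "revenue": None},
    {"subscriber_id": "C3", "revenue": 7.5},
    {"subscriber_id": "D4", "revenue": None},
]
alerts = evaluate_batch(batch, "revenue", expected_min_rows=3, max_null_rate=0.25)
print(alerts, route_alerts(alerts, source_rank=1))
```

Running the check at ingestion time, before the data reaches any report or model, is what keeps the cost of an issue low. The rank only decides how loudly the alert is raised.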

5. Prepare for change

New organizations, new systems, and new business requirements are things we cannot avoid, so the data sourcing process needs to be prepared for all kinds of changes. There are a few aspects we have to consider when facing change:

  • The data collection process should not be too rigid or brittle. For example, a data provider might supply data in a different format than what was agreed beforehand; a minor tweak should be able to rectify the process without impacting the overall BI process.
  • Maintain good communication channels with data suppliers so that we learn about changes ahead of time.
  • Apply a good change management system to handle the documentation and versioning of the data source.
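One way to keep the collection step from being brittle is to map incoming field names onto the names the BI platform expects, and to set aside anything unexpected instead of failing the whole load. The sketch below assumes a hypothetical alias table (`FIELD_ALIASES`) and hypothetical field names; a renamed column then becomes a one-line change to the table rather than a change to the pipeline.

```python
# Hypothetical mapping from names a supplier might send to our canonical names.
FIELD_ALIASES = {
    "msisdn": "subscriber_id",
    "subscriber_id": "subscriber_id",
    "trx_date": "event_date",
    "event_date": "event_date",
}
REQUIRED_FIELDS = {"subscriber_id", "event_date"}


def normalize_row(row, aliases=FIELD_ALIASES, required=REQUIRED_FIELDS):
    """Rename known fields, keep unknown ones aside, and report missing required fields."""
    normalized, unmapped = {}, {}
    for name, value in row.items():
        canonical = aliases.get(name.strip().lower())
        if canonical is None:
            unmapped[name] = value  # kept for review instead of breaking the load
        else:
            normalized[canonical] = value
    missing = sorted(required - set(normalized))
    return normalized, unmapped, missing


# The supplier changed the header casing and added a new column.
row = {"MSISDN": "A1", "trx_date": "2021-03-01", "promo_code": "X"}
normalized, unmapped, missing = normalize_row(row)
print(normalized, unmapped, missing)
```

Unmapped columns and missing required fields are exactly the things worth raising with the supplier through the communication channel mentioned above.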

Another task for the Data Sourcing team is to create an environment where the transfer of data from the source is always under control and auditable. Every step of the data sourcing process needs to be controlled to reduce the number of data quality issues that may impact the output of the BI platform. This controlled environment includes versioning the changes to data from data suppliers. Versioning helps us analyze the impact a change may have on the existing environment.
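A simple way to picture an auditable, versioned transfer is a log with one entry per delivery, where each entry carries a fingerprint of the delivered schema. The sketch below is an assumption-laden example: the helper names, the in-memory list used as the log, and the `billing_feed` source are hypothetical, and a real setup would persist the log somewhere durable.

```python
import hashlib
import json


def schema_fingerprint(columns):
    """Short, order-independent hash of a {column name: type name} mapping."""
    canonical = json.dumps(sorted(columns.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def record_delivery(audit_log, source, columns, row_count):
    """Append one audit entry and flag whether the schema differs from the last delivery."""
    fingerprint = schema_fingerprint(columns)
    previous = next(
        (entry for entry in reversed(audit_log) if entry["source"] == source), None
    )
    entry = {
        "source": source,
        "schema_version": fingerprint,
        "row_count": row_count,
        "schema_changed": previous is not None
        and previous["schema_version"] != fingerprint,
    }
    audit_log.append(entry)
    return entry


audit_log = []
first = record_delivery(
    audit_log, "billing_feed", {"subscriber_id": "string", "amount": "int"}, 1000
)
# Same columns in a different order, but the type of "amount" has changed.
second = record_delivery(
    audit_log, "billing_feed", {"amount": "decimal", "subscriber_id": "string"}, 980
)
print(first["schema_changed"], second["schema_changed"])
```

Because every delivery leaves an entry, we can later answer which schema version a given report was built on, which is the starting point for any impact analysis.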

TL;DR

Data sourcing is a critical element of a successful Business Intelligence implementation. In my opinion, a top-down approach that starts from the goal of each project is the best way to execute the data sourcing process. But the success of this process still depends on all stakeholders: we need clear requirements from the business users, good collaboration with the data providers, a solid data engineering ingestion team, the right technology to handle big data, and good documentation.

1) Source: Gartner Report https://www.gartner.com/en/newsroom/press-releases/2018-12-06-gartner-data-shows-87-percent-of-organizations-have-low-bi-and-analytics-maturity
