Data governance journey at SEA’s largest SME digital financing platform (Part 1)
Background
As a licensed SME digital financing platform in South East Asia, Funding Societies | Modalku is subject to several regulatory and compliance requirements which factor into its data strategy.
Data is becoming an increasingly valuable asset and gives rise to significant competitive advantages in the FinTech world. Indeed, without high-quality data and upward reporting of meaningful management information, financial institutions cannot identify and monitor risks, nor can they properly understand the performance of various business functions.
Organisations with a solid understanding of data governance benefit from better decision making, uniform data across the organisation, increased data literacy, and improved regulatory compliance.
Data governance definition
The Data Governance Institute defines data governance as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”
The Data Management Association (DAMA) International defines data governance as the “planning, oversight, and control over management of data and the use of data and data-related sources.”
Data governance keeps your data organised, accessible, and compliant. It is the overall management of data in an enterprise, ensuring the availability, usability, integrity, and security of that data.
Daily operational and regulatory challenges
The following questions may seem familiar when looking at the daily operations of the data team and its stakeholders:
- Where can I find this data?
- What is this data for? What are the reports/metrics being generated using it?
- When was this data last refreshed?
- Who is the owner of this KPI or data?
- How do I get access to this data?
- Is the data trustworthy?
- What would the impact be if I make changes to some data or KPI?
- Is it okay to share this data?
All the above questions can be broadly categorised as:
1. Know your data: Understand and classify your data.
2. Control your data: Apply flexible governance and security policies that don’t hinder innovation.
3. Integration across different applications: Simplify governance across the data landscape within one platform.
Our approach
We follow the DAMA Data Governance Framework, in which each topic is a knowledge area with its own specialisation. I will not cover the details of each knowledge area in this article.
Data management and governance is not a one-time project. It is a journey that any organisation has to go through to become data-centric. Below is a high-level diagram of our journey, which is still in progress.
Get buy-in from leadership
Data governance is a collaborative effort across the organisation, and it requires business buy-in and ownership. We followed an operating model that requires participation from Infosec, Compliance, Legal, Business, Engineering, and the Data and Analytics team. Together, we defined the policies, frameworks, roles, and responsibilities across the organisation.
Defining the policies to start the journey
Data Governance Policy: This guides the development of best practices for effective data management and protection, and protects the organisation's data assets against internal and external threats. Key data governance responsibilities outline the access rights, roles, and responsibilities of personnel in relation to the management and protection of data. We defined these roles: Data Domain Lead, Data Protection Officer, Data Custodians, Data Stewards, and Data Users.
Data Classification Policy: This defines the high-level objectives and implementation instructions for the organisation's data classification scheme. It covers the data classification levels and provides guidance on the procedures for classifying, labelling, and handling data within the organisation. Confidentiality and non-disclosure agreements maintained by the organisation must reference this policy. All data must fall under one of the categories "Confidential", "Internal", or "Public".
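To make the scheme concrete, classification levels can be modelled as an ordered type so that derived datasets inherit the most restrictive label of their inputs. The sketch below is illustrative only; the enum name and helper are assumptions, not part of our actual policy tooling.

```python
from enum import IntEnum

class DataClassification(IntEnum):
    """Hypothetical classification levels, ordered least to most restrictive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3

def highest_classification(labels: list[DataClassification]) -> DataClassification:
    """A derived dataset inherits the most restrictive label among its inputs."""
    return max(labels)

# A report joining public and confidential data is itself confidential.
print(highest_classification(
    [DataClassification.PUBLIC, DataClassification.CONFIDENTIAL]
).name)  # CONFIDENTIAL
```

Because `IntEnum` members compare as integers, `max` naturally picks the most restrictive level.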
Data Loss Prevention Policy: This policy enumerates measures to detect and prevent unauthorised access, modification, copying, or transmission of the organisation's confidential data.
Access Control Policy: This policy defines and establishes the access controls required for information systems (or "applications") to safeguard information from unauthorised or malicious users and entities.
Create an Information Asset Inventory
As per our Risk Management Framework, we started by identifying the critical systems, processes, data, and assets. Our risk management process is carried out in the following steps:
The Data team, along with Engineering, Security, and Compliance, created an Information Asset Inventory of structured and unstructured data for data risk identification and assessment.
Each information asset will have a defined:
- Function/Category
- Owner (Tech/Business)
- Set of data assets which reside in the information system
- Information classification, which provides guidance to further determine the CIA (Confidentiality, Integrity, Availability) rating of the information asset
We planned to define business terms (glossary) for each Information Asset and map the corresponding CIA ratings to the data assets (tables/columns) and the information systems.
Below is the flow of how CIA ratings propagate from Information Assets to Information Systems, which we will apply using the DMP (Data Management Platform) tool.
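One way to picture this propagation: an information system hosting several assets takes the element-wise maximum of their CIA ratings. The sketch below is a simplified assumption about how such a roll-up could work (the 1–3 scale and example assets are hypothetical, not our production ratings).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CIARating:
    """Hypothetical 1-3 scale per dimension, 3 being most critical."""
    confidentiality: int
    integrity: int
    availability: int

def propagate(asset_ratings: list[CIARating]) -> CIARating:
    """An information system inherits the element-wise maximum
    CIA rating of the information assets it hosts."""
    return CIARating(
        max(r.confidentiality for r in asset_ratings),
        max(r.integrity for r in asset_ratings),
        max(r.availability for r in asset_ratings),
    )

loan_book = CIARating(3, 3, 2)   # illustrative: confidential financial data
marketing = CIARating(1, 2, 1)   # illustrative: public-facing content
print(propagate([loan_book, marketing]))
# CIARating(confidentiality=3, integrity=3, availability=2)
```

Taking the maximum per dimension ensures the system is protected to the level of its most sensitive asset.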
Set up domain-driven architecture
We started by assessing the existing data warehouse (Snowflake) architecture and restructured it into a domain-driven (data-driven) layered architecture. This framework helped us categorise the datasets and determine how the data will be stored, processed, and managed, establishing best practices for data management, governance, and security.
Data from sources (Postgres/other sources) was pushed to Amazon S3 as a raw store and then brought into Snowflake data warehouse, leveraging Apache Airflow as the pipeline/orchestration layer.
Data domains in Snowflake database as schema
Below is a sample of how we defined data domains/sub-domains in the Snowflake database as schemas. For example, the "Transaction" subject area comprises everything related to transactions, such as loans, invoices, and investments, while the "Counterparty" subject area comprises sub-domains like investor, borrower, and partner.
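A domain-to-schema mapping like this lends itself to generating the schema DDL from a single source of truth. Below is a minimal sketch of that idea; the `DOMAIN_SUBDOMAIN` naming convention, database name, and sub-domain lists are assumptions for illustration, not our actual Snowflake layout.

```python
# Hypothetical domain -> sub-domain mapping used to drive schema creation.
DOMAINS = {
    "TRANSACTION": ["LOAN", "INVOICE", "INVESTMENT"],
    "COUNTERPARTY": ["INVESTOR", "BORROWER", "PARTNER"],
}

def schema_ddl(database: str, domains: dict[str, list[str]]) -> list[str]:
    """Emit one CREATE SCHEMA statement per domain/sub-domain pair."""
    return [
        f"CREATE SCHEMA IF NOT EXISTS {database}.{domain}_{sub};"
        for domain, subs in domains.items()
        for sub in subs
    ]

for stmt in schema_ddl("ANALYTICS", DOMAINS):
    print(stmt)
```

Driving DDL from one mapping keeps the warehouse layout, the Information Asset Groups, and the DMP classifications aligned as domains evolve.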
We plan to define data classifications in the DMP tool based on data domains and map all data assets to them. This approach has several benefits:
- The data domains are in line with the Information Asset Groups.
- It is easier to discover data, assign Data Domain Owners/Stewards, and curate the business glossary.
- We can customise access policies by creating user groups/roles based on these classifications.
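As a sketch of the last point, classification-based access could be expressed as a mapping from classification to role, from which grant statements are generated. The role names, schema name, and the use of a blanket `GRANT SELECT` are all illustrative assumptions, not our production access model.

```python
# Hypothetical classification -> role mapping for Snowflake-style grants.
ROLE_FOR_CLASSIFICATION = {
    "PUBLIC": "ANALYST_ALL",
    "INTERNAL": "ANALYST_INTERNAL",
    "CONFIDENTIAL": "ANALYST_RESTRICTED",
}

def grant_statement(schema: str, classification: str) -> str:
    """Emit a read grant for the role tied to a schema's classification."""
    role = ROLE_FOR_CLASSIFICATION[classification]
    return f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO ROLE {role};"

print(grant_statement("ANALYTICS.TRANSACTION_LOAN", "CONFIDENTIAL"))
```

Centralising the mapping means a reclassification changes access in one place rather than table by table.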
Below is the representation of how data domains as classifications are mapped to data assets (tables/columns). Table classifications are propagated directly to their columns, and forward lineage is used to map classifications from one table to another table's columns.
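The forward-lineage propagation described above amounts to a breadth-first walk over the lineage graph, carrying the tag downstream. The sketch below illustrates this under assumptions: the table names and the `table -> downstream tables` lineage structure are hypothetical, not pulled from our DMP tool.

```python
from collections import deque

# Hypothetical forward-lineage graph: table -> downstream tables.
LINEAGE = {
    "RAW.LOANS": ["ANALYTICS.TRANSACTION_LOAN"],
    "ANALYTICS.TRANSACTION_LOAN": ["REPORTING.LOAN_KPI"],
    "REPORTING.LOAN_KPI": [],
}

def propagate_tag(source: str, tag: str,
                  lineage: dict[str, list[str]]) -> dict[str, str]:
    """Walk forward lineage breadth-first, tagging every downstream
    table (and, by extension, its columns) with the source's classification."""
    tagged: dict[str, str] = {}
    queue = deque([source])
    while queue:
        table = queue.popleft()
        if table in tagged:            # already visited; avoid cycles
            continue
        tagged[table] = tag
        queue.extend(lineage.get(table, []))
    return tagged

print(propagate_tag("RAW.LOANS", "TRANSACTION", LINEAGE))
```

Tagging once at the source and letting lineage carry the classification downstream avoids manually re-tagging every derived table.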
Building a business glossary
We started curating the business glossary after we had defined the data domains and sub-domains with their respective owners.
This flow visualises how business glossary terms are used to link the information assets to information systems by mapping these terms to the respective data assets.
Each data domain owner has initial responsibility for curating the business glossary terms for their respective data domain group. Below is a basic sample:
This list can also include other useful information such as abbreviations, synonyms, and other classifications like PII, CSI, etc.
We started off with the basic list and tried to leverage the DMP tool functionality to curate our business glossary list and map it to data assets.
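A glossary entry of this shape can be modelled as a small record linking a term to its domain, owner, and mapped data assets. The field names and the "Disbursement" example below are illustrative assumptions, not the DMP tool's actual data model or our real glossary content.

```python
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    """Illustrative business glossary record."""
    name: str
    definition: str
    domain: str
    owner: str
    synonyms: list[str] = field(default_factory=list)
    mapped_assets: list[str] = field(default_factory=list)  # "SCHEMA.TABLE.COLUMN"

disbursement = GlossaryTerm(
    name="Disbursement",
    definition="Transfer of an approved loan amount to the borrower.",
    domain="TRANSACTION",
    owner="Transaction Domain Lead",
    synonyms=["Drawdown"],
    mapped_assets=["ANALYTICS.TRANSACTION_LOAN.DISBURSED_AMOUNT"],
)
print(disbursement.name, "->", disbursement.mapped_assets)
```

Keeping the asset mapping on the term itself is what lets the glossary tie Information Assets to Information Systems, as in the flow above.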
Data management and governance goals
We set our initial goals for this journey with reference to the DAMA framework.
- Set up the data catalog and business glossary.
- End-to-end data lineage and impact analysis.
- Ensure all data assets are tagged and classified (PII, sensitive, CIA, etc.).
- Data security, data access, and data control.
- Data quality profiling.
Check out Part #2 to learn more about our implementation approach and how we made strides towards reaching our data management and governance goals.