Design thinking toward Data-Driven Organisation

Nitin Khaitan
Towards Polyglot Architecture
4 min read · Aug 20, 2022

The step toward a data-driven organisation

We generate roughly 2.5 quintillion bytes of data worldwide every day. In this data-centric environment, every organisation wants to be data-driven and make data-centric decisions. To serve the growing need to store different types of data, we now have around 11 categories of databases and more than 100 database products to choose from.

An organisation uses or builds one or more products to manage different work and target audiences. A product has multiple features, and based on a feature's or business's use case, data structure, and modelling requirements, it might use one or more databases. Microservice architecture helps us achieve this to a great extent. However, with the growing diversity of databases (of one or more types) used by an organisation comes the high responsibility of managing them and keeping them unified under one umbrella.

Moving from just an organisation that generates a lot of data to one that creates and relies upon a centralised data lake and can maintain data sanctity, satisfying current and future vision, is difficult. It requires a lot of design thinking, architectural building blocks, design principles and good practices to be in place. A few of them are listed below for reference:

Key data design principles

Shared asset

Data belongs to the organisation, not the department or vertical that generates or consumes that data. It should follow organisational semantics and not be centred at the department level.

Data sharing interfaces

Data creation and sharing should be exposed via standardised interfaces or APIs. Direct database interaction bypasses these contracts and typically violates security compliance, so it should be prohibited for general use and limited to a small set of specifically authorised people or groups.
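As a minimal sketch of this principle, the service below is the only sanctioned path to customer data; other teams never see the store itself. All class, field, and method names here are illustrative, not part of any real system:

```python
from typing import Optional


class CustomerStore:
    """Internal storage; only the owning service layer may touch it."""

    def __init__(self):
        self._rows = {}  # stands in for the private database

    def _get(self, customer_id: str) -> Optional[dict]:
        return self._rows.get(customer_id)


class CustomerAPI:
    """The standardised interface other teams call instead of the database."""

    def __init__(self, store: CustomerStore):
        self._store = store

    def get_customer(self, customer_id: str) -> Optional[dict]:
        row = self._store._get(customer_id)
        if row is None:
            return None
        # Return data shaped by the interface contract,
        # never the raw internal row.
        return {"id": customer_id, "name": row["name"]}
```

Because consumers depend only on the interface contract, the owning team can change the underlying schema or even the database engine without breaking them.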

Who can access what

Not all data interfaces should be exposed to everyone. Access should be controlled based on the role and group to which a user belongs.

Access rights should be denied by default and granted explicitly to specific user groups.
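A default-deny check can be as small as the sketch below; the role and permission names are invented for illustration:

```python
# Role -> set of granted permissions. Anything not listed is denied.
PERMISSIONS = {
    "analyst": {"orders:read"},
    "admin": {"orders:read", "orders:write"},
}


def is_allowed(role: str, permission: str) -> bool:
    # Unknown roles and unlisted permissions both fall through to "deny";
    # access is only ever opened up by an explicit grant.
    return permission in PERMISSIONS.get(role, set())
```

The important property is that forgetting to configure a role results in no access, not accidental full access.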

Naming convention and vocabulary

Naming conventions and best practices for data modelling should be defined at the organisational level, and every department, product, or feature doing data modelling should follow them. A shared vocabulary should be used: if one department says customer while another says consumer for the same entity, that inconsistency needs to be corrected. Following a common language ensures data readability and consistency at the org level.
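One lightweight way to enforce a shared vocabulary is a canonical-term map that fails loudly on unknown words. The terms below are illustrative:

```python
# Department-specific synonyms mapped to the one org-level canonical term.
CANONICAL_TERMS = {
    "consumer": "customer",
    "client": "customer",
    "customer": "customer",
}


def canonical(term: str) -> str:
    # Reject unknown vocabulary instead of letting it creep into models.
    key = term.strip().lower()
    if key not in CANONICAL_TERMS:
        raise ValueError(f"'{term}' is not in the shared vocabulary")
    return CANONICAL_TERMS[key]
```

Running such a check in data-model reviews or CI keeps naming drift from accumulating across departments.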

Data curation

With growing needs and changes in business dynamics, organisations keep developing new features or changing existing ones, and data definitions for certain entities may change along the way. It is equally essential to iteratively examine the data model and remodel or clean it up as required. This helps keep the domain data model intact and aligned with business needs.

Data Store

Although we can have multiple database types, engines, and instances for different needs, varying with microservice requirements, they should all be governed by organisational guidelines and conventions. The data architecture should avoid data duplication and ensure there is only one data store acting as the source of truth for any particular piece of data. Consumers should refer to that store rather than copying the data and using it locally.

For performance, we can copy data into a cache, keep a copy in a local database, or even use a dedicated database with copied data to serve a particular business use case. However, this comes with the extra overhead of maintaining that copy and ensuring it never goes stale.
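The trade-off above can be sketched as a read-through cache: reads are served from a local copy, but the source of truth is consulted whenever the copy is missing or older than a TTL. This is a simplified sketch; real systems would also need invalidation on writes:

```python
import time


class ReadThroughCache:
    """Serve reads from a local copy; the source store remains the truth."""

    def __init__(self, source_get, ttl_seconds: float = 60.0):
        self._source_get = source_get  # callable into the source of truth
        self._ttl = ttl_seconds
        self._cache = {}  # key -> (value, fetched_at)

    def get(self, key):
        hit = self._cache.get(key)
        if hit is not None:
            value, fetched_at = hit
            if time.monotonic() - fetched_at < self._ttl:
                return value  # copy is fresh enough: avoid the round trip
        # Copy is missing or stale: refresh from the source of truth.
        value = self._source_get(key)
        self._cache[key] = (value, time.monotonic())
        return value
```

The TTL makes the staleness window explicit: the shorter it is, the closer the copy stays to the source, at the cost of more load on it.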

Good practices around data modelling/architecture

BDAT in TOGAF

BDAT stands for Business, Data, Application, and Technology in the TOGAF architecture framework. When solving any business problem, we should think first from the business/domain perspective and then from the data perspective. Changing the data model at a later stage is too costly and carries high risk.

Define dataset and Data Entity

Designing the dataset and its respective data entity helps us decide:

  • the data structure for the interfaces
  • which database to use; e.g., a document database like MongoDB might be a better fit for a catalogue than an RDBMS like MySQL

Data dictionary

A data dictionary should be created to hold metadata about the data model. It details each table/document and its columns, and records the purpose and definition of every table, document, and field from a business and layman's perspective.
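A data dictionary entry can be kept as structured metadata alongside the schema. The shape below is one possible sketch; the table and field names are invented:

```python
from dataclasses import dataclass, field


@dataclass
class FieldMeta:
    name: str
    data_type: str
    description: str  # business/layman definition of the field
    nullable: bool = True


@dataclass
class TableMeta:
    name: str
    purpose: str  # why this table/document exists, in business terms
    fields: list = field(default_factory=list)


# Illustrative entry for a hypothetical "orders" table.
orders = TableMeta(
    name="orders",
    purpose="One row per order a customer has placed.",
    fields=[
        FieldMeta("order_id", "uuid", "Unique identifier of the order", False),
        FieldMeta("placed_on", "timestamp", "When the customer checked out", False),
    ],
)
```

Keeping entries like this in version control lets the dictionary evolve in the same reviews as the schema itself.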

Data validation

Mostly, business validation is done at the service/business layer and data validation at the controller layer of an application. But from the data-sanctity perspective, key constraints (foreign key, primary key, and unique key) should also be applied at the database layer. Additionally, constraints like string length, enums, and data types should be enforced at the column level to ensure good data quality.
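The database-level constraints above can be sketched in DDL; here SQLite is used purely for illustration, with invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if opted in
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,                              -- primary key
    email       TEXT NOT NULL UNIQUE,                             -- unique key
    status      TEXT NOT NULL CHECK (status IN ('active', 'blocked')),  -- enum
    name        TEXT NOT NULL CHECK (length(name) <= 100)         -- string length
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id)  -- foreign key
);
""")
```

With these in place, bad data is rejected by the database itself even if a bug slips past the controller- or service-layer validation.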

Who created/updated what, and when

Direct database changes via a database client in production should never be made. Changes should always go through the business/service and repository layers. The repository layer should have hooks that automatically stamp the following fields on every table/document:

  • created_on
  • created_by
  • updated_on
  • updated_by
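A repository-layer hook for these audit fields might look like the sketch below; the class and field handling are illustrative, not tied to any particular ORM:

```python
from datetime import datetime, timezone


class AuditedRepository:
    """Repository sketch that stamps audit fields on every write."""

    def __init__(self, current_user: str):
        self._user = current_user
        self._rows = {}  # stands in for the real table/collection

    def save(self, row_id: str, data: dict) -> dict:
        now = datetime.now(timezone.utc).isoformat()
        existing = self._rows.get(row_id)
        record = dict(data)
        if existing is None:
            # First write: set the creation audit fields once.
            record["created_on"] = now
            record["created_by"] = self._user
        else:
            # Subsequent writes: creation fields are preserved, never rewritten.
            record["created_on"] = existing["created_on"]
            record["created_by"] = existing["created_by"]
        record["updated_on"] = now
        record["updated_by"] = self._user
        self._rows[row_id] = record
        return record
```

Because the stamping lives in the repository layer, callers cannot forget it, and every row carries a who/when trail by construction.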

ER diagram

An ER diagram gives a visual, intuitive view of the different tables/documents in a database/collection and helps us understand how they relate to each other. Many database clients provide tools to reverse-engineer the ER diagram from the underlying database.

Conclusion

Data design principles and the related good practices around data modelling are the keys to becoming a data-driven organisation in the true sense. They empower the software architecture to truly achieve what it is being built for.
