Building Data-Centric Organizations — Roadblocks & Ideas

Raj Samuel
Published in Nerd For Tech, Oct 29, 2021

Data-centric or data-driven means that the organization:

  • identifies data as a key asset
  • gives data due priority (i.e., time and money)
  • and uses that data for its benefit

Such organizations can make informed decisions and answer business questions easily. I have grouped the roadblocks to getting there, and ideas for overcoming them, under two sections:

(1) Culture

(2) Technology, tools & practices

(1) Culture

Culture is a vague term, but our time offers a great example from the technology industry: Microsoft's cultural shift under Satya Nadella.

In just a few years, Nadella took a company that Silicon Valley elites no longer envied and made it enviable again. Under his leadership Microsoft:

  • embraced Linux & open source
  • started investing in developer-friendly initiatives outside its core revenue streams (e.g., Microsoft is the largest investor in the Python ecosystem, and it released popular free tools such as Visual Studio Code)
  • decentralized decision-making (e.g., a team with a great idea can go ahead and build it, instead of having to prove it is profitable for the Windows ecosystem)

This could not have happened across Microsoft if a Director or VP had set out to do it. It had to start from the top.

Culture is top-down. Always.

(a) Leadership

It’s easy to assume that a Director of Data with full ownership of data architecture can make the organization data-driven. But it only works if it comes from the top, from executive leadership.

I have painfully witnessed this tension in a few organizations. The leaders of the data platforms try to instill a data-centric culture one project at a time, but the leaders of application development refuse to budge. This happens because the senior executive they all report to does not understand the value of it.

(b) Investment

When investment in data-centric evolution comes from a top-down cultural shift, that is a great thing. When it does not, middle management may still manage to secure some investment, but without top-down support it is unlikely to realize its full potential.

What kind of investment? Money spent on the extra time it takes to build products, money spent on governance, and money spent on tools and expertise. Each project takes a little longer when you are data-driven, but the benefits come from playing the long game. So there can sometimes be a trade-off between speed to market and long-term growth.

In other words, short-term versus long-term investment. Play the long game.

(c) Conway’s Law

Conway's Law is an aphorism stating that an organization builds systems that mirror its communication structure (and, by extension, its organizational hierarchy).

I once worked at a startup where I was brought in to deal with data. The head of business development communicated about new developments directly, and only, with the engineering manager. By the time I had a chance to do anything useful in data architecture, the application had already been built, either as a blueprint in someone's head or as code. Then I had to play along with whatever got built.

The top technology executive who oversaw all of this did not care much about data as long as the product went to market quickly, so there was not much I could do. This again points back to the top-down cultural problem.

In this case the business was inadvertently (and with all good intentions) doing what it had done since the startup came to be: build solutions and acquire customers as quickly as possible. But one day you have enough data to put to responsible use, just not the culture to get there. This brings us to another key point:

Data-centric organizations must be built deliberately; they do not happen by chance.

(2) Technology, tools and practices

If the culture is in place, that means there is leadership behind it and investments will follow. What tools and techniques do we invest in to build data-centric organizations?

(a) Common practices

The key is to understand what adds value to the business and not to go overboard investing in tools and techniques. When IBM pitches its data lineage tool free of cost, think twice: they will come back to collect a license fee in the third year. It is common wisdom that many enterprise software products began as simple Excel sheets that turned into product ideas.

Invest in practices first; tools only if they prove to be necessary.

Some examples:

  • enabling an analytics practice to glean insights about the business
  • master data practices: an MDM system may not be warranted, but a properly modeled operational data store for your key entities (customer, product, etc.) or a properly tracked data warehouse dimension is essential
  • creating a metadata platform (business glossary) instead of a growing list of documents and scattered wiki pages. It does not have to be a flashy governance tool. Something simple, perhaps built in-house, is enough, as long as it is better than an Excel sheet on someone's C: drive.
  • putting data engineers, data architects and data directors at the same table as the app/product team from the very first meeting when you begin building products and solutions, not after the app teams have already formed a mental model of the application's blueprint.
  • governance practices for auditing data, securing data, data quality, data stewardship etc.
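One common way to keep a warehouse dimension "properly tracked" is a Type 2 slowly changing dimension. When a tracked attribute changes, the current row is closed out and a new versioned row is opened, so history is preserved. The sketch below is illustrative only and is not the author's design. It assumes a hypothetical customer dimension with made-up columns (`customer_id`, `city`, `valid_from`, `valid_to`) held in memory.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


# Hypothetical customer dimension row; column names are illustrative assumptions.
@dataclass
class CustomerDimRow:
    customer_id: str          # natural (business) key
    city: str                 # attribute whose history we track
    valid_from: date
    valid_to: Optional[date]  # None means "this is the current version"

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(dim: list[CustomerDimRow], customer_id: str, city: str, as_of: date) -> None:
    """Type 2 change: expire the current row and append a new version."""
    current = next((r for r in dim if r.customer_id == customer_id and r.is_current), None)
    if current is not None and current.city == city:
        return  # attribute unchanged, so keep the existing current row
    if current is not None:
        current.valid_to = as_of  # close the old version instead of overwriting it
    dim.append(CustomerDimRow(customer_id, city, valid_from=as_of, valid_to=None))


def city_as_of(dim: list[CustomerDimRow], customer_id: str, when: date) -> Optional[str]:
    """Answer 'where was this customer on date X?' from the tracked history."""
    for r in dim:
        if (r.customer_id == customer_id and r.valid_from <= when
                and (r.valid_to is None or when < r.valid_to)):
            return r.city
    return None


if __name__ == "__main__":
    dim: list[CustomerDimRow] = []
    apply_scd2(dim, "C001", "Toronto", date(2020, 1, 1))
    apply_scd2(dim, "C001", "Toronto", date(2020, 6, 1))   # no-op: nothing changed
    apply_scd2(dim, "C001", "Chennai", date(2021, 3, 15))
    print(len(dim))                                        # 2
    print(city_as_of(dim, "C001", date(2020, 12, 31)))     # Toronto
    print(city_as_of(dim, "C001", date(2021, 10, 29)))     # Chennai
```

The point of the design is that a question like "why did returns spike last quarter?" can be answered against the attribute values that were true at the time, rather than today's overwritten values.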

(b) Tools

Whether or not to invest in a commercial tool for data governance is a cost-benefit analysis that depends on each business case. Consider master data as an example:

A retail clothing company I briefly worked for chose Stibo Product Information Management because the attributes of their product data (size, color, style, product hierarchy), as assigned to products in different applications, had become inconsistent over time. It became difficult for them to identify, for example, why a particular product had too many returns in the last quarter.

They had the right data models and data solutions, but the operations became too hairy and unmanageable to sustain product data quality.
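Before buying a tool, a team might first measure how inconsistent its product attributes actually are across applications. The following is a minimal sketch under stated assumptions. Each application is assumed to be able to export a mapping from SKU to attributes, and the application names, SKUs and values here are made up. It is not how Stibo or any PIM product works internally.

```python
from collections import defaultdict

# Hypothetical export shape: application name -> {sku -> {attribute -> value}}.
ExportByApp = dict[str, dict[str, dict[str, str]]]


def find_inconsistencies(exports: ExportByApp,
                         attributes: list[str]) -> dict[tuple[str, str], dict[str, str]]:
    """Return {(sku, attribute): {app: value}} for every attribute whose
    values disagree across the applications that carry that SKU."""
    seen: dict[tuple[str, str], dict[str, str]] = defaultdict(dict)
    for app, products in exports.items():
        for sku, attrs in products.items():
            for attr in attributes:
                if attr in attrs:
                    # Light normalization so "Navy " and "navy" are not flagged.
                    seen[(sku, attr)][app] = attrs[attr].strip().lower()
    # Keep only attributes with more than one distinct value across apps.
    return {key: vals for key, vals in seen.items() if len(set(vals.values())) > 1}


if __name__ == "__main__":
    exports: ExportByApp = {
        "ecommerce": {"SKU-1": {"size": "M", "color": "Navy"}},
        "pos":       {"SKU-1": {"size": "M", "color": "navy "}},
        "returns":   {"SKU-1": {"size": "L", "color": "Navy"}},
    }
    for (sku, attr), values in find_inconsistencies(exports, ["size", "color"]).items():
        print(sku, attr, values)
```

If a simple report like this keeps growing faster than people can fix it, that is the kind of operational signal that can justify the cost of a dedicated tool.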

(c) Database technology

A major challenge for data decision-makers over the past decade has been the explosion of database technologies.

In prior decades we operated on the premise that all data (operational data, master data, CRM, warehouses, systems of record) went into relational databases. Then came special-purpose databases that did one thing really well and scaled out globally. We called them NoSQL. For a while we thought they would take over, but now the ideas seem to be consolidating.

There are in-memory databases, key-value stores, document stores, wide-column stores, columnar, graph, time series and search databases, and the list goes on. 2010–2020 was a confusing yet exciting decade.

We still use special-purpose databases for special use cases like search or caching. But some of their capabilities are also becoming features of relational databases: for example, native JSON support in Postgres and MySQL, vendors adding other NoSQL features as extensions, Oracle becoming a multi-model database, and so on.

The key takeaway is that using a specialized database to solve a given problem like latency or search is probably a good thing, but we still need the old-guard relational databases to keep an organization's data tidy and ordered.

Summary

Data-centric organizations are the result of a deliberate cultural shift from the top brass down to the people on the ground. When the culture is in place, other things follow: investment, tools and practices.

These are anecdotal lessons from working in different capacities across different organizations, industries and countries. If you happen to read this and have other perspectives, please leave a comment.

