How to build a Data Architecture to drive innovation — sample

Nason Agung
nasoncorp
5 min read · Apr 14, 2022

Over the past years, companies have had to deploy new data technologies. These additions, however, have greatly increased the complexity of data architectures, limiting a company's ability to deliver new capabilities while maintaining existing infrastructure.

Cloud providers such as Amazon and Google have launched state-of-the-art technologies that give users greater agility and faster time to market for their analytics.

Analytics users want more integrated tools, such as automated model deployment platforms, so they can employ new models more quickly.
Many companies are using application programming interfaces (APIs) to expose data from various systems to their data lakes and incorporate insights into front-end applications quickly.

Companies must have a clear strategic plan, and technology leaders must make brave decisions: prioritize the changes that will have the most impact on business goals, and invest in the appropriate level of architectural sophistication.

There are six foundational shifts that modern companies are making to their data architecture blueprints:
1. From on-premises to cloud-based data platforms
Cloud providers have reshaped the way companies deploy and run data infrastructure, platforms, and applications at scale. They enable a company to roll out new self-service capabilities in days rather than months.

Enabling concepts and components:
- Serverless data platforms such as Amazon S3 and Google BigQuery enable companies to build and manage data-centric applications at virtually unlimited scale, without the complexity of installing and configuring systems or managing workloads.
- Containerized data solutions using Kubernetes enable companies to decouple and automate the deployment of additional compute power.
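To make the containerized-deployment idea concrete, here is a minimal sketch that builds a Kubernetes Deployment manifest in Python; scaling compute then amounts to changing one declarative field. The image name, labels, and resource requests are hypothetical placeholders, not a recommended production setup.

```python
import json

def spark_worker_deployment(replicas: int) -> dict:
    """Build a minimal Kubernetes Deployment manifest (as a dict) for a
    containerized worker pool. Image and resource values are placeholders."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "spark-worker"},
        "spec": {
            # Adding compute is a declarative change to this one field:
            "replicas": replicas,
            "selector": {"matchLabels": {"app": "spark-worker"}},
            "template": {
                "metadata": {"labels": {"app": "spark-worker"}},
                "spec": {
                    "containers": [{
                        "name": "worker",
                        "image": "example.registry/spark-worker:latest",
                        "resources": {"requests": {"cpu": "2", "memory": "4Gi"}},
                    }]
                },
            },
        },
    }

# Scale from 3 to 10 workers by regenerating and re-applying the manifest:
manifest = spark_worker_deployment(replicas=3)
print(json.dumps(manifest, indent=2))
```

In practice this manifest would be applied with `kubectl apply` or a CI/CD pipeline; the point is that capacity becomes configuration rather than manual provisioning.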

2. From batch processing to real-time processing
Real-time data communications and streaming capabilities have become much less expensive, laying the groundwork for widespread adoption.

Real-time streaming features, such as a subscription mechanism, enable data consumers, such as data marts and data-driven employees, to subscribe to “topics” in order to receive a continuous feed of the transactions they require.

Enabling concepts and components:
- Messaging platforms such as Apache Kafka provide fully scalable, durable, and fault-tolerant publish/subscribe services that can process and store millions of messages per second for immediate or later consumption.
- Stream-processing frameworks such as Apache Storm and Apache Spark Streaming allow messages to be analyzed directly in real time.
- Alerting platforms such as Graphite or Splunk can, for example, warn sales representatives when they are not hitting their daily sales targets.
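The subscription mechanism described above can be illustrated with a toy in-process message bus. This is a deliberately simplified sketch of the topic/subscriber idea that Kafka provides at scale; it omits persistence, partitioning, and fault tolerance, and the topic and field names are invented for the example.

```python
from collections import defaultdict
from typing import Callable

class MessageBus:
    """Toy in-process publish/subscribe bus: consumers subscribe to named
    topics and receive every message published to them."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Fan the message out to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(message)

# Two independent consumers of the same feed: a data mart capturing every
# transaction, and an alerting rule watching for large orders.
bus = MessageBus()
received, alerts = [], []
bus.subscribe("orders", received.append)
bus.subscribe("orders", lambda m: alerts.append(m) if m["amount"] > 1000 else None)

bus.publish("orders", {"id": 1, "amount": 250})
bus.publish("orders", {"id": 2, "amount": 5000})
```

Both consumers see the full stream, but each reacts independently; adding a third consumer requires no change to the publisher.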

3. From pre-integrated solution to best-of-breed modular platform
To extend applications, companies must often go beyond the limitations of legacy data ecosystems provided by large solution vendors.

Many companies are adopting a highly modular data architecture that incorporates best-of-breed and, in many cases, open-source components, which can be replaced with new technologies as needed without affecting other parts of the data architecture.

Enabling concepts and components:
- Data pipelines and API-based interfaces simplify the integration of diverse tools and platforms by shielding data teams from underlying complexity, shortening time to market, and lowering the risk of introducing new issues into existing applications.
- Analytics workbenches such as Amazon SageMaker and Kubeflow simplify building end-to-end solutions in a highly modular architecture.
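The modularity argument can be sketched in a few lines: if every pipeline component sits behind the same narrow interface, swapping one out does not touch the others. The stage functions below (`normalize`, `enrich`) are invented examples, not part of any particular product.

```python
from typing import Callable, Iterable

# Each stage is just a function from record to record. Because all stages
# share this one interface, any component can be replaced independently.
Stage = Callable[[dict], dict]

def run_pipeline(records: Iterable[dict], stages: list[Stage]) -> list[dict]:
    out = []
    for rec in records:
        for stage in stages:
            rec = stage(rec)
        out.append(rec)
    return out

def normalize(rec: dict) -> dict:
    return {**rec, "name": rec["name"].strip().lower()}

def enrich(rec: dict) -> dict:
    # Hypothetical enrichment; a best-of-breed tool could replace this stage.
    return {**rec, "source": "crm"}

rows = run_pipeline([{"name": "  Alice "}], [normalize, enrich])
```

Replacing `enrich` with a different implementation, open-source or commercial, requires no change to `normalize` or to the pipeline runner, which is the essence of a best-of-breed modular platform.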

4. From point-to-point to decoupled data access
Companies are setting up internal "data marketplaces" that expose data through APIs to simplify and standardize access; they are migrating existing data feeds to an API-based structure on top of an API management platform.

Enabling concepts and components:
- An API management platform manages and publishes APIs, enforces usage policies, controls access, measures usage and performance, and lets users search for existing data interfaces and reuse them.
- A buffer platform for transactions outside core systems: for example, one bank built a columnar database to provide customer information directly to online and mobile banking apps, reducing the workload on its mainframe.
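A minimal sketch of the "data marketplace" gateway idea follows: data APIs are published under a name, access is controlled per consumer, usage is counted, and existing interfaces are discoverable by search. All names (`customer-profile`, `mobile-app`) are hypothetical, and a real API management platform would do far more.

```python
class DataAPIGateway:
    """Toy internal data marketplace: publish named data APIs, enforce an
    access list, count usage, and let consumers discover existing APIs."""
    def __init__(self):
        self._apis = {}   # name -> (handler, allowed consumer set)
        self.usage = {}   # name -> call count

    def publish(self, name, handler, allowed):
        self._apis[name] = (handler, set(allowed))
        self.usage[name] = 0

    def search(self, keyword):
        # Discoverability: reuse an existing interface instead of a new feed.
        return [n for n in self._apis if keyword in n]

    def call(self, name, consumer, **params):
        handler, allowed = self._apis[name]
        if consumer not in allowed:
            raise PermissionError(f"{consumer} may not call {name}")
        self.usage[name] += 1
        return handler(**params)

gw = DataAPIGateway()
gw.publish("customer-profile",
           lambda customer_id: {"id": customer_id, "tier": "gold"},
           allowed={"mobile-app"})

gw.search("customer")                                    # discover the API
profile = gw.call("customer-profile", consumer="mobile-app", customer_id=42)
```

Because consumers go through the gateway rather than point-to-point feeds, access rules and usage metrics live in one place, and the backing system can change without breaking callers.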

5. From an enterprise warehouse to domain-based architecture
Companies are moving from a central enterprise data warehouse toward domain-based designs owned by business product owners (marketing, sales, finance, etc.). This approach demands a fine balancing act to avoid becoming fragmented and inefficient, but it can reduce the up-front time spent building new data models into the lake, often from months to just days, and can be a simpler and more effective choice.

Enabling concepts and components:
- Data infrastructure as a platform provides common tools and capabilities for storage and management.
- Data virtualization techniques organize access to, and integration of, distributed data assets.
- Data cataloging provides enterprise-wide search and exploration of data, metadata, and data access.
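The cataloging component can be sketched as a simple registry: each domain team registers its own data products, and consumers search across domains by tag. The domains, product names, tags, and storage locations below are invented for illustration.

```python
class DataCatalog:
    """Toy enterprise data catalog: domain teams register their data
    products; consumers search across all domains by tag."""
    def __init__(self):
        self._entries = []

    def register(self, domain, name, tags, location):
        self._entries.append({
            "domain": domain, "name": name,
            "tags": set(tags), "location": location,
        })

    def search(self, tag):
        # Cross-domain discovery without a central warehouse team.
        return [e["name"] for e in self._entries if tag in e["tags"]]

catalog = DataCatalog()
catalog.register("marketing", "campaign_results", ["campaign", "daily"],
                 "s3://marketing/campaigns/")
catalog.register("sales", "orders", ["orders", "daily"],
                 "s3://sales/orders/")

daily_products = catalog.search("daily")
```

Each domain keeps ownership of its data product and location, while the shared catalog keeps the architecture from fragmenting, which is the balancing act the shift describes.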

6. From rigid data model to flexible, extensible data schemas
Companies are moving to “schema-light” methods, which use de-normalized data models with fewer physical tables for optimal efficiency, to gain greater flexibility.
This approach has a number of advantages, including faster data exploration, more flexibility in storing structured and unstructured data, and less complexity, because data owners no longer need to add abstraction layers such as multiple joins between tables.

Enabling concepts and components:
- Data vault techniques keep data models extensible, allowing attributes to be added or removed in the future with minimal disruption.
- Graph databases offer the ability to model relationships within data; many companies are building master data repositories on them to handle evolving information models.
- Technology services such as Azure Synapse allow file-based data to be queried as if it were a relational database, giving customers the option of continuing to use familiar interfaces such as SQL while accessing data stored in files.
- Storing information as JSON enables companies to change database structures without changing business information models.
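The JSON point can be demonstrated with a few lines of stdlib Python: keep the flexible part of a record in a JSON document column, so a new business attribute needs no schema migration. The table and field names are made up, and SQLite stands in for whatever store a company actually uses.

```python
import json
import sqlite3

# Fixed key plus a schema-light JSON document column.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, doc TEXT)")

db.execute("INSERT INTO customer VALUES (1, ?)",
           (json.dumps({"name": "Alice"}),))

# Later, the business adds a 'segment' attribute. No ALTER TABLE needed:
db.execute("INSERT INTO customer VALUES (2, ?)",
           (json.dumps({"name": "Bob", "segment": "smb"}),))

docs = [json.loads(d)
        for (d,) in db.execute("SELECT doc FROM customer ORDER BY id")]
```

Old records simply lack the new field, and readers handle its absence in application code; the trade-off is that the database can no longer enforce the structure of the document itself.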

How to start
Data and technology executives will benefit most from adopting methods that allow them to swiftly analyze and deploy new technologies in order to quickly adapt. Four practices are crucial here:
Apply a test-and-learn mindset: leaders can start with limited resources, develop minimum viable products, and release them into production (using the cloud to accelerate) to demonstrate their worth before scaling and improving them.
Establish data "tribes," in which squads of data stewards, data engineers, and data modelers work together with end-to-end accountability for building data services.
Invest in DataOps (enhanced DevOps for data), which can speed up the design, development, and deployment of new components in the data architecture.
Create a data culture in which employees are eager to use and apply new data services within their roles.
