Modern Data Architecture is Here to Stay — But How Can You Get There?
You hear the words “Modern Data Architecture” (or MDA for us data nerds), and what do you think about? Many people think that it’s just moving your data to the cloud, and Infrastructure as a Service (IaaS) companies like AWS and Microsoft love that messaging. Utilizing the cloud to build out your MDA is definitely a plus, as IaaS platforms provide the tools and capabilities needed. In reality, when thinking about the concept of a modern data architecture, there are a lot of options, both on-premise and in the cloud. Each option can help an organization get the most out of their data, and provide a strong data foundation for their future development. The keys to a successful Modern Data Architecture are flexibility, scalability, and a decoupled infrastructure. Let’s break down what this means to an organization.
Flexibility: Flexibility is perhaps the loosest term here, because it applies to so many different aspects of data architecture. Ultimately, the goal of flexibility is to be able to plug and play any tool into any section of an architecture and have it work seamlessly. If a data architecture is too reliant on one tool or another, it becomes very difficult to pivot to something that is more tailored to your needs. The main point of a modern data architecture is to enable innovation, and to sustain that innovation at enterprise scale.
In practice, this isn’t always possible, or even practical. Many organizations have too much at stake in one particular data platform or product, or have way too much data on premise to move it up into a cloud data lake. Some vendors will even add lock-in clauses to the contract, keeping the organization’s investment for a longer period of time. The goal of a data architect is then to design a platform around that particular tool, all while introducing new processes, platforms, and capabilities to help and support their MDA initiative.
Scalability: One of the biggest complaints analysts and IT professionals get is that they can’t add a new data set or technology because they don’t have enough server space. Then comes the debate between getting rid of something that is already on the servers, going without the new capability, or buying a new server to enable it. This process can take weeks, if not months, all to the detriment of the enterprise. When designing a modern data architecture, scalability must be top of mind. With innovation being the main driver of an MDA, the enterprise does not have time to wait and see whether it can continue to innovate and develop its architecture. Speed to value is the name of the game.
This is the main differentiator between a cloud IaaS platform and an on-premise architecture. Cloud platforms can scale up and down on request, adding capacity such as disk space when needed and releasing it when it is no longer necessary to minimize cost. There are no physical servers that need to be bought. Theoretically, an enterprise can scale its architecture almost without limit (though this is obviously not cost-effective) with very little effort. Any on-premise architecture will have large drawbacks in terms of scalability, mostly due to how slowly it can scale. However, that does not mean that you cannot design a hybrid MDA, with normal transactional data warehouses staying on premise, while analytical and other warehouses are decoupled and moved into the cloud.
And speaking of decoupled…
Decoupled: This is the most difficult, and most important, piece of a Modern Data Architecture to architect, let alone implement. The idea of a decoupled architecture fits into both the flexibility and the scalability of the architecture, and is the true driver of innovation. There are many ways to decouple a data architecture, but ultimately the idea is that no part of your data architecture should rely on any other part to function for the enterprise. For an application, decoupling your architecture is much easier, as messages from one piece of the application can be queued before going to another. Data pipelines are not as easy to queue up, so when one database goes down, everything that depends on it can stall.
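To make the queueing idea concrete, here is a minimal sketch using Python’s standard-library queue module. The function names (produce, consume) and the order events are hypothetical and exist only for illustration; a real system would more likely put a managed message broker between the two components rather than an in-process queue.

```python
import queue


def produce(events, buffer):
    """Upstream component: enqueue events without knowing who consumes them."""
    for event in events:
        buffer.put(event)


def consume(buffer, sink):
    """Downstream component: drain whatever is waiting and append it to the sink."""
    processed = 0
    while True:
        try:
            event = buffer.get_nowait()
        except queue.Empty:
            # Nothing left to process right now.
            return processed
        sink.append(event)
        processed += 1


buffer = queue.Queue()
warehouse_rows = []  # hypothetical stand-in for a downstream data store

# The producer keeps working even though the consumer has not started yet.
produce([{"order_id": 1}, {"order_id": 2}], buffer)
produce([{"order_id": 3}], buffer)

# Later, the consumer comes back online and catches up on the backlog.
count = consume(buffer, warehouse_rows)
```

Neither side calls the other directly. The producer only knows about the buffer, so the consumer can be offline, restarted, or replaced without the producer noticing.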
The main goal of a decoupled architecture is the ability to swap any tool within your architecture for another. For example, if you have a strong Modern Data Architecture set up, and your analytical data warehouse is currently built in Amazon Redshift, you should be able to decide that you prefer Snowflake, and replace your Redshift instance with Snowflake, with minimal rework on the rest of your architecture. This is just one example, but if you find that replacing a piece of your data pipeline has a huge impact on the rest of your architecture, you may want to revisit your design to see if there is a way to further decouple the architecture and minimize the impact.
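One common way to get this kind of swappability is to have the pipeline depend on a small interface rather than on a specific vendor. The sketch below is an illustration under assumptions, not a real integration: the Warehouse protocol, the two in-memory classes, and run_pipeline are all hypothetical names, and the in-memory classes merely stand in for vendor-specific adapters (for example, thin wrappers around Redshift or Snowflake clients).

```python
from typing import Protocol


class Warehouse(Protocol):
    """The only surface the pipeline is allowed to depend on (hypothetical)."""

    def load(self, table: str, rows: list[dict]) -> int: ...
    def count(self, table: str) -> int: ...


class InMemoryWarehouseA:
    """Stand-in for one vendor's adapter; keeps one list of rows per table."""

    def __init__(self) -> None:
        self._tables: dict[str, list[dict]] = {}

    def load(self, table: str, rows: list[dict]) -> int:
        self._tables.setdefault(table, []).extend(rows)
        return len(rows)

    def count(self, table: str) -> int:
        return len(self._tables.get(table, []))


class InMemoryWarehouseB:
    """Stand-in for another vendor; stores (table, row) pairs in one flat list."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, dict]] = []

    def load(self, table: str, rows: list[dict]) -> int:
        self._rows.extend((table, row) for row in rows)
        return len(rows)

    def count(self, table: str) -> int:
        return sum(1 for name, _ in self._rows if name == table)


def run_pipeline(warehouse: Warehouse, rows: list[dict]) -> int:
    """Drop rows without an amount, load the rest, and report the table size."""
    cleaned = [row for row in rows if row.get("amount") is not None]
    warehouse.load("sales", cleaned)
    return warehouse.count("sales")


rows = [{"amount": 10}, {"amount": None}, {"amount": 25}]
result_a = run_pipeline(InMemoryWarehouseA(), rows)
result_b = run_pipeline(InMemoryWarehouseB(), rows)
```

The pipeline code is identical in both calls; only the object passed in changes. In a real migration the adapter would absorb the vendor differences (SQL dialect, bulk-load mechanics, authentication), which is where most of the "minimal rework" would actually live.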
The final question is, how can you build a modern data architecture? Does it have to be in the cloud? The answer is no, but the tool set for an MDA in the cloud is a lot more robust. However, a move to the cloud brings a completely different set of risks that must be taken into account. For example, when you build out your data architecture in an on-premise data center, you may secure it using firewalls and other on-premise security tools. In the cloud, you have a completely different way of building out your security. In AWS, for instance, you have what are called VPCs (Virtual Private Clouds), which are logically isolated virtual networks that act as a logical data center, together with the rules that govern them. VPCs can be locked down just as securely as an on-premise data center, but you need to be trained on the ins and outs. Building a modern data architecture can absolutely be done on-premise; just keep the key principles in mind when designing your architecture: flexibility, scalability, and decoupling.