Stay Away from the Data Mess!

Data…What? Series Part I

Sebastian
Teradata
7 min read · Jan 6, 2023


Photo by Rick Mason on Unsplash

Let’s face it. We IT people love buzzwords. Sometimes we think the quality of a solution improves with the number of flashy words or acronyms used to define it. And we couldn’t be more wrong! Each day we must learn a new word or acronym to stay up to date (luckily we now have ChatGPT to ask for answers!). However, are things really changing that much at their core? Or is the old mantra “the more things change, the more they stay the same” still ruling our (data) world? Let’s dig a little deeper into that.

Setting the background for Data Architecture

In the ‘80s, not only was I born, but we also saw a new paradigm emerge — the Data Warehouse architecture — built to consolidate data from different systems into a reporting infrastructure serving business users, while saving some dollars by offloading processing from expensive mainframes and transactional platforms. As time passed and business requirements grew more complex, the Data Warehouse evolved into a more mature and robust architecture serving more and more use cases, from batch reporting to complex near-real-time event processing. We’ve been advocating this at Teradata for decades, as depicted in the image below.

Figure 1. Data Warehouse evolution

The ‘90s and ‘00s brought an increased focus on the speed and complexity of the data already used for analytics, enhancing it with less traditional sources (like web application logs or CRM outputs). Operational Intelligence (now usually found as Operational/Streaming Analytics, Reverse ETL, and CDC/Event Tracking) was a key theme that made Data Warehouses more “active,” with intra-day or even intra-hour data loading alongside the standard daily batches. There was also a shift from “looking in the rear-view mirror” to “looking at the road ahead,” with predictive analytics as a key enabler of business decisions.
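To make the “active” warehouse idea a bit more tangible, here is a minimal sketch of an intra-day micro-batch load, where small sets of change events are upserted into a target table throughout the day instead of waiting for the nightly batch. It is illustrative only: SQLite stands in for a real warehouse, and the table and column names are hypothetical.

```python
import sqlite3

# Stand-in warehouse: an in-memory SQLite table playing the role of a
# Data Warehouse target table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_balance (
        account_id INTEGER PRIMARY KEY,
        balance    REAL,
        updated_at TEXT
    )
""")

def apply_micro_batch(conn, changes):
    """Apply one intra-day micro-batch of change events (CDC-style upsert)."""
    conn.executemany(
        """
        INSERT INTO account_balance (account_id, balance, updated_at)
        VALUES (:account_id, :balance, :updated_at)
        ON CONFLICT(account_id) DO UPDATE SET
            balance    = excluded.balance,
            updated_at = excluded.updated_at
        """,
        changes,
    )
    conn.commit()

# A small batch of change events captured since the last load.
apply_micro_batch(conn, [
    {"account_id": 1, "balance": 120.0, "updated_at": "2023-01-06T10:15:00"},
    {"account_id": 2, "balance":  80.5, "updated_at": "2023-01-06T10:16:00"},
])

print(conn.execute("SELECT * FROM account_balance").fetchall())
```

In a real warehouse the same pattern would typically be a MERGE (or equivalent upsert) applied as each change batch arrives, which is what keeps the data “intra-day fresh” without abandoning batch processing altogether.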

As Teradata’s CTO used to say, “it makes no sense to drive a car by only looking in the rear-view mirror.”

We experienced a prosperous era, with Data Warehouses delivering value to business users and relational databases like Teradata dominating this market, bringing lots of business value and customer satisfaction to the world. On the dark side, we also saw failed implementations that were unable to deliver much value because they were too “IT-centric,” with little connection to business needs.

Although some may view this as “old school” or legacy, we need to know where we came from to plan where we are going. Many of the concepts shown here still live on today under different names. The 2010s brought the Cloud Data Warehouse to the scene, and everything said up to this point is still valid, except that these platforms tend to be more focused on ease of use and being “business-user friendly.” But I feel like I’m jumping ahead in time, so let’s keep this in the parking lot for now.

The lesson learned from this era was that architecture deployments must be aligned to and focused on meeting business needs. Spoiler alert: does this sound familiar to you today? We’ll get back to this later.

The “Data Something” architecture issue

Over the years, we’ve seen an explosion in data volume from numerous data sources, some producing huge numbers of records at high speed. This variety and volume of information introduced new requirements for capturing and curating it. New ways of consuming the information extended the need for real-time access, along with other challenges such as higher concurrency and increased data structure complexity. Congratulations! We just entered the Big Data era! Buckle up, this is going to get bumpy.

The architectural principles we had followed for decades started to shift to a different paradigm. We realized that a one-size-fits-all, multi-purpose platform wasn’t enough, and suddenly we saw the birth of the hybrid analytical ecosystem — the “Logical Data Warehouse” or the “Enterprise Data Hub,” among many other names (depending on which analyst firm’s lexicon you prefer). Going one step further, we met the new kid on the block: the distributed file system, powered especially by the Hadoop project. This became the great enabler of a bright new architecture pattern: the Data Lake.

Figure 2. Reference Information Architecture

Data Lakes became the de facto standard for many analytical ecosystems, up to the point that if you weren’t building a Data Lake, your analytical ecosystem wasn’t cool and you weren’t perceived as modern. We saw the great and mighty Silicon Valley players building their Data Lakes and contributing new tools and knowledge to the framework. But guess what? Most implementations failed (“if we build it, they will come,” except they didn’t), although not because of the design pattern itself. Most Data Lakes were built without any clear business purpose or proper data management practices. They were “cheap” and “good enough,” based on open-source software on top of commodity hardware. But no matter what companies thought, even open-source software and commodity hardware have a cost and take a huge amount of effort to deliver real business value. We may agree that some social network and content streaming companies, among others, have succeeded on this journey, but the vast majority of companies do business and build teams very differently from the Valley companies. So, no single recipe works for everyone.

Figure 3. Data Integration on an Analytical Ecosystem

Fast forward to the present: out of the swampy mud of the Data Lakes, we see that Data Warehouses are still playing their part. As mentioned before, cloud implementations have helped the Data Warehouse become (again) a relevant and important solution (remember when they said it was going to die?) by adding more flexible capabilities, better scalability choices, and especially a focus on self-service deployments (in the end, that’s what cloud is about). Vendors like Teradata have managed to extend the robust capabilities of their on-premises solutions with the “modern” capabilities the cloud brings, in solutions like VantageCloud.

The cloud also changed the way Data Lakes are seen and implemented, by leveraging object storage. With cheap, scalable, and flexible storage available, it is now easier than ever to collect information and consume it on demand with different compute options, ranging from simple reporting to very complex deep learning algorithms.
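As a minimal sketch of that on-demand consumption pattern (assuming pandas with the pyarrow and s3fs packages installed; the bucket path and column names below are hypothetical), a simple report can be computed straight from Parquet files sitting in object storage, with no dedicated cluster to stand up first:

```python
import pandas as pd

# Hypothetical object-storage location holding Parquet files; a gs:// or
# local path works the same way with the matching filesystem driver.
LAKE_PATH = "s3://my-company-lake/sales/2023/01/"

# Spin up "compute" only when needed: read just the columns this report uses,
# then aggregate. Heavier engines (Spark, ML frameworks) can point at the
# very same files for more complex workloads.
df = pd.read_parquet(LAKE_PATH, columns=["region", "amount"])
print(df.groupby("region")["amount"].sum())
```

The design point is that storage and compute are decoupled: the data lands once in cheap object storage, and each workload brings whatever engine suits it.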

But wait… to make things more interesting (and “modern”), we also see new frameworks rising. We’ve started to hear about the Data Lakehouse, the Data Fabric, and the Data Mesh, among a few others. And yes, those will be part of the next “Data… What? Series” story.

Data is the air in the wheel

We are living in a highly competitive era, with a marketplace full of great data solutions (and not-so-great ones), but let’s focus on what is important: the data. Sadly, many solution providers try to differentiate themselves by adding new spokes to the data wheel, trying to shake the “status quo” without much real content. Is it bad to change? Certainly not. We advocate and embrace change to enable innovation. But remember that data is what powers the wheel of progress. We can improve the efficiency and modernize the design of the wheel to make it more secure and adapt it to different environments or weather…but in the end, it’s still a wheel. Well-architected data is what inflates the wheel to enable your business to drive to new heights safely with the fewest bumps in the road.

So… Data what? Data strategy!

That’s the term you should be looking for (well, in fact, those are two words). No matter how loud the noise outside gets, you need to keep looking at your business requirements and build your data strategy based on them. What data will bring the most value? Which sources are already there and could complement this new initiative? How can we prioritize them? What funding will be available to accomplish this? Can we guarantee a good ROI to secure more funding? If we build this first, can we start getting value faster? After we answer those questions, we can apply architecture patterns to make the most intelligent use of the company’s technological and monetary resources. Based on the value of the data and the characteristics of the required analytics, we can find a good fit among different architecture patterns, and we may be able to reuse what has already proved useful. There’s no need to stick to one pattern; embrace what is useful from each one.

Figure 4. Teradata Analytical Roadmap

It’s not the other way around. Trying to fit the data, the analytics, and even business decisions into an architecture pattern just because it sounds cool is not a good idea. Pinning the company’s architecture strategy and decisions on “just a word” is quite dangerous.

Remember, in the end, a proper data strategy will deliver more business value than chasing a shiny new architecture buzzword. Oh, by the way, how many buzzwords have you spotted in this article?

Further Information

Don’t forget to go to teradata.com to find more information about Teradata products and follow us on Medium to get notifications on news and important developer content.

Stay tuned for the next “Data… What? Series” story! And remember… always have fun, and happy coding!
