Teradata and Hadoop: The Biter Bitten

Robin Bloor
Bloor Group
Oct 28, 2019 · 4 min read

I just spent a few days in Denver, attending Teradata Universe 2019, which is winding down as I write these words. There was more meat on that bone than I expected, but what piqued my interest most was Teradata's Hadoop Migration service (to Teradata Vantage). As the cliché goes, dog bites man is not news, but man bites dog, that’s page-one material.

A Retrospective

Surely you remember the “hype years” when Hadoop was but a baby elephant, promising to trample the big data market beneath its clumsy feet as it grew. Hadoop was the revenge of the nerds. The technology would surely scale to the sky. An open-source army of guerilla coders would consign all the expensive data warehouse technology to the dustbin of history. It was time for the dinosaurs to die.

There were brand new mammals in town with names like Cloudera, Hortonworks and MapR and they had brave new elephant-sized business models in tow. The software came for free; they earned their gold from support and consultancy. IT Departments could hardly believe how inexpensive it promised to be. Oh, if only Hadoop and all its fellow-travelers had known how to keep their promises.

The Land of Lakes

Who would have thought that an elephantine disk-based system, initially only able to run batch jobs one at a time, could gain the mind-share it did? The last time a system with such characteristics charmed the world was back in 1964, when IBM launched its revolutionary System/360. That was over five decades ago.

What could you realistically use such a system for?

It wasn’t totally foolish, of course. If you have mountains of data and a cheap array of servers that can march in parallel, then you can crunch data. Hadoop could do that. Where it scored (and it didn’t score often) was in three areas: as a near-line archive for aging database data, as a sandbox for data scientists, and as a data lake.

Let’s focus on the data lake, as that is/was the major use case. I’ve always thought of “data lake” as poor terminology, but there is a definite need for a landing area if you’re gathering data to feed to an analytical warehouse and/or a portfolio of analytics apps, and that’s what a well-constructed data lake can be. A primary virtue of Hadoop is that it can accept any kind of data. So in a Hadoop data lake you can do data transformation, data categorization, and data cleansing, apply encryption, and so on. In other words, it can be an analytics data preparation area. Prior to Hadoop, there was no such thing.
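To make that concrete: a typical preparation job on a Hadoop lake reads raw landed files, cleanses and reshapes them, and writes the result to a staging area for the warehouse to pick up. Here is a minimal sketch using Spark, which became the usual way to script such jobs on Hadoop clusters; the paths, column names, and formats are all hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lake-prep").getOrCreate()

# Land raw, semi-structured data as-is -- the lake accepts anything.
# (Path and schema are illustrative, not from the article.)
raw = spark.read.json("hdfs:///lake/landing/clickstream/2019-10-28/")

# Cleanse and transform: drop malformed rows, normalize types and
# categories before the data reaches the analytics layer.
prepared = (
    raw.dropna(subset=["user_id", "event_ts"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("channel", F.lower(F.col("channel")))
)

# Write to a staging area that a warehouse load job would pick up.
prepared.write.mode("overwrite").parquet("hdfs:///lake/prepared/clickstream/")
```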

Hadoop Emigration, Teradata Style

However, Hadoop has proved really inadequate as an analytics engine, and that’s a shame, because many companies tried to travel that route and got nowhere slowly. In fact, enough of them did so for Teradata to view Hadoop-system rescue as a promising market to attack.

Put simply, Teradata is productizing Hadoop Migration. Many Hadoop systems were built as data lakes and analytics systems, and many of them failed. It has been suggested, by Gartner no less, that only one in five ever got into production and not all of those “successful” ones were that successful. Examples abound of cost and complexity far outweighing the benefits.

This, incidentally, is the Achilles’ heel of “fashionably cheap software.” For most companies the cost of system development is just a fraction (5% to 10%) of the cost of a system over its lifetime. (If you want to know the situation in your own company, just compare the annual software development budget with the whole IT budget.)

The license cost of the software is, in turn, a small part of software development costs. Thus, even if you get software for nothing, it rarely saves much in the grand scheme of things. What happened with Hadoop was a classic example of this. Anyone who thought that Hadoop could barge into the corporate market with just a few years’ development behind it and magically acquire robust production qualities was dreaming hard.
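To see why, run the arithmetic with some illustrative numbers (the percentages below are assumptions in the spirit of the figures above, not data from any study):

```python
# Back-of-the-envelope: what does "free" software actually save?
lifetime_cost = 100.0        # normalize a system's lifetime cost to 100 units
development_share = 0.10     # development: 5-10% of lifetime cost (per above)
license_share_of_dev = 0.20  # assumed: licenses are ~20% of development cost

development_cost = lifetime_cost * development_share       # 10 units
license_cost = development_cost * license_share_of_dev     # 2 units

print(f"Free licenses save ~{license_cost / lifetime_cost:.0%} "
      "of lifetime cost")  # ~2%
```

In other words, even on generous assumptions a free license trims only a couple of percent from what the system will cost you over its life.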

The pain among the Hadoop systems that actually made it to production came from complexity, poor maintainability, and lack of robustness. Teradata is offering to fix this through migration to Teradata Vantage. The pitch is cost-effectiveness and the freedom to build new analytic applications on the data.

The Irony

I will not go into detail about the migration program; if you are interested, you can find the details here on the Teradata web site. Instead, I’ll just point out that if you Google “Hadoop migration” you will tend to find links that discuss how to migrate data away from the big database companies, and you’ll also find that those links are old, dating back to 2015 when Hadoop was high fashion. If you Google “is Hadoop dead?” you’ll be presented with more recent links, from this year or last. Perhaps next year, when you Google “Hadoop migration,” you’ll encounter links to Teradata and the other big data analytics companies piling in on the broken promises of Hadoop.

The dinosaurs are back, and they’re angry.

Robin Bloor, PhD, is the Technology Evangelist for Permission.io, author of The “Common Sense” of Crypto Currency, cofounder of The Bloor Group, and webmaster of TheDataRightsofMan.com.
