Cloudera and Hortonworks are merging to compete in a new data world

Published in ARCHITECHT, Oct 5, 2018

This originally appeared in the ARCHITECHT newsletter on Oct. 3, 2018.

I wasn’t even planning to write this week, because travel, but then I arrived at my hotel and saw the news of the Cloudera-Hortonworks $5.2 billion merger. And I felt compelled to write something, especially considering how closely I used to cover these two companies and the Hadoop space overall. So, here’s my brief and disjointed take.

Plenty of stories have been written about Cloudera and Hortonworks planning to merge, and you can probably get generally the same details from reading any of them. But for background, I’ll point you to the articles by my former colleagues Jordan Novet and Tom Krazit.

And, for good measure, you can also check out the Cloudera blog post, Hacker News discussion, and this short but sweet “big data obituary” from Gartner.

I think anyone not directly involved with this deal who says they aren’t surprised is lying. Sure, a merger like this actually makes quite a bit of sense when you think about it, but these are two companies that spent years as mortal enemies. (A colleague reminded me of this piece I wrote on their rivalry back in 2011.) The idea that they would unite as a single company is definitely surprising — and would have been unthinkable even a couple years ago.

But it does make sense. The Hadoop market never panned out like so many people thought it would, which left the companies in it (1) running away from the “Hadoop vendor” label and (2) searching for business lines to validate all their investment in that technology stack. Cloudera seemed to target the data warehouse and data science/machine learning side of things, while I think Hortonworks was doing some interesting things around the internet of things and edge computing. Now they can work on bringing all this stuff together into an entity that stands a better chance of surviving — and even thriving — in an IT industry that’s much different than when Hadoop hit the scene a decade ago.

The world probably didn’t need two companies each working off a similar base, but also supporting their own technologies (open source or not) around security, storage engines, analytics, governance and the like. Apparently, the powers that be at Cloudera and Hortonworks were wise enough (and mature enough) to see this and do the right thing. It will be very interesting to see how they bring their various technologies together, a project they acknowledge will take a few years.

Business-wise, the issue isn’t so much revenue (both companies have been seeing nice gains) as it is losses (both have still been losing money every quarter).

People are correctly asking what this merger means for MapR, the third of the original Hadoop platform companies, and I honestly don’t know. MapR was always ahead of the curve in terms of pushing the open source business model and innovating around file systems, databases, containers, etc., but you have to imagine it’s suffering from the same factors that forced the merger of its two bigger, publicly traded rivals. There used to be talk of a MapR IPO; I haven’t heard enough about the company recently to get any sense of whether that’s still a real possibility.

What are the factors that made the Hadoop/big data/whatever market so difficult in the end? Here are a handful, but they’re all related:

Artificial intelligence / machine learning
Cloud computing (including storage, managed services and open source activity)
Spark
Kubernetes
Other open source projects (probably including, but not limited to, Elastic, Kafka, Flink and any number of databases)

Basically, much of the world moved on from heavy data-infrastructure projects and wanted to do things faster, easier and cheaper. The Hadoop ecosystem was an able flag-bearer for big data in its early days, but other projects and whole new industries (AI, for example) were able to evolve on their own outside of Hadoop, and then start integrating with one another because that’s how open source works.

The result is whole new data architectures, application architectures, development processes and user expectations, most of which Cloudera and Hortonworks weren’t really in any position to influence. They have adapted and integrated where necessary (Spark, Kubernetes, TensorFlow, etc.), but it seems like a perpetual game of catching up to massive cloud providers on one hand and fast-moving open source communities on the other.

There’s probably also an angle here about the amount of capital these companies raised, but I’m not going to dive into it. Except to note that there are also a bunch of database companies (including in the NoSQL space) that have raised significant funding over the past several years and are possibly struggling to find a satisfactory exit. It wouldn’t surprise me to see the new Cloudera/Hortonworks do an acquisition or two here to flesh out the full data-management story, or even to see companies in the database space do their own mergers in order to better compete.

Finally, these two ARCHITECHT Show podcast episodes seem relevant today:

Hortonworks CEO on the shifting nature of big data, from edge computing to GDPR

Rob Bearden discusses the changing nature of big data technologies and business models, spurred by cloud, IoT and even…

architecht.io

Cloudera co-founder on the future of big data: AI, IoT and cloud computing

Plus, a big, scary week full of reminders that security and software architectures still matter in the cloud.