It’s a Trap! — Cloud Financial Incentive for Badly Optimized Analytics Software

Paige Roberts
Oct 15, 2021

For all the years I’ve been working with data management and analytics software, there’s always been a powerful motivation to be as efficient as possible. The smarter your software is about using available computer resources — hardware, disk, memory, CPU… — the bigger your edge over the competition. The happier your customers are, the more money your company makes. The financial incentive to be more and more performant on less and less compute has always been enough to motivate endless tweaks to eke out just a little more speed, or figure out ways to do just a little bit more with the same hardware.

This benefits the customer, who constantly gets better and better software.

Then the cloud came along, and things seemed the same, for the most part. You could no longer say “hardware” to mean the storage and compute infrastructure, but I still assumed everyone in the data management and analytics software industry was in that same race, to be more and more performant on less and less compute “infrastructure.”

Recently, I did a webcast with a Vertica customer, Catch Media, who was pulling some of their workloads off the cloud. (Cloud repatriation is the term for that now.) When asked why, the CTO said that for what they were paying for cloud, “I could re-buy the hardware every three months.” He showed me the numbers, and I was stunned. I don’t usually deal with the numbers side of things, and the huge price tag they were paying for the convenience of cloud blew me away.

It occurred to me that a lot of software on the cloud is sold bundled with the infrastructure. That infrastructure is generally supplied by giants like Microsoft, Amazon, or Google these days. Sometimes they also supply the software; sometimes another company pays them for the infrastructure, then resells it at a marked-up price bundled with that company’s own software.

There’s a fundamental problem with this. The financial incentive is backwards.

The more compute infrastructure most cloud software uses to do a job, the more money both the software company and the infrastructure company make.

That’s not to the advantage of the customer … not even a little bit.

SaaS software that is easy to set up, easy to administer, and easy to pay for with one monthly bill for only the hardware and software you actually used all sounds wonderful. Consider that a lot of data is born on the cloud nowadays, and that public clouds are reliable and reasonably secure, and you have a cloud computing boom. These are good reasons to go to the cloud, but they’re not what everyone seems to be shouting about.

The big message we all hear is: pay for only the infrastructure you use!

It sounds like a huge cost savings. It sounds like the thing everyone should do, and heck, maybe it even is right now, but once you’re in the cloud, the door is locked behind you by egress fees and by lock-in to services, and often analytic databases, that don’t work anywhere else.

But if there’s no financial incentive for those databases to use infrastructure efficiently, if the cloud database company actually makes more money when customers use more infrastructure, what is the future going to look like? Most cloud analytical databases started out as software designed to be on-premises analytical databases, all in that race to be the most efficient. Amazon Redshift started out from the same code base as ParAccel, for instance, and ParAccel was optimized to the max to be performant. It required a ton of finicky tuning, like a race car, but then it used the hardware like a race car uses the track. Put that on the cloud where someone else worries about the tuning part, and you get Amazon Redshift, a performant cloud database. Now.

But what financial incentive does Amazon have to pay engineers to spend their valuable time making it more and more efficient and performant over time? Amazon sells the infrastructure. The more infrastructure they sell, the more money they make. Since you only pay for infrastructure you use, if the software uses more infrastructure to do the same job, they make more money. If they make the software more efficient, they make less money.

What about something like Snowflake that works on multiple clouds? They’re not an infrastructure provider, but they do negotiate with the cloud provider to get the infrastructure relatively cheap, then mark it up and resell it to the customer, so they make a profit not just on the software, but on every bit of infrastructure you use. Oh, and to make matters more interesting, if you use a lot of Snowflake software, they helpfully auto-scale the infrastructure, so you use more of that, too. This can lead to some very unpleasant surprises for CFOs when the bill comes due.
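To make the backwards incentive concrete, here is a rough back-of-the-envelope sketch in Python. Every rate, fee, and class name in it is made up for illustration; none of it is any real vendor’s pricing. It simply compares what a vendor earns when the same job runs on an efficient engine versus a wasteful one, under a bundled-and-marked-up model and under a software-only model.

```python
from dataclasses import dataclass

# Every number and name here is hypothetical, chosen only to show
# which direction the incentive points. None of it is real pricing.


@dataclass
class BundledVendor:
    """Resells infrastructure at a markup, bundled with its software."""
    wholesale_rate: float  # assumed: vendor's cost per compute-hour
    resale_rate: float     # assumed: customer's price per compute-hour

    def customer_bill(self, compute_hours: float) -> float:
        return self.resale_rate * compute_hours

    def vendor_margin(self, compute_hours: float) -> float:
        # The markup is earned on every hour used, so margin grows with usage.
        return (self.resale_rate - self.wholesale_rate) * compute_hours


@dataclass
class SoftwareOnlyVendor:
    """Sells only the software; the customer buys infrastructure separately."""
    license_fee: float  # assumed: flat software fee for the billing period
    infra_rate: float   # assumed: customer's cost per compute-hour, paid to the cloud provider

    def customer_bill(self, compute_hours: float) -> float:
        return self.license_fee + self.infra_rate * compute_hours

    def vendor_margin(self, compute_hours: float) -> float:
        # The software vendor earns the same fee however many hours the job burns.
        return self.license_fee


# The same job, run by an efficient engine and by a wasteful one (assumed hours).
EFFICIENT_HOURS = 100.0
WASTEFUL_HOURS = 150.0

bundled = BundledVendor(wholesale_rate=2.0, resale_rate=3.0)
software_only = SoftwareOnlyVendor(license_fee=500.0, infra_rate=2.0)

for label, hours in (("efficient", EFFICIENT_HOURS), ("wasteful", WASTEFUL_HOURS)):
    print(
        f"{label:>9}: bundled margin = {bundled.vendor_margin(hours):.2f}, "
        f"software-only margin = {software_only.vendor_margin(hours):.2f}, "
        f"software-only customer bill = {software_only.customer_bill(hours):.2f}"
    )
```

With these invented numbers, the bundled vendor’s margin goes up when the software wastes compute, while the software-only vendor earns the same either way and only the customer’s infrastructure bill changes. The sketch ignores the vendor’s own engineering and support costs, so treat it as a picture of the direction of the incentive, not its size.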

To be clear, I would never advise anyone NOT to go to the cloud, or use SaaS software. Frankly, there are too many advantages to doing analytics on the cloud, assuming your cost sheet doesn’t look like Catch Media’s. But my mantra has always been: plan for changes because changes always happen whether you plan for them or not. These days, I use DOFOFU (acronym stolen with permission from @_ColinFay). Don’t F Over Future You.

With a financial disincentive to be performant, I have to wonder how efficient cloud analytical databases are going to be in two years, or five years, or ten. Be careful, and make sure that you’re not getting locked into something that’s headed in a direction you don’t want to go. To quote Admiral Ackbar, “It’s a trap!”

The best option would be a SaaS analytical database that sold only the software, not bundled with the infrastructure. That would give the database software company a continued financial incentive to focus on optimal utilization of available infrastructure. Plus, it would let the customer know what they’re spending their money on. That is the only way I know of to make sure that the vendor’s desire for increased income lines up with the increased performance and infrastructure efficiency that the customer wants.

Paige Roberts

27 yrs in data mgmt: engineer, trainer, PM, PMM, consultant. Co-author of O’Reilly’s “Accelerate Machine Learning” and “97 Things Every Data Engineer Should Know”