Are There ANY Good Data Center Storage Hardware Opportunities Left?

Recently I have had time to get reacquainted with a number of friends in venture capital and private equity, most of whom have multiple investments in enterprise storage. Because I have spent so much time in and around the storage industry, they frequently ask if I have an opinion on the market and what's next in storage. These folks are concerned about their current investments, and almost all of them are pulling back from doing more in storage this year.

The truth is, since I finished working with HP on the 3PAR acquisition and the subsequent conversion of that platform to address the mid-range and flash, I haven't spent much time on primary storage. I have been working on software and cloud, including backup and recovery, but haven't been involved with the hardware part of the business for a few years. I don't know what the next big thing is, but a couple of thoughts have occurred to me. These probably fall into the category of "not new ideas", but they are new to me because I haven't been plugged in for a while.

A few years ago, when DSSD came out of stealth mode and was then acquired by EMC, I was convinced something like that was the next big thing. I still am. In storage, as in cars, speed kills. Tier 1 storage is about many things, but top of the list these days is latency. Since Fusion IO, Violin and Texas Memory hit the scene, it has been cool to see the use of flash expand from a small set of applications to basically everything. We have gone from putting these components in existing systems to optimizing everything for high performance, low latency, and new use cases. The hardware roadmap is full, with 3D flash and other future solid-state types coming over the next few years. As NVMe drives and the new storage devices roll into more mainstream storage array platforms, companies are talking about latency in the low hundreds of microseconds or less! Crazy times. There is a big question about whether there are enough apps for this class of storage. If there aren't now, there will be, as AI, analytics and IoT ramp up. It's not going to be just high-frequency trading cats in New York for very long. I think it will get huge like flash did.

There seem to be different approaches to this. For my money, large centralized solid-state arrays with NVMe over fabric seem the most compelling. This would let you avoid putting caching devices in the servers, reducing the cost per server, which is what the hyperscale cloud data center folks would want. An approach that requires a caching device in each server seems counterproductive over time, though I recognize that there are significant benefits as well. Nobody has really built a successful server caching card company, so I like the no-card software approaches better. Software that can optimize over the fabric without adding a server cache is valuable; I remember people working on stuff like this a few years ago, but thinking about it in the context of applications like virtual desktops. One thing I would keep an eye out for in this context is storage device form factors with a lot more capacity per drive than we see right now, and I know of one company that is doing work in this area. Perhaps over the next few years we will see exabyte-capacity centralized solid-state storage arrays supporting lots of servers at sub-hundred-microsecond latency.
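To put some rough numbers on why the centralized approach seems plausible to me, here is a back-of-the-envelope latency budget in Python. Every figure in it is an assumption I picked for illustration, not a measurement from any product or fabric:

```python
# Back-of-the-envelope latency budget: a local NVMe read vs. the same read
# served by a centralized array over an NVMe-over-fabric transport.
# All numbers are illustrative assumptions, not measurements.

LOCAL_NVME_READ_US = 90   # assumed media + controller latency for a 4K read
FABRIC_HOP_US = 10        # assumed added latency per direction on the fabric
TARGET_STACK_US = 15      # assumed target-side software/controller overhead

local_total = LOCAL_NVME_READ_US
fabric_total = LOCAL_NVME_READ_US + 2 * FABRIC_HOP_US + TARGET_STACK_US

print(f"local NVMe read: ~{local_total} us")
print(f"NVMe-oF read:    ~{fabric_total} us")
print(f"fabric penalty:  ~{fabric_total - local_total} us "
      f"({fabric_total / local_total - 1:.0%})")
```

If the fabric only adds a few tens of microseconds on top of what the drive and controller already cost, giving up the per-server caching card in exchange for cheaper servers and one big shared pool starts to look like a reasonable trade.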

Which raises the age-old question... how're ya gonna back that up? Good luck. One thing is for sure: the backup appliance space is overdue for disruption. Purpose-built backup appliances, especially Data Domain boxes, revolutionized backup and recovery by delivering in-line and some post-process data reduction capabilities such as deduplication and different types of compression. Eight or ten years ago this took off because, while hard disk enabled faster restores than tape, SATA drives were still relatively expensive and the appliances were capacity limited, so data reduction was what made disk-based backup affordable. Along came virtual tape libraries, and then deduplication appliances, and customers bought them and everybody was happy. But the last time I looked at a big data center using this technology, they had huge numbers of these dedicated appliances and were adding more all the time. These machines are still capacity and performance limited, which is why so many of them are being installed in large data centers to handle the load. The complexity of managing all these devices has become a problem in itself. In the meantime, drives have come down in price and grown in capacity, and source-side deduplication has improved. The additional complexity of managing all these appliances is no longer as justifiable, but it works, and IT managers don't want to go through the hell of ripping all that stuff out and replacing it. So they keep buying more appliances every quarter.
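For anyone who hasn't lived inside one of these boxes, here is a toy sketch of what deduplication buys you. I'm deliberately glossing over how the real appliances do it (variable-length segmentation, compression, and a lot of engineering); this fixed-size-chunk version in Python is only meant to show the basic idea:

```python
import hashlib

def dedupe(stream: bytes, chunk_size: int = 4096):
    """Store each unique fixed-size chunk once, keyed by its SHA-256 digest."""
    store = {}    # digest -> chunk bytes (the pool of unique chunks)
    recipe = []   # ordered digests needed to reconstruct the original stream
    for i in range(0, len(stream), chunk_size):
        chunk = stream[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # only the first copy is kept
        recipe.append(digest)
    return store, recipe

# Backup streams are highly repetitive, which is the whole point.
data = b"A" * 4096 * 100 + b"B" * 4096 * 5
store, recipe = dedupe(data)
print(f"logical size: {len(data)} bytes")
print(f"stored size:  {sum(len(c) for c in store.values())} bytes "
      f"({len(store)} unique chunks referenced {len(recipe)} times)")
```

Run against repetitive backup data, the stored footprint collapses to a handful of unique chunks plus a small recipe, which is the entire economic argument for these appliances in the first place.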

Don't get me wrong, it's not a bad approach, and there is a lot of complexity around well-architected backup systems that I am not going to (and am not qualified to) get into here. But what occurs to me is that a lot of customers could use a large object store with a single namespace as the backup repository, scale it forever, and save a ton of money. There are limitations to work around, and you would have to manage the amount of duplicate data that actually arrives at the object store by using source-side dedupe (or something similar as an intermediary step), but the advantage of an object store is that it could be really, really large and really cheap. Perhaps only full backups or snapshots would be stored here, and the approach to managing incremental backups would need to be thought through. It could also be cloud-compatible. Imagine a big encrypted Amazon S3-compatible object store that held snapshots and archives that could be pushed off to Amazon over time. Imagine doing your restores from those snapshots and archives to new instances in Amazon, effectively migrating the data to the cloud through the course of the backup process. I'm curious what it would cost to not worry about deduplication quite so much and just store the data. If you went this route you would have to think through how a billion things like incrementals, deltas and all that backup witchcraft people have built up over the years would need to change. But change is good, simple is better, and maybe it's time for the "private backup cloud".
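As a sketch of how simple the plumbing for a "private backup cloud" could be, here is roughly what pushing a snapshot into an S3-compatible object store looks like using boto3. The endpoint, bucket, object key, credentials and file name below are placeholders I made up, and I'm assuming the store supports the S3 server-side encryption header:

```python
import boto3

# All names here are hypothetical placeholders, not a real service.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.backup.example.com",  # on-prem S3-compatible store
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

# Push a full snapshot image as one object, encrypted at rest
# (assuming the object store honors S3 server-side encryption).
with open("db01-full-snapshot.img", "rb") as snapshot:
    s3.upload_fileobj(
        snapshot,
        "backup-snapshots",          # bucket
        "db01/full-snapshot.img",    # object key
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )
```

Because the API is S3-compatible, pointing the same call at Amazon's own endpoint would land the object in S3 proper, and a restore into a new cloud instance is just the same transfer in the other direction, which is exactly the migrate-through-backup idea above.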

OK. Enough hardware talk....