What comes after Hyperconverged? Superconverged!

Jai Menon
Jai Menon’s Blog
Nov 28, 2016

This blog argues that a new form of IT infrastructure, called superconverged, will emerge shortly and will make it even easier for enterprise customers to consume IT.

1. Convergence Background

The first generation of IT infrastructure consisted of “siloed” 3-tier architectures where compute, storage and networking were selected and bought separately by the customer. The onus was on the customer to make the 3 products work together. Starting in 2007, a second generation of converged infrastructure or CI systems emerged that bundled existing server, storage and networking products, together with management software. This approach reduced from months to weeks the time it took a customer to run their first application following purchase, as the vendor had pre-tested and pre-integrated the 3 products. In both the siloed and converged architectures, SANs were used for storage (to keep this blog simple, I avoid discussing NAS). Hyperconverged or HCI systems emerged in 2012; these were tightly coupled compute and storage nodes that eliminated the need for a SAN. Unlike CI, which is really 3 pre-tested products sold together, HCI is truly a single product with unified management. Think of HCI systems as the 3rd generation of convergence.

HCI systems simplified how IT is consumed and have rightly generated a lot of interest among customers. According to 451 Research’s latest quarterly Voice of the Enterprise survey of IT buyers, hyperconverged infrastructure is currently in use at 40% of organizations, and 451 Research analysts expect that number to rise substantially over the next two years[1].

In this blog I try to answer the question “What comes after hyperconverged?” My goal is to make a prediction about the future, which I do by looking at the past and by examining emerging technologies. I conclude that a 4th generation of architecture, called superconverged, is imminent.

But first, some history.

2. The SAN Era — 1994–2012

A storage area network (SAN) is a network that provides access from multiple servers in a data center to shared block storage, such as a disk array. It enables campus-wide consolidation of high-throughput storage.

SANs became popular when the serial Fibre Channel (FC) network standard emerged in 1994 and 1 Gbps FC became available in 1997. Direct-Attached Storage (DAS) was an alternative available at the time, but it did not support storage sharing, leading to poor storage utilization.

SANs allowed for better storage utilization, higher reliability, high performance storage sharing and simplified storage management, through use of a common set of storage functions available to all the servers in a data center. Furthermore, SANs allowed for the independent scaling of storage and compute resources, making it possible to run a wide variety of workloads with widely different storage to compute requirements.

The performance of FC SAN networks continued to improve with time. The cost and complexity of FC SANs dropped in the early 2000s to levels allowing for even wider adoption across both enterprise and small to medium-sized business environments. New capabilities such as snapshots, clones, compression, de-duplication, storage tiering and remote copy were introduced rapidly into shared SAN storage and were often unavailable for DAS storage. iSCSI SANs and FCoE (FC over Ethernet) emerged as alternatives to FC SANs. They allowed Ethernet to be used for storage, thus simplifying management through elimination of a second type of network dedicated to storage. Due to these several developments, shared SAN storage became very popular, at the expense of DAS storage, and this was the state of affairs until about 2012.

3. The DAS and Hyperconverged Era — 2012–2016+

In the last few years, however, the pendulum has started swinging back in favor of DAS due to the following four problems with SANs.

1. SANs are slow. As faster and faster storage devices emerged (Flash, SSDs, etc.), the network protocol and software overhead of accessing these storage devices over a SAN network began to dominate the total response time. DAS access times can be anywhere from one half to one third of SAN access times.

2. SANs are expensive. Storage devices in SAN arrays were being priced significantly higher than storage devices in DAS storage, even when the storage devices used were identical.

3. SANs are proprietary. With some exceptions, SAN storage is generally proprietary.

4. SANs are complex to manage. SANs based on the Fibre Channel (FC) standard require unique host bus adapters (HBAs), unique switches, unique cables, LUN zoning, LUN masking, and new multi-pathing drivers. SANs based on Ethernet, such as iSCSI SANs, though not requiring a different type of network for storage, are still somewhat complex to manage. Much of the SAN complexity springs from (a) the disconnect between application concepts like virtual machines (VMs) and storage concepts like LUNs and (b) the separate management of compute and storage.

Because of these problems with SANs, new storage software emerged, called virtual SAN or vSAN software. As the name implies, vSAN software replaces the physical SAN, providing equivalent storage sharing capability in software.

Hyperconverged or HCI systems are tightly coupled compute and storage nodes that eliminate the need for a regular storage area network (SAN) and use vSAN software to achieve equivalent functionality, using the existing server to server network and DAS storage. The combination of DAS storage and vSAN software on application servers is now priced lower than SAN storage, while providing roughly equivalent capability. Storage functions — plus optional capabilities like backup, recovery, replication, de-duplication and compression — are delivered via software in the server nodes. Examples of HCI systems include Nutanix, Simplivity, Cisco Hyperflex, Pivot3 and Scale Computing.

Compared to FC SANs, HCI systems do not need new HBAs, cables or switches. But, like SAN-based systems, HCI systems still need to access remote storage (on another node) over a network. When an application reads data, the data may be stored on the local DAS, or it may be on one or more of the remote nodes. In the latter case, data needs to be fetched from one or more remote storage devices over the network. Similarly, when an application writes data, a local copy can be written to the local DAS; however, a second copy must always traverse the network to be written to remote storage on some other node. HCI systems try to improve performance by placing data on the DAS storage of the server that is most likely to access that data, a technique called data locality.
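
To make the HCI read and write paths above concrete, here is a minimal Python sketch of a toy two-replica, vSAN-style I/O path. It is an illustration of the general idea only, not any vendor’s implementation; the class names, the two-copy policy and the random placement policy are my own simplifying assumptions.

```python
# Toy model of an HCI (vSAN-style) I/O path: one replica is written to the
# local node's DAS, a second replica must always cross the network to a
# remote node. Names and structure are illustrative only.
import random


class Node:
    def __init__(self, name):
        self.name = name
        self.local_store = {}          # block_id -> data, standing in for DAS

    def write_local(self, block_id, data):
        self.local_store[block_id] = data

    def read_local(self, block_id):
        return self.local_store.get(block_id)


def hci_write(block_id, data, local, peers):
    """Write two replicas: one local (fast), one remote (network hop)."""
    local.write_local(block_id, data)                 # local DAS write
    remote = random.choice(peers)                     # placement policy stub
    remote.write_local(block_id, data)                # always a network hop
    return remote


def hci_read(block_id, local, replica_map):
    """Read from local DAS when data locality holds, else fetch remotely."""
    data = local.read_local(block_id)
    if data is not None:
        return data, "local DAS"
    return replica_map[block_id].read_local(block_id), "remote node (network)"


if __name__ == "__main__":
    nodes = [Node(f"node{i}") for i in range(3)]
    replica_map = {}
    replica_map["blk1"] = hci_write("blk1", b"hello", nodes[0], nodes[1:])
    print(hci_read("blk1", nodes[0], replica_map))    # served from local DAS
    print(hci_read("blk1", nodes[2], replica_map))    # may need the network
```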

HCI systems simplify storage management by providing application-centric storage management functions (for example, virtual machine (VM) snapshots rather than LUN snapshots) and by providing integrated management of compute and storage.

4. New developments in storage & SAN protocols

In this section, we review 3 recent developments with storage and SAN protocols. These point the way to what will likely emerge after hyperconverged.

4.1 NVMe over Fabrics (NVMeoF) replacing SCSI

Small Computer System Interface (SCSI) became a standard for connecting and transferring data between a host and a target block storage device or system in 1986, when hard disk drives (HDDs) and tape were the primary storage media. NVMe, a newer alternative to SCSI, is designed for use with faster media, such as solid-state drives (SSDs) and post-flash memory-based technologies.

NVMe provides a streamlined register interface and command set to reduce the I/O stack’s CPU overhead. Benefits of NVMe-based storage devices include lower latency, more parallelism, and higher performance.

NVMe was originally designed for local use over a computer’s PCIe bus. NVMe over Fabrics (NVMeoF) enables the use of alternate transports that extend the distance over which an NVMe host device and an NVMe storage drive or subsystem can connect. The design goal for NVMeoF is to make the latency to a remote storage device indistinguishable from the latency to a locally attached NVMe storage device.

SAN networks have traditionally used the SCSI protocol over FC transport. With the release of the NVMeoF specification in June 2016, future SANs will use the NVMeoF protocol over transports such as Ethernet with RDMA (iWARP or RoCE). Mangstor has already announced availability of the NX-Series Flash arrays that support 3 million IOPS and under 100 microseconds of latency using NVMeoF[2]. Mellanox[3] and Qlogic[4] have also demonstrated NVMeoF solutions at the Intel Developer Forum in August 2016.

With these emerging SANs, the performance difference between SANs and DAS will effectively disappear.
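
To see why the gap closes, consider a back-of-the-envelope latency budget. The sketch below uses purely illustrative numbers that I have assumed for media latency and per-protocol overhead; they are not measurements from the products or demos cited above.

```python
# Back-of-the-envelope latency comparison. All numbers are illustrative
# assumptions chosen only to show the shape of the argument, not benchmarks.
MEDIA_READ_US = 90          # assumed flash read latency at the device
LOCAL_NVME_STACK_US = 10    # assumed local NVMe driver/PCIe overhead
SCSI_SAN_STACK_US = 200     # assumed SCSI + FC/iSCSI stack and hop overhead
NVMEOF_STACK_US = 15        # assumed NVMeoF over RDMA-Ethernet overhead

local_das = MEDIA_READ_US + LOCAL_NVME_STACK_US
scsi_san = MEDIA_READ_US + SCSI_SAN_STACK_US
nvmeof = MEDIA_READ_US + NVMEOF_STACK_US

print(f"local NVMe DAS : {local_das} us")
print(f"SCSI-based SAN : {scsi_san} us  ({scsi_san / local_das:.1f}x DAS)")
print(f"NVMeoF network : {nvmeof} us  ({nvmeof / local_das:.2f}x DAS)")
# With assumptions like these, the SCSI SAN is roughly 3x slower than DAS
# (the gap that drove the move to HCI), while NVMeoF lands within a few
# percent of local DAS.
```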

4.2 Emergence of Competitive Networked Storage on Standard x86 Servers

In the past, SAN-attached shared block storage controllers used to be proprietary. They were built using special motherboards and sometimes even with special ASICs, as in the case of 3Par.

Increasingly, shared block storage is built using standard x86 servers, and these systems are competitive in cost, performance and scalability with shared block storage built using proprietary hardware.

Two kinds of block controllers are now routinely built using standard x86 servers: scale-out storage controllers and dual-controller RAID arrays. The former uses a minimum of three x86 servers and replication for redundancy; the latter uses two x86 servers in industry-standard form factors like Storage Bridge Bay (SBB) and/or PCIe switching, and uses RAID for redundancy.

4.3 Emergence of Application-centric storage management in networked shared storage

SAN storage of the past provided LUN-centric storage management. Increasingly, networked SAN storage vendors (e.g. Tintri[5]) are providing application-centric storage management.

4.4 Summary

These three developments show that future networked block storage can

· be built with standard server hardware

· have simple application-centric management, and

· have high performance using the NVMeoF protocol.

In my opinion, these three developments will significantly impact the storage and converged architectures of the future.

5. The Next-Gen Storage Network Era — 2016–?

Let us return and examine the 4 key reasons that swung the pendulum away from SANs and towards DAS and hyperconverged.

1. DAS access times were one-half to one-third of SAN access times. This statement is no longer true with the advent of NVMeoF.

2. SAN storage devices are more expensive than DAS storage devices. There are now many SAN arrays that are nothing but software running on standard x86 servers with standard Ethernet networks and storage devices. This is often referred to as software-defined storage (SDS). As these are standard x86 servers, the storage devices attached to them are no more expensive than those used in DAS. So, this argument no longer applies.

3. SAN storage is proprietary. With software-defined storage as described in the previous bullet, SAN storage built on standard x86 servers is no longer proprietary.

4. SANs are complex to manage. This will no longer be true for the networked block storage of the future. It is now possible to build networked block storage using standard x86 servers as storage controllers connected to compute servers by high-performance Ethernet, to use virtual networking to eliminate the need for LUN zoning and masking, and to use standard Linux MPIO drivers to achieve high availability access to the shared storage. Furthermore, such storage can easily provide application-centric storage management as proven by Tintri[5] and others. Such storage can also be integrated with compute servers in converged offerings that allow for integrated management of compute and storage.

To summarize, networked block storage of the future will be based on Ethernet with RDMA networks (not FC), use the NVMeoF protocol (not SCSI), and use virtual networking to eliminate zoning and LUN masking.

Since the word SAN is often associated with FC in many people’s minds, and it is also associated with concepts like zoning and LUN masking, I will not use the word SAN to refer to networked block storage of the future.

6. Next-Generation Convergence Era — 2016–?

As I said before, Hyperconverged or HCI systems emerged in 2012 — as the 3rd generation of convergence.

In my view, a 4th generation of convergence will emerge that will utilize the next generation of networked block storage described in the previous section. Such systems, which we call superconverged or SCI systems, will integrate servers and networked block storage, and connect them using a next-generation storage network[6]. Like HCI, SCI will be built and shipped as a single product with one management console. The same RDMA Ethernet network will be used to connect compute servers to each other and to connect compute servers to x86-based storage servers. Unlike in HCI systems, compute servers will run only customer applications; storage functions will run separately in the networked block storage built with x86 servers.

In particular, we examine superconverged systems that use x86-based storage servers designed as dual-controller RAID arrays. Multiple such dual-controller RAID arrays (each with 2 x86 servers) can be part of a superconverged system, and every compute server connects to every RAID array. Unlike in hyperconverged systems, the x86 servers used to run storage functions in SCI can be different from the x86 servers used as compute servers. In particular, they can have different DRAM, cores and NICs, and be optimized for storage performance. Like hyperconverged systems, storage management will be application-centric, and a single console will provide integrated management of all compute, network and storage.
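
As a concrete, if simplified, picture of the topology just described, the sketch below models a small SCI cluster in Python: a few compute servers, two dual-controller storage arrays, and a full mesh of NVMeoF paths over RDMA Ethernet. The counts and names are illustrative assumptions, not part of any product definition.

```python
# Toy description of a superconverged (SCI) topology as sketched above:
# compute servers run only applications; storage lives in x86 dual-controller
# RAID arrays; a single RDMA-capable Ethernet fabric connects everything.
from itertools import product

compute_servers = [f"compute-{i}" for i in range(1, 5)]
storage_arrays = {                      # each array = two x86 controllers
    "array-1": ["ctrl-1a", "ctrl-1b"],
    "array-2": ["ctrl-2a", "ctrl-2b"],
}

# Every compute server gets an NVMeoF path to every array (full mesh),
# which is what makes storage performance independent of data placement.
paths = [(srv, arr) for srv, arr in product(compute_servers, storage_arrays)]

for srv, arr in paths:
    print(f"{srv} -> {arr} via RDMA Ethernet / NVMeoF "
          f"(controllers: {', '.join(storage_arrays[arr])})")
```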

This blog focuses exclusively on how SCI systems are different from HCI systems in their storage implementation. In a future blog, I plan to discuss how SCI systems are different in their network implementation from HCI systems[6].

Based on the above analysis, it is clear that future superconverged systems will no longer be deficient compared to hyperconverged systems in the way that siloed or converged architectures with SAN storage were. On the contrary, hyperconverged systems that use DAS will be deficient relative to superconverged systems with shared block storage in the following ways.

1. DAS-based hyperconverged systems scale compute and storage in a coupled way — new nodes typically add both storage and compute resources in varying ratios. Superconverged systems allow independent scaling of storage and compute resources, similar to hyperscaled public cloud architectures. This allows for more precise matching of resource allocations to application requirements. The result is that superconverged systems will tolerate a wider diversity of workloads and support a greater degree of scaling.

2. DAS-based hyperconverged systems use Flash caching and data locality to improve performance. Data locality can be hard to maintain for future scale-out workloads, as their access patterns to data may be unpredictable and because some of the data may be shared and accessed from multiple nodes. Furthermore, when a workload migrates from one node to another, hyperconverged systems will need to migrate data from the original node to the new node, to ensure good performance. This is unnecessary for superconverged systems that use high-performance NVMeoF, as storage performance will be the same from any node. Furthermore, host Flash caching is unnecessary in superconverged systems, as the latency difference between host Flash access and networked Flash access is negligible. The complexity of developing caching and data locality software, and any resulting coding bugs, can be eliminated in superconverged systems.

3. Unlike HCI, SCI systems can optimize storage performance by using x86 storage servers that are optimized for running storage functions.

4. HCI systems provide high availability storage in a less efficient way. All existing HCI systems use either replication (which is expensive relative to RAID) or erasure coding (which generally performs somewhat worse than equivalently implemented RAID[7], since it employs more computationally expensive and more memory-intensive functions than the simple parity used in RAID)[8]. Superconverged systems can implement high-performing RAID to provide efficient high availability storage; the capacity sketch after this list makes the efficiency difference concrete.

5. HCI Systems generally find it more difficult to guarantee application performance, since compute resources needed to run customer applications must compete with compute resources needed to run the vSAN software on the same set of nodes, and since performance is so dependent on data locality.

6. SCI systems need to rebuild data only when a drive fails, and only the failed drive needs to be rebuilt. HCI systems need to rebuild data whenever a node fails, and all drives attached to that node have to be rebuilt. HCI rebuilds are therefore both more pervasive (many drives rebuilt versus one drive) and more invasive (they impact application performance more, as the rebuild runs on the same nodes where the customer apps run); the sketch after this list also illustrates the difference in rebuild scope.
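
To make points 4 and 6 above concrete, here is a back-of-the-envelope sketch. The drive size, drives per node and RAID geometry are illustrative assumptions I have chosen, not figures from any specific HCI or SCI product.

```python
# Back-of-the-envelope comparison for points 4 and 6 above. Drive count,
# drive size and RAID geometry are illustrative assumptions.
DRIVE_TB = 4
DRIVES_PER_NODE = 10
NODES = 4


def usable_fraction_replication(copies):
    """n-way replication keeps 1/n of raw capacity as usable space."""
    return 1.0 / copies


def usable_fraction_raid(data_drives, parity_drives):
    """RAID keeps data/(data+parity) of raw capacity as usable space."""
    return data_drives / (data_drives + parity_drives)


raw_tb = NODES * DRIVES_PER_NODE * DRIVE_TB
print(f"raw capacity              : {raw_tb} TB")
print(f"3-way replication usable  : {raw_tb * usable_fraction_replication(3):.0f} TB")
print(f"RAID-6 (8+2) usable       : {raw_tb * usable_fraction_raid(8, 2):.0f} TB")

# Rebuild scope: an SCI array rebuilds only the one failed drive; a DAS-based
# HCI node failure forces a rebuild of every drive behind that node.
print(f"SCI, one drive fails      : rebuild {DRIVE_TB} TB")
print(f"HCI, one node fails       : rebuild {DRIVES_PER_NODE * DRIVE_TB} TB")
```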

For these reasons, I conclude that future superconverged systems are likely to be superior to DAS-based hyperconverged systems.

7. Summary and Conclusions

Problems with SANs led to the emergence of hyperconverged (HCI) systems. With recent technology developments such as NVMeoF, software-defined storage, dual-controller RAID arrays based on standard x86 servers and application-centric storage management, I argue that shared block storage systems of the immediate future will no longer suffer from the original SAN deficiencies. Furthermore, future integrated systems that include these next-generation storage systems, called superconverged systems, can eliminate several limitations of hyperconverged systems, such as coupled scaling, the need for data migration to maintain data locality, inefficient storage redundancy, inconsistent performance, and rebuilds that are both invasive and pervasive.

My conclusion is that the best infrastructure for the future will be superconverged systems based on emerging storage networks that use NVMeoF. Hyperconverged systems will continue to play a role, particularly in entry-level systems for SMB customers where ease of use is paramount and scalability is less of an issue, but we may well have seen the peak of the hype cycle for hyperconverged systems.

[1] http://www.benzinga.com/pressreleases/16/09/p8510155/according-to-new-451-research-survey-40-of-enterprises-are-using-hyperc

[2] http://enterprise.dpie.com/storage-san/mangstor-nx-series-flash-arrays

[3] http://www.newschannel10.com/story/32773631/mellanox-demonstrates-accelerated-nvme-over-fabrics-at-intel-developers-forum

[4] http://www.newson6.com/story/32760160/qlogic-demonstrates-array-of-nvm-express-over-fabrics-technologies-delivering-reduced-latency-and-cpu-utilization-for-flash-based-storage-at-idf-2016

[5] https://www.tintri.com/company

[6] We simplify the definition of superconverged systems in this blog. A fuller definition is available in a superconverged white paper I wrote; contact me if you are interested. A complete superconverged system also includes network switches, network virtualization and network function virtualization, so it has everything needed to run applications: servers, storage and networking.

[7] Modern distributed or declustered RAID can distribute rebuild across all drives, similar to erasure coding schemes, to reduce rebuild times.

[8] HCI systems can use RAID also, though none do in practice, likely because networked RAID on HCI systems can have performance issues.

Jai Menon
Jai Menon’s Blog

IBM Fellow Emeritus, Former IBM CTO, Former Dell CTO in Systems. Forbes Technology Council. Chief Scientist @Cloudistics. Technologist, Futurist, Advisor.