Architectures to Support Artificial Intelligence in Life Sciences

Dave Hiatt
Weka.IO
Oct 1, 2017

In a recent Bio-IT World article, I wrote about the many changes and challenges affecting Life Sciences. Scientific discovery is increasingly dependent on analytics and data: lots of data. Artificial intelligence (AI) and machine learning are needed to mine deep insights from disparate data sets. I stated that input/output (I/O) latency, the time spent waiting for data before processing can continue, was a key inhibitor to discovery, and that the trend was to tightly integrate compute, network, and storage into a single converged appliance. I’d like to expand on this idea because not all converged systems and parallel file systems are the same.

Converged systems overcome many of the limitations of traditional storage systems because the core function of storing data has been moved closer to the compute resources that process it, and the systems are designed to scale out in capacity and scale up in performance. IDC separates the converged infrastructure (CI) market into three distinct market segments:

  • “Integrated systems are pre-integrated, vendor-certified systems containing server hardware, disk storage systems, networking equipment, and basic element/systems management software.”

Examples of integrated systems include Dell/EMC’s VBlock and Oracle’s Exadata platforms. These systems are highly optimized and work very well for specific applications. However, the tight integration limits their flexibility and scalability. In addition, they are quite expensive: all-in costs can top the million-dollar mark, and you are locked into one hardware vendor.

  • “…certified reference systems are pre-integrated, vendor-certified systems containing server hardware, disk storage systems, networking equipment, and basic element/systems management software. Certified reference systems, however, are designed with systems from multiple technology vendors.”

A variation on single-vendor Integrated Systems is the Certified Reference System. This architecture is available from several vendors, typically as detailed instruction guides for building storage systems. The approach is not much different from assembling the system yourself from an array of components. Although the configurations are tested and certified, challenges include limited component choice, coordinating support among technology suppliers, and integrating and tuning the various components.

  • “Hyperconverged systems collapse core storage and compute functionality into a single, highly virtualized solution. A key characteristic of hyperconverged systems that differentiate these solutions from other integrated systems is their ability to provide all compute and storage functions through the same server-based resources.”

A very important difference between integrated hardware systems and hyperconverged software-defined systems is that hyperconverged systems can be used without specialized hardware. The key advantages of this architecture are that data resides within the compute infrastructure and that it is flexible enough to accommodate a variety of workloads on a shared platform. This minimizes network traffic, eliminates the need for duplicate data copies, and reduces overall cost and time to results, which is especially important for data-intensive applications such as artificial intelligence and machine learning.

Converged Infrastructure Adoption

How big is the converged systems market? Market research firm IDC reports that sales of all types of converged systems grew 6.2% year over year in the second quarter of 2017, generating $3.15B in sales for the quarter. System capacity grew by 5.6% over the same period, consuming 1.78 exabytes of new storage capacity. The combined revenues generated by integrated infrastructure and certified reference systems declined by 1.5% compared to the second quarter of 2016. In contrast, hyperconverged systems grew 48% to $763.4 million. Gartner believes that by 2019, approximately 30% of the global storage array capacity installed in enterprise data centers will be deployed on converged systems. Currently, the installed base is less than 5%. The top reasons organizations cite for adopting CI are improving the agility of storage, reducing administrative costs, and reducing capital costs by leveraging standard hardware infrastructure.
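To put the growth figures above in perspective, here is a minimal Python sketch that backs out the implied year-ago revenue from each quoted growth rate and estimates the hyperconverged share of the total. It assumes the $3.15B total and the $763.4 million hyperconverged figure cover the same reporting period; the function and variable names are illustrative only, and the derived numbers are rough arithmetic on the rounded figures quoted here, not values reported by IDC.

```python
# Figures quoted in the paragraph above (revenue in millions of USD).
# Assumption: both revenue figures refer to the same reporting period.
TOTAL_REVENUE_M = 3150.0          # all converged systems
TOTAL_GROWTH = 0.062              # 6.2% year-over-year growth
HYPERCONVERGED_REVENUE_M = 763.4  # hyperconverged systems only
HYPERCONVERGED_GROWTH = 0.48      # 48% year-over-year growth


def prior_value(current: float, growth: float) -> float:
    """Back out the year-ago value from a current value and a growth rate."""
    return current / (1.0 + growth)


def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole


hyper_prior = prior_value(HYPERCONVERGED_REVENUE_M, HYPERCONVERGED_GROWTH)
total_prior = prior_value(TOTAL_REVENUE_M, TOTAL_GROWTH)

hyper_share_now = share(HYPERCONVERGED_REVENUE_M, TOTAL_REVENUE_M)
hyper_share_prior = share(hyper_prior, total_prior)

print(f"Implied year-ago hyperconverged revenue: ${hyper_prior:,.1f}M")
print(f"Implied year-ago total revenue:          ${total_prior:,.1f}M")
print(f"Hyperconverged share now:                {hyper_share_now:.1%}")
print(f"Implied hyperconverged share a year ago: {hyper_share_prior:.1%}")
```

Under these assumptions, hyperconverged systems account for roughly a quarter of converged systems revenue, up from well under a fifth a year earlier, which is why the segment stands out against the declining integrated and certified reference categories.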

The most advanced software-only solutions, such as WekaIO’s Matrix, further reduce costs by running on commodity hardware (servers and SSDs) and by scaling performance and capacity independently and dynamically for maximum platform flexibility.

What Does This Mean for Life Sciences?

Life sciences generates a wide variety of data with correspondingly diverse workloads. AI and machine learning are the keys to unlocking the secrets contained in these data sets that will lead to a healthier future for all of us. To realize this promise, organizations should consider hyperconverged infrastructure based on software-defined storage and commodity hardware that offers the flexibility and performance to support AI.
