For The Want Of A Nail — Part 5 Of 5: Enabling AI For Organizations Of All Sizes

Liran Zvibel
Published in Weka.IO
Mar 2, 2018

Over my past few blog posts I’ve discussed how Artificial Intelligence (AI) is all around us and its potential to unleash latent knowledge deep within your organization’s data stores. No matter what industry your business is in, AI can help.

If you want to take full advantage of the business transformation that AI affords, then you’ll need to think differently about your infrastructure. AI and learning applications (deep learning and machine learning) require massive amounts of compute power, network bandwidth, and fast storage. Fortunately, GPU-based servers are ideally suited to AI and learning applications, InfiniBand delivers extremely low latency and high network bandwidth, and parallel file systems make all the server-centric storage and data shareable.

However, not all parallel file systems are created equal. In fact, the traditional file systems used in high-performance computing (Lustre and Spectrum Scale) were not designed to take advantage of the performance and low latency of NVMe flash. This is important because AI is one of the most demanding workloads today: it mixes large and small files, random and sequential access, and structured and unstructured data. AI applications are also very metadata intensive, so the file system must consistently deliver very high metadata performance, which is not an easy task. For these legacy file systems to perform well, AI systems must be over-engineered and augmented with large caching devices to provide decent small-file and metadata performance. The result is an overly expensive solution.

GPU servers are quite expensive because they can process data hundreds of times faster than a comparable CPU-based server. The table below from an article in The Next Platform illustrates this point well. Note the extreme difference between the performance of a Xeon CPU-based server and that of Nvidia’s DGX-1 GPU server. This difference in performance puts a huge demand on the supporting network and storage infrastructure.

A GPU server consumes data at a rate of 3–4 gigabytes per second, so a 10-node GPU cluster requires an interconnect and storage system that can sustain 30–40 gigabytes per second. Such an infrastructure would be quite expensive using legacy storage solutions. However, it doesn’t have to be.
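To make the sizing arithmetic above concrete, here is a minimal sketch in Python. The function name and its parameters are hypothetical, chosen only for illustration; the 3–4 gigabytes per second per-server figure is the one quoted above, and the calculation assumes demand scales linearly with node count.

```python
def aggregate_bandwidth_gbps(nodes, per_node_low=3.0, per_node_high=4.0):
    """Return the (low, high) sustained bandwidth in GB/s that a GPU cluster needs.

    Assumes every node consumes data at the same rate and that demand
    scales linearly with the number of nodes (a simplification).
    """
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * per_node_low, nodes * per_node_high


low, high = aggregate_bandwidth_gbps(10)
print(f"10 nodes need {low:.0f}-{high:.0f} GB/s")  # prints "10 nodes need 30-40 GB/s"
```

Swapping in your own node count and per-server rate gives a rough first estimate of what the interconnect and storage tier must sustain.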

You can position your organization for the future while protecting your existing investments by taking a software-centric approach to AI, learning systems, and data management. WekaIO has developed a storage solution well-suited to AI, WekaIO Matrix, that includes the world’s fastest file system. When coupled with an InfiniBand network, it provides over 6 gigabytes per second of bandwidth per GPU server, comfortably above the 3–4 gigabytes per second that a GPU server consumes. In fact, this combination is more than twice as fast as a local file system with a direct-attached all-flash array.
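As a rough illustration of what that per-server figure means, the sketch below compares supplied bandwidth against the 3–4 gigabytes per second consumption rate quoted earlier. The helper name is hypothetical, and 6.0 is used as a lower bound because the text says "over 6".

```python
def bandwidth_headroom(supplied_gbps, demand_low_gbps=3.0, demand_high_gbps=4.0):
    """Return (worst_case, best_case) ratios of supplied to consumed bandwidth.

    worst_case divides by the highest demand, best_case by the lowest.
    A ratio above 1.0 means storage can keep the GPU server fed.
    """
    if demand_low_gbps <= 0 or demand_high_gbps <= 0:
        raise ValueError("demand rates must be positive")
    return supplied_gbps / demand_high_gbps, supplied_gbps / demand_low_gbps


worst, best = bandwidth_headroom(6.0)
print(f"headroom: {worst:.1f}x to {best:.1f}x")  # prints "headroom: 1.5x to 2.0x"
```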

As a shareable file system, WekaIO Matrix is also cloud native, meaning that you can easily burst your AI workloads to a Matrix-enabled GPU cluster in AWS using the Snap-to-S3 feature. This allows you to avoid investing in a huge AI cluster: simply spin up a GPU cluster in AWS on demand. Matrix leverages S3-compatible object storage to scale cost-effectively as your training data sets grow. Data management is point-and-click easy, or you can run your automated scripts using our CLI. A single admin without any special training can easily manage petabytes of data.

Overcoming the infrastructure challenge means that access to AI is no longer just for the big guys, but indeed within cost-effective reach for organizations like yours.

If this sounds intriguing to you, or if you’d just like to learn a bit more, I suggest you check out these resources or our website in general. You can learn in detail how WekaIO Matrix fundamentally changes AI and data management for the better. In addition, you can see real-world applications of WekaIO technology and how we partner with some of the leading supercomputer centers and server and networking vendors to build out an AI-optimized solution. Better yet, if you are ready to embrace the future of AI, give us a call. We’d be happy to discuss your needs further.

You, too, can maximize your business potential through AI applications. You can do this with WekaIO’s Matrix, the fastest, most scalable file system storage for compute-intensive applications. WekaIO: Intelligence Accelerated. Thanks for reading.
