Why Your Engineers Need an AI Sandbox

Why your manager should buy you a dedicated AI server

Kevin Dewalt
Actionable AI
4 min read · Aug 18, 2017



Part 3 of our 5-part series, How to Build an AI Sandbox.

Previously we defined an AI Sandbox:

The AI Sandbox is the development hardware, software, data, tools, interfaces, and policies necessary for starting an enterprise deep learning practice.

In this post I’ll explain why you need an AI Sandbox.

CTOs don’t want to stand up new environments

Your CTO is understandably reluctant to set up a new development environment. Every new database … server … toolset … increases infrastructure costs.

A good rule of thumb: data infrastructure maintenance costs are 10x the initial purchase costs.

For example, the 50 hours your developers spend acquiring, configuring, and loading a database will lead to 500 hours spent maintaining interfaces, updates, and security.

That’s why most IT executives won’t approve new infrastructure without good justification. I’ve been on both sides of this argument many times in my career.

Unfortunately your engineers can’t do AI on their laptops

Building deep learning models takes a specialized infrastructure:

  • Large volumes of training data.
  • Specialized computers configured with NVIDIA GPUs.
  • Software interfaces to manage database refreshes.

Heat is a major problem in high-performance computing

Your engineers can’t do deep learning on a Macbook

They’ll waste days configuring tools only to discover:

  • It takes too long to train algorithms. An operation that takes 1 hour on a graphics card from Best Buy takes a full week on a new MacBook Pro.
  • They can’t use the computer for anything else.
  • The CPU and fan run at 100% and annoy everyone else in the room.
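
The CPU-versus-GPU gap is easy to measure yourself. The sketch below times repeated matrix multiplications with NumPy as a rough proxy for the dense math inside a training step; on a GPU, the same call (e.g. via CuPy or PyTorch) typically runs one to two orders of magnitude faster. The matrix size and repeat count here are arbitrary choices, not benchmarks from the article.

```python
import time
import numpy as np

def time_matmul(n=1024, repeats=10):
    """Average the wall-clock time of n x n matrix multiplications,
    a rough stand-in for the dense math inside one training step."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b  # on a GPU, the same operation runs via cupy/torch
    return (time.perf_counter() - start) / repeats

per_op = time_matmul()
print(f"avg seconds per 1024x1024 matmul: {per_op:.4f}")
```

Run it on the laptop and on the candidate GPU box; the ratio of the two numbers is a first-order estimate of how much training time you are leaving on the table.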

Yes, I’ve talked to great engineers in Fortune 1000 companies wasting time doing AI on their laptops because management doesn’t understand AI.

Your engineers need a dedicated AI Sandbox

Amazon AWS is just a starting point

Most AI engineers begin — as I did — by using Amazon’s EC2 GPUs as an AI Sandbox. This is a fine solution if you’re just learning or running a single high-performance task because you can quickly spin up a GPU instance.

AWS GPUs are too slow

You can find lots of price/performance benchmarks like this one from Vincent Chu.

Bottom line: you’ll get much faster performance from a deep learning machine you build yourself with inexpensive consumer components.

AWS GPUs get expensive

You’re probably accustomed to very inexpensive computing and storage costs from Amazon. Unfortunately their GPU computing resources are not priced as favorably.

NVIDIA’s licensing models prohibit cloud providers like AWS from buying inexpensive, high-performance gaming GPUs and reselling access to them at low cost. Even the lowest pricing tier, the p2.xlarge (a relatively slow GPU), was starting to cost me hundreds of dollars a month.
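
For a sense of how the bill adds up, here is the back-of-the-envelope arithmetic for an always-on instance. The $0.90/hour figure is an assumption based on approximate 2017 on-demand pricing for p2.xlarge in us-east-1; check current AWS pricing before relying on it.

```python
# Rough monthly cost of an always-on p2.xlarge instance.
# $0.90/hour is an assumed 2017-era on-demand rate (us-east-1).
HOURLY_RATE = 0.90
HOURS_PER_MONTH = 24 * 30

monthly = HOURLY_RATE * HOURS_PER_MONTH
print(f"~${monthly:.0f}/month")  # ~$648/month for a single slow GPU
```

A consumer GPU box in the same performance class often pays for itself within a few months of continuous use.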

Having complete platform control is more efficient for developers

I am far less efficient running tools like Jupyter Notebook on Amazon GPUs. I experienced latency and random disconnects that crushed my efficiency.

AWS isn’t optimal for an AI Sandbox for the same reason your engineers still have a local development environment on their laptops.

Moreover, having complete control to customize access and optimize resource usage is key to efficient development.

Consider storage, for example. An AI Sandbox stores data on everything from L2 cache to traditional SATA disk drives.

Even simple choices, like storing frequently accessed results on the SSD, can dramatically increase performance.

Thanks to Jeremy Howard for explaining this in fast.ai.

Most developers are obsessed with optimizing their own efficiency. Even mid-level developers can dramatically improve their efficiency by making simple choices about placing data on different parts of this stack.

My 500 GB NVME SSD is 2–10x faster than my 6 TB SATA HDD, so I only use the HDD for infrequently-accessed storage.
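
One way to make that placement decision systematic is a tiny tiering rule. This is a minimal sketch: the mount points (`/mnt/nvme`, `/mnt/hdd`) and the once-a-week threshold are hypothetical, so adjust both to your own machine.

```python
from pathlib import Path

# Hypothetical mount points for illustration -- adjust to your machine.
FAST_TIER = Path("/mnt/nvme")   # e.g. 500 GB NVMe SSD: hot data
SLOW_TIER = Path("/mnt/hdd")    # e.g. 6 TB SATA HDD: cold archives

def place_dataset(name: str, reads_per_week: int) -> Path:
    """Pick a storage tier by access frequency: anything touched
    more than once a week earns a spot on the SSD."""
    tier = FAST_TIER if reads_per_week > 1 else SLOW_TIER
    return tier / name

print(place_dataset("train_images", reads_per_week=50))   # lands on the SSD
print(place_dataset("2016_archive", reads_per_week=0))    # lands on the HDD
```

Even a crude rule like this keeps the small, fast drive full of the data your engineers actually iterate on.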

An AI engineer who works just 20% more efficiently is saving you about $40K/year.
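
The arithmetic behind that figure, assuming a fully loaded engineer cost of roughly $200K/year (my assumption; substitute your own numbers):

```python
# Back-of-the-envelope for the $40K figure: a 20% efficiency gain
# on an assumed fully loaded cost of ~$200K/year per AI engineer.
LOADED_COST = 200_000
EFFICIENCY_GAIN = 0.20

savings = LOADED_COST * EFFICIENCY_GAIN
print(f"${savings:,.0f}/year")  # $40,000/year
```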

Large datasets are needed for experimentation

In traditional programming, developers only need small, local databases built with tools like MySQL or Postgres.

In machine learning developers use data to train algorithms. Thus they need large, current datasets to continuously look for ways to improve algorithm performance. Their results directly depend on the quality and volume of training data.

They also need versioned training data as your environment changes through acquisitions and releases.
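
A lightweight way to version training data is to tag each snapshot with a content hash, so any model can be traced back to the exact data it was trained on. This is a minimal sketch of the idea; dedicated tools like DVC do it properly.

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot_tag(dataset_path: Path) -> str:
    """Tag a dataset file with a short content hash, so a model's
    metadata can record exactly which data snapshot trained it."""
    digest = hashlib.sha256(dataset_path.read_bytes()).hexdigest()
    return f"{dataset_path.stem}-{digest[:12]}"

# Demo with a throwaway file standing in for a training set.
with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "train.csv"
    f.write_text("id,label\n1,cat\n2,dog\n")
    tag = snapshot_tag(f)
    print(tag)  # "train-" followed by 12 hex chars of the content hash
```

Change one byte of the data and the tag changes, so stale or mismatched training sets become immediately visible.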

Your AI engineers need a dedicated AI Sandbox

AWS is fine if you’re just starting out. And if you’re building a world-class AI team go ahead and spend $500K on top-of-the-line infrastructure from NVIDIA.

But your average enterprise just starting in AI — banks, insurance companies, e-commerce sites — should get their developers a dedicated AI Sandbox.

In parts 3–5 I’ll explain how to buy, build, and configure one.

Kevin Dewalt

Founder of Prolego. Building the next generation of Enterprise AGI.