Locked in a box: Machine learning without cloud or APIs

APIs are great. Machine learning APIs are even more awesome. From api.ai to the Google Cloud Platform APIs like Google Vision, to IBM’s analytics tools, there is some serious muscle available out there. When calling an API, there is no need for a cloud-based GPU instance (a g or p instance on AWS) or an on-premise GPU in a box. You just call the API from your instance and, like magic, you get results. Sometimes your own GPU server in the cloud makes sense, if you are doing something custom that an API does not offer.

HOWEVER…

I have had several clients this year that do not want to call APIs at all. Some would not even ship the data off premises to AWS or Azure. Sometimes their data is too sensitive to share at all; for example, it may be classified. In other cases they may be obligated to keep the data in a certain country, and the API is not hosted in their country. Lastly, they may be hesitant to let go of their infrastructure investment. Edge computing (saving money by running the AI inside client machines like phones) is not something I have run into yet on a contract. And the cost savings of doing this stuff on premise versus API calls and AWS are minor for most of our clients. The nightmare scenario for me is deploying AI into a box with no internet. Doable, but so incredibly not fun. You run apt-get update and it just hangs…

Why are these businesses avoiding APIs? Well, some clients simply don’t want to pay the fees for API calls and want to host everything inside their facility, but that is usually a secondary concern. Where a cloud solution could work, I always advise against the on-premise approach and advocate for the cloud. The primary concerns I hear, as mentioned above, are regulatory or contractual, not financial.

After fighting the good fight to overcome these barriers to cloud/APIs, those companies that can’t or won’t move end up provisioning a nice shiny new server, sometimes with Hyper-V for taking snapshots. The system needs lots of RAM, WITH LOW LATENCY. Low latency means the four timing numbers on the purchasing page; they are not usually printed on the sticks of RAM themselves. Here is a link to more info on RAM timing.

At the time I’m writing this, decent RAM specs for a good price on a cheap motherboard are DDR3-1600 with 7-8-8-24 timings. Newer generations (DDR4 and beyond) have some really nice features. The number of channels matters too; that is a whole topic unto itself. The key here is to have at least 32 GB. Size matters. Models like word2vec eat RAM like you would not believe, and loading and storing data, especially image processing data, can be heavy on the RAM and the system bus.
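To put some numbers on that, here is a quick back-of-envelope sketch. The vocabulary size and dimensions are just illustrative assumptions, roughly the scale of the public Google News word2vec vectors:

    # Rough RAM needed just to hold a word2vec-style embedding matrix in memory.
    # The numbers below are assumptions for illustration, not a measurement.
    vocab_size = 3000000       # ~3M tokens, roughly Google News scale
    dimensions = 300           # a common embedding size
    bytes_per_float = 4        # float32

    gigabytes = vocab_size * dimensions * bytes_per_float / (1024.0 ** 3)
    print("Embedding matrix alone: %.1f GB" % gigabytes)   # ~3.4 GB before any overhead

And that is just the embedding matrix, before the OS, the training data, and whatever else is loaded alongside it.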

Speaking of the system bus, you also need a motherboard with a fast bus. And it almost goes without saying that you need SSDs for your drives. I’ll leave it at that.

This rig also needs a strong GPU. Think $10,000–20,000 for one card. Obviously Nvidia. Until Volta comes out, the best stuff is the Tesla line; a P100 or a K40/K80 is the sort of GPU you want. Obviously it depends on what you want to do and how much you want to spend. Sometimes multiple GPUs and/or multiple machines are needed. Sometimes, but not always, you want a strong CPU like an i7 to complement the GPU computing. This happens when the application already needs lots of CPU, or when there are nice operations like AVX that make sense to leverage on the CPU. The GCC compiler can target these instructions for you, and I have been involved in a Java project that uses this type of Single Instruction, Multiple Data (SIMD) instruction. Lots of software needs a CPU (e.g. multithreading, Hyper-V, …), so an i7/Xeon is a good idea to have on your side. If you are going with a server rack, and money is not tight, why not just spend the $30,000 and buy a server? I myself went with a PC-based box for our dev environment, for way less than 30 thousand, but that’s just me.
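If you want to confirm what the CPU actually exposes before leaning on those instructions, a quick Linux-only sketch like the one below does the trick; the list of features to check is just my own assumption of what is worth looking for:

    # Check which SIMD instruction sets the CPU exposes (Linux only: reads /proc/cpuinfo).
    with open('/proc/cpuinfo') as f:
        flags = set()
        for line in f:
            if line.startswith('flags'):
                flags.update(line.split(':', 1)[1].split())

    for feature in ('sse4_2', 'avx', 'avx2', 'fma'):
        print('%-8s %s' % (feature, 'yes' if feature in flags else 'no'))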

OK. Let’s get back on track. Next, you need to install lots of frameworks and tools that your project will need: TensorFlow and Keras, word2vec and GloVe, lots and lots of whatever you need. This would be an AMI on AWS… Faceplant.
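Once everything is installed, I like to run a quick sanity check that the stack imports cleanly and that the GPU is actually visible. The exact packages below are an assumption based on the list above (gensim being one common way to load word2vec/GloVe vectors):

    # Sanity check: do the core libraries import, and does TensorFlow see the GPU?
    import tensorflow as tf
    import keras
    import gensim   # one common choice for loading word2vec / GloVe vectors

    print("TensorFlow: %s" % tf.__version__)
    print("Keras:      %s" % keras.__version__)
    print("gensim:     %s" % gensim.__version__)

    from tensorflow.python.client import device_lib
    print([d.name for d in device_lib.list_local_devices()])  # expect a GPU entry here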

I always try to set these systems up with both Python 2 AND Python 3 versions of the toolchains, just in case you need either one. After all, this is infrastructure. Some models only work out of the box in Python 2. Don’t start with the Python subversion thing. 2.7, 3.4, 3.5, 3.6 … Just … Don’t.
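One small habit that makes the dual toolchain less painful: write anything new so it behaves the same under both interpreters. A minimal sketch:

    # Code that runs identically under Python 2.7 and Python 3.x.
    from __future__ import print_function, division

    import sys

    print("Running under Python %d.%d" % sys.version_info[:2])
    print(1 / 2)   # 0.5 on both 2.7 and 3.x thanks to the division import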

I tend to build these systems on Ubuntu Server unless the client wants something special like CentOS. I also put in LEMP and some other server stuff to communicate with the rest of the client’s infrastructure. Usually this AI machine is one small part serving some larger purpose. Those other internal systems treat this server as an internal service where they submit jobs. It would be great if all the core company code was “AI first,” but most often existing companies want to add AI as a capability rather than overturning the apple cart and rewriting their code.
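As a sketch of that pattern, here is roughly what the job-submission side can look like, assuming a small Flask app sitting behind the nginx in that LEMP stack; the route and payload shape are hypothetical, not anything a client dictated:

    # Minimal internal job-submission service (a sketch; assumes Flask is installed).
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def run_model(payload):
        # Placeholder for whatever model this box actually hosts.
        return {"label": "example", "score": 0.99}

    @app.route("/v1/jobs", methods=["POST"])
    def submit_job():
        payload = request.get_json(force=True)
        return jsonify(run_model(payload))

    if __name__ == "__main__":
        # Internal network only: other systems call in, nothing calls out.
        app.run(host="0.0.0.0", port=5000)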

Once the basics are up and running, we need to find machine learning libraries to replace the APIs we would have called. One tiny example is replacing Google Vision with a TensorFlow/Keras CNN like VGG16, Inception, or SqueezeNet.
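For instance, a local stand-in for an image-recognition API can be as simple as the pretrained VGG16 weights that ship with Keras. A minimal sketch (the image path is just a placeholder):

    # Classify an image locally with pretrained VGG16 instead of calling a vision API.
    import numpy as np
    from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
    from keras.preprocessing import image

    model = VGG16(weights="imagenet")

    img = image.load_img("example.jpg", target_size=(224, 224))   # placeholder path
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

    for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
        print("%s: %.3f" % (label, score))

One caveat for the no-internet box: Keras downloads the ImageNet weights on first use and caches them under ~/.keras/models, so on an air-gapped machine you have to copy that weight file over by hand.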

Well, hopefully this post on on-premise machine learning infrastructure setup has been informative.

Happy coding!

-Daniel
daniel@lemay.ai ← Say hi.
Lemay.ai
1(855)LEMAY-AI
