An Ode to the Terminal: The (Almost) Perfect Tool

Bringing Deep Learning to the Terminal with Node and TensorFlow.js

One of my favorite things about my work as a developer advocate is that I have the opportunity to work with lots of different technologies across all types of projects. This means lots of different tools and workflows, but there’s one tool I use constantly throughout the day for almost everything: the terminal. There are plenty of options out there depending on what type of environment you’re running (my personal favorite for Mac is iTerm2), but in my opinion you really can’t go wrong. Nothing feels quite as powerful as entering commands at the command line and watching my computer do exactly as I’ve requested, but even my favorite tool has its limitations.

The filenames for these images don’t tell me anything about their content.

Conjuring a Solution

Working with large numbers of files can sometimes get a little confusing, especially if they aren’t labelled properly. Digging through log files, raw input data, or something like an SD card full of photos to find the one you want can be an arduous task. For text-based files there’s cat, a command that I frequently turn to for a quick look into a file’s contents. But what about other types of media, like images and audio files? Sure, you could just open the file and look at it, but that’s not always an option. What if you’re logged into a remote terminal, or using an OS that can’t open these types of media files? Once my curiosity gets going there’s really no stopping it, so I set out to solve this problem. Using TensorFlow.js and an open-source deep learning model from the Model Asset eXchange, I feel like I’ve come up with a solution that works! In addition to showing off magicat 🧙 😺, the tool I created, I want to talk a bit about the tools that made it possible.

I wish I could search through the contents of images like this… (Plot Twist: Now you can!)

TensorFlow.js: Helping Democratize Deep Learning

Until recently, deep learning development was a task performed almost exclusively by machine learning engineers, data scientists, and developers with experience in cognitive computing. Using traditional data science and ML tools like Python, NumPy, TensorFlow, and Keras, these pioneers helped establish pipelines and workflows that would lay the foundation for what was to come. While these tools do their job well, they come with a bit of a learning curve and can be cumbersome at times. Anyone who’s spent time juggling different Python versions and dependencies can attest to how frustrating that can be, and that’s just to get your environment set up. It was clear to me that there had to be a better way.

Before discovering the magic of TensorFlow.js, I relied on a few tools that really smoothed out the Python dev experience, and they deserve some recognition. First and foremost: Anaconda. If you’re doing any kind of data science or machine learning with Python, this one’s pretty much a must-have. Installing Anaconda on your machine gives you access to the conda package manager, which installs pre-built packages and, in my experience, can be much faster than pip when pip would otherwise have to build a package from source. More importantly, conda lets you manage different “environments” on one machine, which can be activated from the terminal to quickly switch between Python versions or sets of dependencies.

While working with conda helped make my life easier, for me the perfect solution has arrived in the form of TensorFlow.js. Other developers have already made this discovery and have been singing the praises of TF.js for some time. Maureen McElaney’s article, for example, talks about the impact TF.js has on offline-first web apps. Offline support is a powerful side effect of hosting models on edge devices rather than in the cloud, and one that I was eager to take advantage of. It allowed me to construct a deep-learning-powered command-line utility that doesn’t rely on any remote service or internet connection.

Putting it All to Use: Building 'magicat' 🧙 😺

The most important part of any deep learning application is the model. This is the file (or collection of files) that contains all of the “knowledge” learned during the training process, and is what gets consulted when we need to use that knowledge to classify or identify some type of input. Traditionally this is done through RESTful API calls that are sent to some type of model-serving framework. In this type of architecture there would also need to be a separate client running a GUI or some other type of interface that the user could interact with. With TensorFlow.js, I can now embed the model directly in my JavaScript code (in this case a command-line utility), which lets me cut the number of components needed for my application in half! Not only does this result in fewer moving parts to worry about, but it also means my app no longer relies on any external services.

Now I can see the contents of an image without leaving my command-line comfort zone.

Since I knew how I was going to interact with my model, I just had to find one that could identify objects in an image. Luckily, there’s “a place for developers to find and use free and open source deep learning models” called the Model Asset eXchange (MAX) that hosts over 20 different models built for a wide range of tasks. After choosing the Image Segmenter model from MAX, I was able to download and convert the model into a TensorFlow.js-ready format by following the steps outlined by va barbosa in this article and using a tool called tfjs-converter.

I was able to build my dream CLI utility by building on top of others’ open source work.
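
To give a feel for what happens once a segmentation model has run, here is a minimal sketch (not magicat’s actual code). It assumes the model hands back one class ID per pixel as a flat array, and that the IDs follow the 21 PASCAL VOC classes that DeepLab-style segmenters are commonly trained on. The label spellings and the summarizeSegmentation helper are hypothetical names chosen for illustration.

```javascript
// Assumed label list: the 21 PASCAL VOC classes, with index 0 as background.
// Treat the exact names and their order as an assumption, not the model's spec.
const LABELS = [
  'background', 'airplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
  'car', 'cat', 'chair', 'cow', 'dining table', 'dog', 'horse',
  'motorbike', 'person', 'potted plant', 'sheep', 'sofa', 'train', 'tv'
];

/**
 * Count how many pixels belong to each class in a flat segmentation map
 * and return the non-background objects, largest area first.
 */
function summarizeSegmentation(segMap, labels = LABELS) {
  const counts = new Map();
  for (const classId of segMap) {
    counts.set(classId, (counts.get(classId) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([classId]) => classId !== 0) // skip background
    .sort((a, b) => b[1] - a[1])          // biggest object first
    .map(([classId, pixels]) => ({
      label: labels[classId] || `class ${classId}`,
      share: pixels / segMap.length       // fraction of the image covered
    }));
}

// Toy 4x4 "image": 0 = background, 8 = cat, 15 = person.
const toyMap = [
  0, 0, 8, 8,
  0, 8, 8, 8,
  0, 0, 15, 0,
  0, 0, 15, 0
];
console.log(summarizeSegmentation(toyMap));
```

In a real tool the array would come from the model’s output tensor rather than a hand-written literal, but the bookkeeping for “what is in this picture?” would look much the same.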

With my deep learning model ready to go after the conversion, all that was left for me to do was straightforward JavaScript development. As a user of the Python version, I found the TensorFlow.js API extremely intuitive, and I rarely had to go back and ‘read the docs’. The challenges I ran into were minimal, and really helped to highlight what I love most about the JavaScript ecosystem: the developer community! With so many JS developers out there, doing all kinds of different things, there’s almost always help available when you need it. Working with image data outside the browser had me scratching my head at first, but thankfully the folks behind node-canvas had already come up with an excellent solution. Similarly, when I started thinking that it would be cool to show previews of the different objects in the image, a quick search produced the awesome terminal-image module by Sindre Sorhus, which dropped right into my code. These types of experiences are what drive home the magic of open source software for me. It’s like we’re all building and figuring things out together!
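
Previewing a single object boils down to a masking step. Here is a rough sketch of that idea, assuming RGBA pixel data of the kind a canvas context’s getImageData returns in its data field, plus a segmentation map with one class ID per pixel. isolateObject is a hypothetical helper written for this post, not part of node-canvas or terminal-image.

```javascript
/**
 * Return a copy of RGBA pixel data in which every pixel outside the
 * requested class is made fully transparent.
 * `rgba` holds 4 bytes per pixel; `segMap` holds 1 class ID per pixel.
 */
function isolateObject(rgba, segMap, classId) {
  if (rgba.length !== segMap.length * 4) {
    throw new RangeError('rgba and segMap describe different pixel counts');
  }
  const out = Uint8ClampedArray.from(rgba); // leave the input untouched
  for (let i = 0; i < segMap.length; i++) {
    if (segMap[i] !== classId) {
      out[i * 4 + 3] = 0; // zero the alpha channel to hide this pixel
    }
  }
  return out;
}

// Two-pixel example: the first pixel is class 8, the second is background.
const pixels = Uint8ClampedArray.of(255, 0, 0, 255, 0, 255, 0, 255);
const preview = isolateObject(pixels, [8, 0], 8);
console.log(Array.from(preview)); // [255, 0, 0, 255, 0, 255, 0, 0]
```

The masked pixels could then be drawn back onto a canvas, exported as an image buffer, and passed to a terminal renderer. That wiring is left out because it depends on the libraries’ own APIs.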

The Model Asset eXchange features open source deep learning models for many types of applications.

Check it Out

Find magicat 🧙 😺 on GitHub, or browse the Model Asset eXchange to see what models you can use in your own applications. The list of MAX models is updated periodically and includes models that accept audio samples as input, as well as adversarial cryptographic and generative models.

One more thing… if you enjoyed this post, visit The Data Lab @ CODAIT for more reading on topics like deep learning and open source software.