Published in Thinking Fast

Dipping Your Toe in the Deep Learning Deep End

Simple Steps for Testing the Waters

Photo by Drew Dau on Unsplash

I remember when I was first learning data science. Trying to parse out what to learn was a significant challenge, especially trying to do it on a budget.

At the time, deep learning wasn't even a thing yet. The focus was on understanding databases, SQL, and the myriad of statistical models available in toolkits like Scikit-Learn.

It wasn't like neural nets hadn't been invented (I'm not that old, lol); they just hadn't hit the mainstream, and researchers were still experimenting with different architectures. As a result, they didn't even register on my radar.

In fact, even when I got my first data science job it took me about 2–3 years to learn, apply, and demonstrate the value of Python as a data analytics language before I was able to start moving my data science teams in that direction. So deep learning was still a ways out.

Then around 2015 deep learning started to explode into industry as more and more technology firms were demonstrating significant value-add when applying things like convolutional neural nets to object detection in images.

I remember thinking at the time, "I wonder if I should start to wrap my head around deep learning?" After all, I was still knee-deep in natural language processing and all the basic functions used to clean the very messy world of NLP.

But it didn’t take long for me to begin to recognize the need to understand deep learning architectures as these models started to show powerful applications to NLP tasks as well.

After years of fighting it, of insecurity as I looked at different architectures, and of mounting industry pressure, I finally decided to jump into the deep end of deep learning.

I started with Keras and found this YouTube channel to be very useful for understanding both the architectures and applications of deep learning models using the Keras framework with a TensorFlow backend.

But as time went on, I saw more and more developers switching to PyTorch.

What’s the difference?

Keras is a high-level Python library that runs on top of TensorFlow and provides methods and functions that simplify building deep learning models by hiding much of the baggage of raw TensorFlow operations.

PyTorch, on the other hand, is a Python-native library for building deep learning models that uses its own framework for constructing and training networks. It was originally developed by Facebook, and its define-by-run design (the computation graph is built as the code executes) makes it possible to create dynamic or online learning networks that are not as easily built with Keras.
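To make the define-by-run idea concrete, here is a minimal sketch (the network, layer sizes, and loop condition are all arbitrary illustrations, not anything from a real project): the forward pass below uses ordinary Python control flow whose behavior depends on the input data, something that is awkward to express in a static, pre-compiled graph.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """A toy network whose forward pass uses data-dependent control flow.

    PyTorch builds the computation graph as the code runs, so an ordinary
    Python loop like the one below "just works" and is still differentiable.
    """

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        # Apply the same layer a data-dependent number of times (1 to 3).
        repeats = int(x.abs().sum()) % 3 + 1
        for _ in range(repeats):
            x = torch.relu(self.fc(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 4))
print(out.shape)  # torch.Size([2, 4])
```

However many times the loop runs, autograd still tracks every operation, so backpropagation works without any extra bookkeeping.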

PyTorch also comes with several companion "helper" libraries that speed up common processing tasks and help manage resource constraints.

Thus, although the learning curve is a bit steeper, PyTorch allows for more control and optimization than Keras, so a few months ago I decided to make PyTorch my new deep learning learning goal (say that 5 times fast, phew).

As you plan your own deep learning journey, consider the following tips to help ensure a successful experience:

Data Format

Formatting data properly is the single most important thing to understand. In PyTorch, every layer type (linear, CNN, RNN) requires input data to be formatted as tensors. Tensors are easily created from NumPy arrays; the key is knowing how to shape the NumPy array so that the resulting tensor will be accepted as input by your first layer.
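A minimal sketch of that numpy-to-tensor handoff (the array shape and layer sizes here are arbitrary choices for illustration): note the cast to `float32`, since most PyTorch layers expect single-precision floats while NumPy defaults to `float64`.

```python
import numpy as np
import torch

# A toy dataset: 8 observations, each with 3 features.
features = np.random.rand(8, 3).astype(np.float32)

# torch.from_numpy shares memory with the array (no copy);
# casting to float32 on the numpy side keeps the dtype layer-friendly.
x = torch.from_numpy(features)

# A linear layer accepts this (batch, features) tensor directly.
layer = torch.nn.Linear(in_features=3, out_features=2)
out = layer(x)
print(out.shape)  # torch.Size([8, 2])
```

If the shapes don't line up, PyTorch raises an error at the first layer, so checking `x.shape` against the layer's expected input is usually the fastest debugging step.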


Understanding how batching works is also important. Deep learning models are more computationally intensive than most other statistical modeling frameworks, so batching helps reduce the strain on your machine. Moreover, gradient descent tends to perform better when it updates on small sub-samples of the data.

Data Dimensionality & Format

Related to number 1 is that we must understand the dimensionality different architectures expect. A linear layer treats each observation as a 1-D vector: even though we usually package a batch as a tensor with multiple rows, each row is an independent observation. Convolutional layers can operate on 1-D, 2-D, or 3-D data. The 1-D case is often applied to language problems, where each row is a sentence or document. 2-D and 3-D convolutions are largely used for image processing, where a single-channel 2-D image is black & white and multi-channel images are color. In each case, the tensors we actually pass have more dimensions than the data itself, because they include the extra dimensions the layer expects, such as the batch size and the number of channels.
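The expected shapes above can be checked directly in code. A minimal sketch (the feature counts, channel counts, and kernel sizes are all arbitrary illustrations): note that every input carries the batch size as its first dimension, and the convolutional inputs carry a channel dimension on top of the data's own dimensions.

```python
import torch
import torch.nn as nn

batch = 8  # every input below has the batch size as its first dimension

# Linear: (batch, features) -- each row is one independent observation.
out_lin = nn.Linear(10, 5)(torch.randn(batch, 10))
print(out_lin.shape)  # torch.Size([8, 5])

# Conv1d: (batch, channels, length) -- e.g. text as a 1-D sequence.
out_c1 = nn.Conv1d(3, 6, kernel_size=3)(torch.randn(batch, 3, 32))
print(out_c1.shape)   # torch.Size([8, 6, 30])

# Conv2d: (batch, channels, height, width) -- 1 channel = grayscale, 3 = RGB.
out_c2 = nn.Conv2d(1, 6, kernel_size=3)(torch.randn(batch, 1, 28, 28))
print(out_c2.shape)   # torch.Size([8, 6, 26, 26])
```

So the "1-D" text input is actually passed as a 3-D tensor, and the "2-D" grayscale image as a 4-D tensor, once batch and channel dimensions are included.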

One thing I love about deep learning architectures is that they let us test creative combinations of layers. As one example, I tested a 2-D CNN on "images" I created from documents after they had been OCR'd. The idea was to test whether certain word locations in a document (e.g., titles at the top) could be learned by the model to improve document classification.

At the end of the day, I am still actively learning and experimenting with deep learning applications, but just by focusing some energy and using some real data, I have been able to quickly advance my skills in this area. I encourage you to do the same!

Like engaging to learn about data science, career growth, life, or poor business decisions? Sign up for my newsletter here and get a link to my free ebook.


