Published in Thinking Fast

I’m Not Dumb, But Data Science Can Make Me Feel That Way

How to Feel Better About Your Place in the Space

Photo by Austin Chan on Unsplash

You know what gets to me sometimes? Smart people.

Smart people can really get to me, especially in data science. Even though I have been in this business for 15 years, data science can still make me feel dumb.

Here’s how it went down for me this week.

I was in the process of building and testing a document classification model that had been trained on a few hundred thousand training examples. I even augmented the training data: I took pristine examples of some of the documents and wrote a function to randomly remove 30 to 40% of the words. I then ran this code thousands of times on each document to generate more training data.
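For readers who have not seen this kind of augmentation, here is a minimal sketch of what such a word-removal function might look like. It is not the original code: the names `drop_words` and `augment`, the whitespace tokenization, and the seeding are all assumptions made for illustration.

```python
import random


def drop_words(text, min_frac=0.30, max_frac=0.40, rng=None):
    """Return a copy of `text` with roughly 30 to 40% of its words removed.

    Assumption: words are split on whitespace, which is cruder than a
    real tokenizer but enough to show the idea.
    """
    rng = rng or random.Random()
    words = text.split()
    if not words:
        return text
    # Pick a removal fraction for this copy, then choose which positions to drop.
    frac = rng.uniform(min_frac, max_frac)
    n_drop = round(len(words) * frac)
    drop_idx = set(rng.sample(range(len(words)), n_drop))
    # Keep the surviving words in their original order.
    return " ".join(w for i, w in enumerate(words) if i not in drop_idx)


def augment(text, n_copies, seed=0):
    """Generate `n_copies` noisy variants of one pristine document."""
    rng = random.Random(seed)
    return [drop_words(text, rng=rng) for _ in range(n_copies)]
```

One thing worth keeping in mind with this approach: thousands of variants of the same document are still heavily correlated, so they add less new information than the raw count suggests.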

I used spaCy’s out-of-the-box document classification architecture (a basic CNN architecture) as the basis of our model.

After updating the model with the new training data, we exposed it to data it hadn’t seen. The performance? Spoiler: it was not good on some key documents.

Even more alarming was the fact that the probabilities were higher than I would have expected for the docs it was classifying incorrectly. Overfitting? Possibly…*clears throat*…very likely.
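One simple way to put a number on this symptom is to compare the model's average confidence on the documents it gets right against the ones it gets wrong. The sketch below assumes predictions arrive as one label-to-probability dictionary per document; the function name and the input format are hypothetical, not taken from the original project.

```python
from statistics import mean


def confidence_by_outcome(probs, true_labels):
    """Compare mean top-label confidence on correct vs. incorrect predictions.

    probs: list of dicts mapping label -> predicted probability (assumed format)
    true_labels: list of gold labels, aligned with `probs`
    """
    if not true_labels or len(probs) != len(true_labels):
        raise ValueError("probs and true_labels must be non-empty and aligned")
    right, wrong = [], []
    for p, truth in zip(probs, true_labels):
        predicted = max(p, key=p.get)  # label with the highest probability
        (right if predicted == truth else wrong).append(p[predicted])
    return {
        "accuracy": len(right) / len(true_labels),
        "mean_conf_correct": mean(right) if right else None,
        "mean_conf_wrong": mean(wrong) if wrong else None,
    }
```

If the confidence on wrong predictions sits close to the confidence on correct ones, the model is overconfident on held-out data, which is consistent with overfitting but does not prove it.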

After some ruminating, all this work got me thinking about more modern model architectures. Something more sophisticated than spaCy’s default. So I started doing some research and was flooded with Medium articles, research papers, and popular tech articles going on and on about different embeddings (sentence, word, etc.), LSTMs, transformers, and every combination of these.

As I gazed over the titles of hundreds of links, my eyes soon glazed over as well. I began to feel like I had dropped the ball by not working harder to keep up with these newer movements in the NLP field. My confidence dropped and I had kind of a crappy week.

But hey, wait a second. I have been in this field long enough to know that fancy titles like “Hierarchical Transformers for Long Document Classification” are not what made me successful. I have been successful because I have been able to use my understanding of how these models generally work to help focus my efforts.

After a bit of self-affirmation, I picked my ego up off the floor and began to identify a strategy for testing some new architectures, which also meant taking the time to learn some new architectures.

At the end of the day, data science is a rapidly changing industry, heavily influenced by the experiments of researchers at universities and by work that large companies share through open source. It is important not to get overwhelmed when facing new challenges in this field. Here are just a few points to keep in mind if you ever get overwhelmed trying to learn data science and all of its latest and greatest installments:

1. Most new models coming out of research are some variation on neural net architecture, so take the time to slowly learn how neural nets work.

2. Remember what you are good at and don’t be shy to remind yourself of it. Self-affirmation is a ridiculously simple but powerful way to boost confidence and motivate us to at least try new things.

3. Not everything you read on the internet is as brilliant as it sounds. Trust me. I have worked through hundreds of smart-sounding tutorials. Many of them don’t work because they are poorly explained, wrong, or missing some key piece of code.

4. Plan to learn a lot about either NLP or computer vision. Having a general focus on one of these two areas can really help to set the stage for learning new algorithms. Both are in high demand, and the skills you learn generalize to other contexts that are often much simpler.

5. Remember to focus on learning concepts.

6. As you develop a conceptual understanding, build intuition by experimenting with small data sets to get things to work. Learn how data need to be formatted before they can be passed to different model architectures.
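As a small illustration of the formatting point, here is a sketch that turns plain (text, label) pairs into the `cats` dictionary layout that spaCy’s text categorizer reads as its annotation. The function name and the label names are made up for the example, and the code deliberately stops short of a training loop, since that part differs between spaCy versions.

```python
def to_textcat_examples(pairs, labels):
    """Convert (text, label) pairs into (text, {"cats": {...}}) tuples.

    Each document gets a score of 1.0 for its own label and 0.0 for every
    other label, i.e. a mutually exclusive, one-hot style annotation.
    """
    examples = []
    for text, label in pairs:
        if label not in labels:
            raise ValueError(f"unknown label: {label!r}")
        cats = {name: 1.0 if name == label else 0.0 for name in labels}
        examples.append((text, {"cats": cats}))
    return examples


# Hypothetical labels and documents, for illustration only.
LABELS = ["INVOICE", "LETTER"]
sample = to_textcat_examples(
    [("Amount due: 40.00", "INVOICE"), ("Dear Ms. Smith,", "LETTER")],
    LABELS,
)
```

Working through a conversion like this on a handful of documents is a quick way to find out what a model expects before committing a few hundred thousand examples to it.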

Finally, be brave and know that even the really smart-sounding people don’t know everything. You don’t have to either. Always focus on what you know and use that as a springboard for learning new techniques.

