Build a Data Science Knowledge Library

Raúl Vallejo
All The Data We Cannot See
5 min read · Feb 13, 2019


Vasconcelos Library, Mexico City

Stack Overflow. A coder’s best friend. Impossibly unique coding problem? Probably not. Know how to look for it and you may find a jaw-dropping, spot-on answer to your problem. Stack Overflow is a prime example of the power of the online coding community.

However, that is just the tip of the iceberg. Being exposed to the data science community is step one. Step two involves harnessing all that information into your own personal superpower.

The benefits of this post are three-fold:

First, this will help you stay in the loop. Data science grows in a highly dynamic environment, so it is important to stay on top of trends and to constantly seek inspiration.

Second, it will help you shake off that impostor syndrome. Once you expose yourself to real-world data scientists, you realize that most people are learning as they go.

Third, it will help you build a data science knowledge library. Steadily save valuable posts and articles and, before you know it, you will have a pool of references you can go back to for any data-related situation.

Bear in mind that the content can easily get overwhelming. Still, it is very important to diversify your resource pool. Be mindful of the fact that not all content may be made for you. Find what you relate the most to and stick with that.

I highly recommend Pocket as your go-to tool to save, manage and read most of the content you’ll find.

If you can steadily organize the vast amount of content you will be scrolling through, save the valuable items and catalog them into different groups, you will end up with a knowledge library you can go back to whenever you please.

Do not listen to that little voice in your head saying “I’ll remember this”. Save the article. Don’t think twice about it.

Start off with this:

  1. Twitter is your new best friend
  2. Yes, open a Medium account
  3. Cheatsheets, cheatsheets everywhere
  4. Your first useful email subscriptions

Twitter is your new best friend

Set up a new Twitter profile and follow these accounts to start with:

Twitter is good for real-time interaction with real-world data scientists: what they are working on, relevant questions they have, and very useful tips and links. It is also a good source of inspiration for data visualization and ML project ideas.

Yes, open a Medium account

Medium posts are extremely useful for many things related to data science. They can be gentle introductions to highly technical concepts, or they can guide you into getting your hands dirty with some easy-to-follow code.

Get started by following these blogs:

There are also many useful articles out there just like this one. To better understand the different approaches to getting started in data science, bookmark these posts:

  1. “I Dropped Out of School to Create My Own Data Science Master’s — Here’s My Curriculum” by David Venturi
  2. “Advice For New and Junior Data Scientists” by Robert Chang
  3. “45 Ways to Activate Your Data Science Career” by Kirill Eremenko

Do not forget to use the built-in highlighting feature on Medium. It will prove extremely useful when dealing with so many resources.

Cheatsheets, cheatsheets everywhere

Shout out to my boy Ale who made this meme. Spot on.

There are two kinds of cheatsheets:

Nice PDFs made by someone else

They’re useful references for when you’re faced with something new: a different data manipulation problem, exploring visualizations, getting familiar with a package, or looking up technical details.

Your own personal code snippets

This is one of the best tips I’ve had, particularly for learning a new coding language. At first, when faced with a new problem in a new language, it can be difficult to get the code flowing. Making a personal cheatsheet can help you feel “armed” against whatever gets thrown at you.

Create fresh R and Python notebooks with code snippets covering: data loading, cleaning, manipulating, visualizing and exporting.

It is also a great way to boost your productivity in your routine coding, especially in the early stages of a new project.

You don’t have to reinvent the wheel every time you load a new dataset.
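As a rough idea of what a personal cheatsheet cell might contain, here is a minimal Python sketch covering loading, cleaning, manipulating and exporting (visualizing is left out to keep it short). It uses only the standard library so it runs anywhere; in practice you would probably swap in your preferred data library. The sample data, column names and function names are all made up for illustration.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data standing in for a real CSV file.
RAW_CSV = """city,temperature_c
Mexico City,21.5
Monterrey,
Guadalajara,24.0
"""


def load_rows(text):
    """Loading: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Cleaning: drop rows with a missing temperature and cast to float."""
    cleaned = []
    for row in rows:
        value = row["temperature_c"].strip()
        if value:
            cleaned.append({"city": row["city"], "temperature_c": float(value)})
    return cleaned


def add_fahrenheit(rows):
    """Manipulating: derive a new column from an existing one."""
    return [
        {**row, "temperature_f": row["temperature_c"] * 9 / 5 + 32}
        for row in rows
    ]


def export_rows(rows):
    """Exporting: write the rows back out as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = add_fahrenheit(clean_rows(load_rows(RAW_CSV)))
print(mean(row["temperature_c"] for row in rows))
print(export_rows(rows))
```

Keeping each step as its own small function makes it easy to copy just the piece you need into a new project.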

Your first useful email subscriptions

Newsletters can be good because they filter what’s worth reading from what’s not. Social media feeds can be overwhelming sometimes, so it’s nice to have a compact email waiting for you every other Friday morning.

KDnuggets: “Leading site on AI, Analytics, Big Data, Data Mining, Data Science and Machine Learning.”

Banana Data: “Your weekly AI and data science newsletter.”

There is a lot of content out there. Start investigating, researching, following and exposing yourself to the public work of fellow data geeks. Arm yourself with knowledge and an understanding of what the data science landscape looks like.

Save that article! Future you will thank you for it.

Bottom line: Get reading and save the good posts!


Raúl Vallejo
All The Data We Cannot See

Actuary, statistician and certified Data Scientist. Music & concert junkie.