A practical guide on how to get started in Data Science

Marius Vennemann
Published in TechLabs
5 min read · Apr 29, 2018

Disclaimer: this is not a guide that will give you content recommendations. Instead, it focuses on choosing tools and a language so that you can get started immediately and see results for your efforts.

At TechLabs it is our mission to support a new generation of young people in gaining state-of-the-art tech skills in one of our three tracks: Data Science, Web Development, and AI.

We successfully recruited our first class this month and are very happy to see an exceptionally diverse field of recruits.

Many of them have little or no coding experience but are extremely motivated to get started, so we decided to put together a short, practical guide on how to get going in these areas.

So without further ado, let’s dive right in.

The Toolset in Data Science

“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” — Abraham Lincoln

Below is a short summary of answers to the questions newcomers to Data Science ask most often, along with practical examples. We deliberately kept this list as dense as possible so you can get started immediately.

Which language should I use?

It depends. This article gives you a great overview of the three languages SAS, R, and Python.

TL;DR: Researchers and statisticians tend to prefer R, while Python is a little easier to learn and is more popular among freelancers and in start-ups.

What program do I need for writing code?

While there are many suitable programs out there, we recommend Jupyter Notebook for Python and RStudio for R. Both come with Anaconda, a powerful platform for collaboration and package management for open source and private projects, which can be downloaded here.

  • A video guide on how to set up and use Jupyter Notebook can be found here.
  • Here you learn how to download and set up Anaconda and run your first Notebook.
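Once Jupyter Notebook is running, a first cell could look something like the sketch below. It is only an illustration: the numbers are made up, and the function name `summarize` is our own choice rather than part of any library. It uses only Python's standard library, so it should run in a fresh Anaconda installation without installing anything else.

```python
import statistics
import sys


def summarize(values):
    """Return a few basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Made-up example data (heights in cm), purely for illustration.
heights_cm = [158, 164, 171, 171, 180, 186]

summary = summarize(heights_cm)

# Show which Python version the notebook kernel is using.
print("Python version:", sys.version.split()[0])

# Print each statistic on its own line, rounded for readability.
for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")
```

Run the cell with Shift + Enter. Change the numbers and run it again: the output updates right below the cell, which is the quick feedback loop that makes notebooks a good place to start.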

Where do I get datasets to work with?

The best source for free datasets is probably Kaggle.com. You can use the search box to find open datasets on everything from government, health, and science to popular games and dating trends. This Wikipedia article also gives a great overview of different available datasets and their sources.
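Many open datasets are offered as CSV files. As a rough sketch of what working with one can look like, the example below reads a tiny, invented CSV from a string so that it runs as-is. The column names, the rows, and the helper `load_rows` are assumptions made for illustration and do not come from any real dataset.

```python
import csv
import io
from collections import Counter

# Stand-in for a downloaded file. The columns and rows are invented.
SAMPLE_CSV = """name,genre,rating
Catan,strategy,7.2
Pandemic,cooperative,7.6
Carcassonne,strategy,7.4
"""


def load_rows(csv_file):
    """Read CSV rows into a list of dicts, converting the rating to a float."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["rating"] = float(row["rating"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Count how many entries fall into each genre.
genre_counts = Counter(row["genre"] for row in rows)

# Find the entry with the highest rating.
best = max(rows, key=lambda row: row["rating"])

print(genre_counts)
print("Highest rated:", best["name"])
```

For a real download, you would pass an opened file instead of the string, for example `with open("games.csv", newline="") as f: rows = load_rows(f)`, where `games.csv` is a hypothetical file name. For larger datasets, the pandas library that ships with Anaconda is the more common choice.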

Where can I get advice when I am stuck with a problem?

What do you do when you do not have an immediate solution to your problem? Right, you google it. Whenever you search for a problem related to Python, R, or any other programming language, chances are very high that the top result will come from Stack Overflow.

Why is that the case? Well, Stack Overflow is so valuable because of its content. Almost everything you can think of is already there, and all of it is written by its users.

What is a good tool to collaborate with others?

This is GitHub, hands down. GitHub is the largest online storage space of collaborative works that exists in the world. It also serves as the showroom for your projects: it’s basically the tech equivalent of a CV (even though it does not fully replace it). Here you can find a good starting point for working with Git.

What program should I use to write HTML / CSS code to visualize my results?

When you want to dive into the Web Development world and showcase your results, or you simply want to design and code a website for yourself, you will need an editor. Again, there are a lot of different choices.

At TechLabs we tend to prefer the following editors: Sublime Text, Brackets, Atom or VS Code. All are sophisticated text editors for code and work like a charm. Feel free to test them out and use the one you like the most.
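If your results already live in Python, one way to get a first page to open in your editor is to generate it with a few lines of code. The sketch below is just one possible approach under our own assumptions: the function `results_to_html`, the example results, and the styling are all invented for illustration.

```python
from html import escape


def results_to_html(title, rows):
    """Build a minimal HTML page showing a list of dicts as a table."""
    # Use the keys of the first row as column headers.
    header_cells = "".join(f"<th>{escape(str(key))}</th>" for key in rows[0])
    # One table row per result; escape() keeps characters like < and > safe.
    body_rows = "\n".join(
        "<tr>"
        + "".join(f"<td>{escape(str(value))}</td>" for value in row.values())
        + "</tr>"
        for row in rows
    )
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{escape(title)}</title>
<style>
table {{ border-collapse: collapse; }}
th, td {{ border: 1px solid #999; padding: 4px 8px; }}
</style>
</head>
<body>
<h1>{escape(title)}</h1>
<table>
<tr>{header_cells}</tr>
{body_rows}
</table>
</body>
</html>
"""


# Made-up results, purely for illustration.
results = [
    {"model": "baseline", "accuracy": 0.71},
    {"model": "tuned <v2>", "accuracy": 0.78},
]

page = results_to_html("Model comparison", results)
print(page)
```

To look at the page, save the string to a file, for example with `pathlib.Path("results.html").write_text(page, encoding="utf-8")` (the file name is arbitrary), then open it in a browser and tweak the HTML and CSS in the editor of your choice.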

How to learn and stay motivated throughout the process

If you are interested in learning more effectively, we have some gems for you as well.

  • Try to code every.single.day. Make coding a habit. Practice makes perfect and this is especially true for data science and coding.
  • Schedule a specific time for learning, ideally when you are able to get into a deep working mode.
  • Meeting and learning from people who have the skills you want to acquire is hugely beneficial — this is why we are such big advocates of our TechLabs community! If you are in a city in which we currently do not operate, feel free to use a service like Meetup.
  • Work on coding challenges. There are many sites out there that offer a wide range of coding challenges, even for beginners. Check out this Medium post for a great overview of sites. These challenges are also the go-to preparation material for experts looking to land a gig at the likes of Google, Facebook, and Amazon.

Luckily, we saved the best for last. Here is (in our view) the most important advice for learning data science and coding effectively:

Build something. Find a practical problem that you, your family or your friends face and try to solve it by using code.

Wrapping up

One of the most encouraging things about data science and coding, in general, is the immediate feedback you get in the process.

Often, we see newcomers in this field getting frustrated because they lack the proper toolset to begin with. If you have ideas on what we can add to this toolbox so that others and our TechLabs class can profit from it, feel free to drop us a line.

I’m the Co-Founder and Vice-Chairman of TechLabs and passionate about Tech and self-improvement. I blog about it here on Medium. If we can mutually benefit from each other, feel free to connect via LinkedIn.

If you enjoyed this article please consider giving it a clap below so others can see the content, as well. If you want to stay up-to-date on the pieces I publish here, feel free to follow me.
