TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

How I Would Learn Python for Data Science if I Had to Start Over

Plus 3 tips to help you avoid making the same mistakes

Nicholas
Published in TDS Archive
7 min read · Aug 16, 2021


I reached over to grab another sip of my Red Bull.

My eyes were bloodshot from staring at the screen for so long.

I was exhausted. But I needed to do this. I had to get my head around it.

So I flipped the page.

Then wrote a couple more lines in my notebook.

….And I did it again.

………………And again.

…………………..Annnd again.

I’d been sitting in a dark corner in Building 11 at the University of Technology, Sydney for 9 HOURS now.

My MacBook in front of me.

Python Crash Course on my right.

And about 87 different tabs open in Chrome

Desperately trying to get a decent grasp of Python.

I wanted to get it so bad….and I wanted to learn it fast (admittedly faster than what was probably practical).

I was studying my Masters and trying to bring some ML to my clients at work….without really knowing what the hell I was doing. There was just so much to learn when it came to Data Science and ML, let alone Python.

There had to be a better way.

Unfortunately, back then, I didn’t know any better.

Fast forward three years, and there’s a lot of stuff I would’ve done a HELLUVA lot differently. In that time, I’ve managed to build a ton of models, start (and crash) a startup, and become a Data Scientist at IBM working with some amazing clients.

But I always think back to those weekends I spent grinding away teaching myself to code. Looking back now, yeah it was worth it…but I could’ve done it wayyyyy more efficiently.

That’s exactly what I want to talk to you about today.

The strategy I would use to learn Python for Machine Learning and Data Science again…if I had to start over.

If you’re more of a video person, I also made a 5ish hour crash course that distills all this down. But stick with me, I’m hoping I can share some golden nuggets nonetheless.

There are 3 key components that I think it’s important to get your head around before kicking off your study.

The first is CRUD.

1. Understand CRUD

Source: Nicholas Renotte

CRUD stands for Create, Read, Update and Delete. It’s a concept that’s commonly associated with SQL. It refers to the core actions that are necessary to work with database records.

However!

The same concept is useful when it comes to programming. If you can understand how to create, read, update and delete an object in Python, you’re well on your way to understanding that component. The only objection I have to this is that it’s also important to understand how to loop through components as well, so maybe CRUDL is a better initialism? I digress.

Understanding CRUD is important because it sets a foundation for the operations you should be able to apply to components in Python. Say for example you wanted to apply CRUD to lists.

We know that we can CREATE lists using sequences wrapped in square brackets.

# Creating a list
names = ['neil armstrong', 'buzz aldrin', 'sally ride', 'yuri gagarin', 'elon musk']

We can READ them using indexing, slicing or the print function.

# Printing a list
print(names)
# Reading the first value
names[0]
# Slicing a range
names[1:3]

To UPDATE, you can reassign a value, use the insert method or append method.

# Update a single value
names[-1] = 'nicholas renotte'
# Update by adding to the end of the list
names.append('elon musk')
# Update by inserting at the start of the list
names.insert(0, 'richard branson')

And last but certainly not least, it helps to understand how to DELETE.

# Delete components from a list
del names[-2]

Understanding CRUD sets up a mental framework for the components you should understand for each data type.
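
The same framework carries over to other data types. As a rough sketch (the `scores` dictionary and its values are made up purely for illustration), here is CRUD applied to a dictionary, plus the looping step from the CRUDL idea mentioned earlier:

```python
# CREATE a dictionary using key-value pairs wrapped in curly braces
# (hypothetical names and scores, just for illustration)
scores = {'neil': 90, 'buzz': 85}

# READ a value by its key
neil_score = scores['neil']

# UPDATE an existing value, then add a brand new key
scores['buzz'] = 88
scores['sally'] = 95

# DELETE a key and its value
del scores['neil']

# LOOP through whatever is left
for name, score in scores.items():
    print(name, score)
```

Once you can do these five things with a data type, you can usually get by with it in a real project.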

2. Get used to using Jupyter

When I was starting out, there were a ton of IDE options when it came to coding in Python. But the one thing I wish I’d known is that Jupyter Notebooks are probably the best interface to use when starting out, especially when it comes to Data Science workloads.

Why?

They provide you with an interactive environment to build, explore and model your data. I’m going to go out on a limb here and say that nothing really comes close. There are other alternatives but Jupyter makes it ridiculously easy.

The easiest way I’ve found to have a stable operating environment for Jupyter is to use Anaconda.

Source: Anaconda

There are a range of flavors when it comes to Jupyter as well, and they all provide a similar working interface with a few pros and cons. For a FREE environment that allows you to use GPUs and TPUs, take a look at Google Colab:

Source: Google

But be prepared to flex as well when you finally land that sweet sweet Data Science gig. A lot of enterprise organisations are moving towards Data Science platforms. The one I use at work on a day-to-day basis is Watson Studio.

Source: IBM

3. Start working on projects before you think you’re ready

It’s so easy to fall into the TUTORIAL TRAP!

Doing tutorial after tutorial and never really getting up and running with building stuff.

I’ve been there. You’ve been there. We’ve ALL been there!

The best way to break through that rut is to start making and breaking stuff with Python. Find an easy enough tutorial that’s just outside the boundary of your skill and give it a go.

I’ve always wanted to do something in the accessibility space and decided to try and tackle sign language recognition pretty early on in my journey.

I don’t think I’m quite there yet. That being said, tackling something that’s just outside the boundary of your skillset will help you accelerate faster than anything else!

Alright, enough of the ‘tips’. Let’s get to the nitty gritty.

What should you learn?

The code blocks below highlight the key components for each sub topic. They’re all explained in a ton of detail in the YouTube video as well and in this GitHub repo!

Variables

Think of variables as placeholders for values and data. They make it easy to refer to data or values that you might need repeatedly throughout your code.
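
A minimal sketch of what that looks like in practice. The variable names and values here are my own picks for illustration, not anything special:

```python
# Assign values to variables (names and values are illustrative)
course_name = 'Python Crash Course'
hours_studied = 9
tabs_open = 87

# Reuse the variables instead of retyping the values
summary = f'{hours_studied} hours on {course_name} with {tabs_open} tabs open'
print(summary)

# Reassigning a variable points the name at a new value
hours_studied = hours_studied + 1
```

If a value changes later, you only have to change it in one place and everything that refers to the variable picks up the new value.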

Data Types

There are a number of different data types in Python. Understanding their properties helps you 1) set the right data type, 2) navigate its properties, and 3) leverage the attached methods.
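
Here is a quick, non-exhaustive tour of the built-in types you’ll probably bump into first. All of the values are made up for illustration:

```python
# A few built-in data types (values are illustrative)
age = 30                      # int: whole numbers
height = 1.75                 # float: decimal numbers
name = 'sally ride'           # str: text
is_astronaut = True           # bool: True or False
crew = ['neil', 'buzz']       # list: mutable sequence
point = (3, 4)                # tuple: immutable sequence
profile = {'name': name}      # dict: key-value pairs

# type() tells you which data type you are working with
print(type(age), type(name), type(crew))

# Each type comes with its own attached methods
title_name = name.title()     # str method that capitalises each word
crew.append('sally')          # list method that adds to the end

# Convert between types when data arrives in the wrong one
age_from_text = int('30')
```

Running `dir(name)` or `help(str)` in a notebook cell is a handy way to see which methods a type gives you.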

Conditions and Loops

Conditions are important because they allow you to run your data through gates and checks to determine whether values meet certain criteria. Loops help you iterate through your sequences so you can perform certain actions repeatedly.
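
As a small sketch, here is a `for` loop that runs each value through an `if`/`elif`/`else` gate, followed by a `while` loop. The `readings` list and the thresholds are assumptions I made up for the example:

```python
# Hypothetical temperature readings
readings = [12, 25, 31, 18]
kept = []

# Loop through the sequence and check each value against some conditions
for reading in readings:
    if reading >= 30:
        label = 'hot'
    elif reading >= 20:
        label = 'warm'
    else:
        label = 'cold'
    print(reading, label)

    # Only keep the readings that passed the check
    if label != 'cold':
        kept.append(reading)

# A while loop keeps running until its condition is no longer true
countdown = 3
while countdown > 0:
    countdown -= 1
```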

Math Operators

It’s useful to understand basic math operators. Note, however, that a lot of the common packages you’ll find yourself using (e.g. NumPy and pandas) also have their own native mathematical operators.
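
A quick sketch of the core operators in plain Python, using two arbitrary numbers. NumPy arrays and pandas Series reuse the same symbols but apply them element by element:

```python
a = 7
b = 2

total = a + b            # addition
difference = a - b       # subtraction
product = a * b          # multiplication
quotient = a / b         # true division, always returns a float
floor_quotient = a // b  # floor division, drops the remainder
remainder = a % b        # modulo, returns just the remainder
power = a ** b           # exponent

print(total, difference, product, quotient, floor_quotient, remainder, power)
```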

Functions

Wrapping your code in a function allows you to make things modular, which means that you’re rewriting the same code less often. I found myself writing functions a ton, particularly when it comes to data preprocessing workflows.
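
Here is a minimal sketch of a tiny preprocessing helper. The function names `clean_name` and `clean_names` are hypothetical, picked just for this example:

```python
def clean_name(raw_name):
    """Strip surrounding whitespace and convert the text to title case."""
    return raw_name.strip().title()


def clean_names(raw_names):
    """Apply clean_name to every item in a list."""
    return [clean_name(raw_name) for raw_name in raw_names]


# Messy input data (illustrative)
raw = ['  neil armstrong', 'BUZZ ALDRIN ', 'sally ride']

# Write the cleaning logic once, reuse it wherever you need it
cleaned = clean_names(raw)
print(cleaned)
```

The payoff is that if the cleaning rules change, you update one function instead of hunting down every copy-pasted line.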

Classes

To be perfectly honest, I haven’t had to use classes much in my day-to-day work. HOWEVER, I’ve found them used just about everywhere when it comes to building custom neural network models and layers, particularly those which have multiple inputs and prediction heads.
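
To show the moving parts without pulling in a deep learning framework, here is a plain-Python sketch. `SimpleScaler` is a hypothetical class I made up for illustration (it is not from any library), and it assumes the values you fit on are not all identical:

```python
class SimpleScaler:
    """Hypothetical helper that rescales numbers to the 0-1 range."""

    def __init__(self):
        # Attributes hold the state that belongs to each instance
        self.minimum = None
        self.maximum = None

    def fit(self, values):
        # Learn the range from the data and store it on the instance
        self.minimum = min(values)
        self.maximum = max(values)
        return self

    def transform(self, values):
        # Assumes maximum != minimum, otherwise this divides by zero
        span = self.maximum - self.minimum
        return [(value - self.minimum) / span for value in values]


# Create an instance, fit it, then reuse it
scaler = SimpleScaler().fit([10, 20, 30])
scaled = scaler.transform([10, 20, 30])
print(scaled)
```

The same ingredients (an `__init__` method, attributes stored on `self`, and methods that use them) are what you’ll be reading when you come across custom model and layer classes.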

Modules and Packages

Ah, modules and packages. These allow you to tap into the collective wisdom of Python developers around the world. Some of the most common packages you’ll find yourself using are Requests, pandas, NumPy, scikit-learn, TensorFlow and NLTK.
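
The packages above need to be installed first (for example with `pip install pandas`), so this sketch sticks to modules that ship with Python. The import patterns are the same either way:

```python
# Import a whole module
import math

# Import a module under a shorter alias
import statistics as stats

# Import one specific name from a module
from collections import Counter

radius = 2
area = math.pi * radius ** 2

values = [1, 2, 2, 3, 4]
average = stats.mean(values)

# Count how often each value appears and grab the most frequent one
counts = Counter(values)
most_common_value, frequency = counts.most_common(1)[0]

print(area, average, most_common_value, frequency)
```

Aliases are everywhere in Data Science code. `import pandas as pd` and `import numpy as np` are the conventions you’ll see most.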

Files and Error Handling

Writing out to disk or saving your data to files is especially useful when working in the Natural Language Processing space. Error handling comes in handy just about everywhere, but especially so when you’re writing production-grade code.
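
A minimal sketch that writes a file, reads it back, and then handles the error you get when a file doesn’t exist. The file names are made up, and everything happens in a temporary folder so nothing is left behind on your machine:

```python
import tempfile
from pathlib import Path

# Work inside a temporary folder that is cleaned up automatically
with tempfile.TemporaryDirectory() as folder:
    file_path = Path(folder) / 'notes.txt'  # hypothetical file name

    # Write some text out to disk
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')

    # Read the text back in, one list item per line
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.read().splitlines()

    # Try to read a file that does not exist and handle the error
    missing_path = Path(folder) / 'missing.txt'
    try:
        with open(missing_path, 'r', encoding='utf-8') as f:
            contents = f.read()
    except FileNotFoundError:
        # Fall back to an empty string instead of crashing
        contents = ''
        print('File not found, falling back to an empty string')
    finally:
        # This runs whether or not an error was raised
        print('Finished trying to read the file')
```

Catching a specific exception like `FileNotFoundError` is generally safer than a bare `except`, because unexpected errors still surface instead of being silently swallowed.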

Thanks for Reading

And that’s about it.

I’m crossing my fingers that you found this useful, and if you have any other tips on learning DS, ML or DL I’d love to hear them! The key thing to take away is to start. Don’t be afraid to take that first step.

I’m posting a bunch of new ML and DS content each week on my YouTube channel, and I’d love to connect with all of y’all.


Written by Nicholas

Data guy by day, YouTuber by night. Started playing with spreadsheets when I was 8 and I’ve been hooked on fancy plots ever since.
