Becoming a Data Scientist in 2021

Including a powerful guide for serious beginners

Jason Dsouza
The Startup
6 min read · Feb 26, 2021

Data Science is a field that stirs up mixed emotions: massive hype around recent innovations, and plenty of resources telling you that you need an advanced math degree under your belt to become a Data Scientist.

I disagree.

Now, don’t get me wrong. Data Science is hard: there is a lot you’ll need to know before you can call yourself a “Data Scientist”.

But take it one step at a time. There are workarounds to this rule, and this article presents a powerful learning guide for anyone serious about becoming a Data Scientist.

Start with the Mathematics

Yes, you read that right. Nearly every article I come across tells you to start with programming. Bad move!

Data Science is a math-laden field, and to understand its many constructs you’ll need some familiarity with math. You don’t need a math degree for this, but you should at least read up on the basics, using resources like the following.

This GitHub repo includes a free roadmap for learning all the math behind AI:

Three topics are absolutely essential: Linear Algebra, Calculus, and Statistics. For the most part, you can get away with just Statistics, but either way, it’s good to know the concepts that drive the field.

Linear Algebra

This branch of math is used (almost) everywhere in Data Science. Much of the numerical computing behind it, from storing a dataset as a matrix to running a deep neural network, comes down to Linear Algebra: neural networks represent their inputs, weights, and activations as vectors and matrices and process them largely through matrix multiplication. Quite frankly, you’re missing out on a lot if you don’t have at least a basic understanding of it.
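To make this concrete, here is a minimal sketch, in plain Python with no third-party libraries assumed, of a single fully connected neural-network layer: a matrix-vector product plus a bias vector, passed through a ReLU activation. The weights, inputs, and the helper names `matvec` and `dense_layer` are made up for illustration; in practice a library such as NumPy would do this math for you.

```python
def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def dense_layer(W, x, b):
    """One fully connected layer: ReLU(W x + b)."""
    z = [z_i + b_i for z_i, b_i in zip(matvec(W, x), b)]
    return [max(0.0, z_i) for z_i in z]  # ReLU zeroes out negative values


# Hypothetical layer with 3 inputs and 2 outputs
W = [[1.0, -2.0, 0.5],
     [0.0, 1.0, 1.0]]
x = [2.0, 1.0, 4.0]
b = [0.5, -6.0]

print(matvec(W, x))          # [2.0, 5.0]
print(dense_layer(W, x, b))  # [2.5, 0.0]
```

Each output is a weighted sum of every input, which is exactly what one matrix row times a vector computes; stacking many such layers is the core of a deep network.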

Calculus

Like Linear Algebra, Calculus plays a large role in Data Science. But you don’t need to be a guru. All you need is a basic understanding of the core principles, such as derivatives and gradients, that determine how your models learn.
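The derivative is the Calculus idea you will meet most often: training a model usually means repeatedly nudging its parameters against the slope of a loss function (gradient descent). The sketch below fits the one-parameter model `y_hat = w * x` to a tiny, made-up dataset by minimizing mean squared error; the learning rate and step count are arbitrary choices for this example.

```python
def loss(w, xs, ys):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w, xs, ys):
    """d(loss)/dw via the chain rule: mean of 2 * x * (w * x - y)."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, w=0.0, lr=0.05, steps=200):
    """Repeatedly step w in the direction that decreases the loss."""
    for _ in range(steps):
        w -= lr * grad(w, xs, ys)
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x, so the best w is 2
w = gradient_descent(xs, ys)
print(round(w, 4))  # 2.0
```

If the learning rate is too large, the updates overshoot and diverge; for this particular data, any `lr` above about 0.21 does.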

Statistics & Probability

This topic will probably take up a significant chunk of your time. The good news: these concepts aren’t difficult, so there’s no reason you shouldn’t master them.
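As a small taste, here is a sketch of two of the first statistics you will compute on almost any dataset, the mean and the variance, written out by hand and then cross-checked against Python’s built-in `statistics` module. It also shows the common n versus n - 1 distinction between population and sample variance; the data values and helper names are made up for illustration.

```python
import math
import statistics


def average(data):
    """Arithmetic mean."""
    return sum(data) / len(data)


def variance(data, sample=True):
    """Variance: divide by n - 1 for a sample, by n for a whole population."""
    m = average(data)
    squared_dev = sum((x - m) ** 2 for x in data)
    return squared_dev / (len(data) - 1 if sample else len(data))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(average(data))                 # 5.0
print(variance(data, sample=False))  # 4.0 (population variance)
print(math.sqrt(variance(data)))     # sample standard deviation, about 2.138

# Cross-check against the standard library
assert math.isclose(variance(data), statistics.variance(data))
assert math.isclose(variance(data, sample=False), statistics.pvariance(data))
```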

Miscellaneous Topics

There are other topics that are useful, such as Graph Theory and Discrete Mathematics. You won’t be using them daily as a beginner, but expect to encounter them as you progress up the experience chain.

Still, if you’d like a quick read-through:

If you are terrified at the mere mention of “math”, you’re probably not going to have much fun as a Data Scientist. However, if you’re willing to invest time in getting familiar with the principles underlying calculus, linear algebra, statistics, and probability, nothing, not even math, should stand in the way of your becoming a Data Scientist.

PS: Math really is fun. As you go deeper, take the time to appreciate the beauty of each concept and how it shapes the models you build. You’ll soon share the unbridled passion of many mathematicians and Data Scientists!

Programming

Now to the more exciting part: programming. With more than 2.5 exabytes of data being generated every day, it would be absurd not to use computers to analyze that data and find meaningful representations in it.

“How much programming is required in data science, particularly statistical analysis and machine learning?”

A lot. In practice, most data science jobs require you to code, because most companies need data cleaning, implementation and productization, and adaptation of algorithms to their own specific purposes. If you can’t turn your solutions into something product-ready, you are a much less useful employee. (Source)
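To give a flavor of the data cleaning mentioned above, here is a minimal sketch using only Python’s standard `csv` module. The CSV contents and the cleaning rules (trim whitespace, drop rows whose numeric fields are missing or unparseable) are assumptions for illustration; real projects often reach for a library like pandas instead.

```python
import csv
import io

# Hypothetical messy input: stray spaces, a missing age, a non-numeric income
RAW = """name,age,income
Alice, 34 ,52000
Bob,,48000
Carol,29,n/a
 Dave ,41,61000
"""


def clean_rows(text):
    """Parse CSV text, trim whitespace, convert numeric fields, and drop bad rows."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {key: (value or "").strip() for key, value in row.items()}
        try:
            record["age"] = int(record["age"])
            record["income"] = float(record["income"])
        except ValueError:
            continue  # skip rows whose numbers are missing or malformed
        cleaned.append(record)
    return cleaned


for record in clean_rows(RAW):
    print(record)
```

Only Alice and Dave survive here; whether to drop, fill in, or flag bad rows is a judgment call that depends on the project.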

Python & R

Python is, by far, the most widely used programming language in Data Science. In JetBrains’ 2016 survey, almost four out of five developers said Python was their main language. (Source)

While Python may suffice for most of your tasks, you’ll need R in your toolkit as well to consider yourself a “well-rounded” Data Scientist. I recommend you focus on Python and spend a little time on R as well.

Computer Vision

Computer Vision today is at the forefront of many exciting developments in autonomous vehicles, medical image analysis, and much more. The field is concerned with deriving useful insights primarily from image data, and to some extent from video.
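Under the hood, an image is just a grid of numbers. The sketch below represents a tiny RGB image as nested lists, converts it to grayscale with the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114), and thresholds it into a binary mask. It is plain Python for clarity; the image, the helper names, and the cutoff of 128 are made-up values, and libraries such as OpenCV provide optimized versions of these operations.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) pixels (0-255) to grayscale with BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]


def threshold(gray, cutoff=128):
    """Binary mask: 1 where a pixel is at least as bright as cutoff, else 0."""
    return [[1 if p >= cutoff else 0 for p in row] for row in gray]


# Hypothetical 2x2 image: white, black / red, green
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 255, 0)],
]

gray = to_grayscale(image)
print(gray)             # [[255, 0], [76, 150]]
print(threshold(gray))  # [[1, 0], [0, 1]]
```

Note how pure green comes out much brighter than pure red: the weights reflect that the human eye is most sensitive to green light.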

While many Computer Vision libraries like OpenCV and Torchvision exist today, I highly recommend you use Caer, my own Computer Vision library: it’s fast, lightweight, and extremely beginner-friendly.

Machine Learning & Deep Learning

Today, Machine Learning and Deep Learning algorithms are at the core of Data Science. For most job openings, this is where the demand lies.
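As a first taste of a Machine Learning algorithm, here is a sketch of k-nearest neighbors classification: a new point gets the majority label among the k closest training points. The two-feature “cat”/“dog” dataset and the choice k = 3 are invented for illustration, the code assumes Python 3.8+ for `math.dist`, and libraries such as scikit-learn ship tuned implementations.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label query by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples: (features, label)
train = [
    ((1.0, 1.0), "cat"),
    ((1.5, 2.0), "cat"),
    ((2.0, 1.0), "cat"),
    ((6.0, 5.0), "dog"),
    ((7.0, 7.0), "dog"),
    ((6.5, 6.0), "dog"),
]

print(knn_predict(train, (1.2, 1.4)))  # cat
print(knn_predict(train, (6.8, 6.1)))  # dog
```

There is no training step beyond storing the data, which makes k-NN a handy baseline, though prediction slows down on large datasets because every query is compared with every training point.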

Gaining Practical Experience

If you follow the resources mentioned above, you’re only halfway there. Now you need to put your skills into practice!

To truly master the concepts you’ve learned, invest time in problems that closely resemble real-world applications. Working through erroneous data and picking yourself up when you fail will help you build deep expertise in Data Science.

Platforms like Kaggle offer a good starting point. Their machine learning competitions serve as a channel for brainstorming and are a great way to apply your newly gained skills.

Here is a list of 10 Data Science Competitions for you to hone your skills:

Contributing to Open-Source

An extremely good option to consider is contributing to open-source projects. Many people find that discussing a pull request with like-minded contributors is a useful way to gain “practical experience”. An open-source project like Caer is a good place to start.

Closing Notes

This article presented a powerful learning guide for those serious about getting into Data Science in 2021.

I do want to point out that this guide won’t work for you if you’re not willing to put in the work. If you are, it will make your journey smoother, and you’ll be well on your way to a career in Data Science!

Remember: keep the big picture in mind!

Happy Learning!

I write libraries and sometimes blog about them | Top Writer | Creator of Caer, the Vision library for Python