The best Data Science tool to start your journey with, in 2022?

Yash Gupta
Data Science Simplified
5 min read · Jun 12, 2022

Data Science jobs are among the most sought-after roles today. Data Science has been a mainstream career path for barely a decade, yet the amount of data the world generates and uses today is almost immeasurable. Big Data and developments such as automated machine learning (AutoML) and machine learning operations (MLOps) have made Data Science an ever-evolving field with a lot to offer.

Learners today have a major question to answer in their bid to become Data Scientists: where to begin. The field has grown rapidly, thanks to the many tools and resources that anyone can use to make sure data is leveraged as one of the most important resources in their company.

Check out this article, where I discuss the best way to upskill in Data Science using the Open Source Master’s in Data Science, a guided path devised by Clare Corthell, one of the world’s most renowned Data Scientists today.

Where do I start?

Now, coming to the question at hand. Data Science is a vast field that branches into many specializations, all of which help the world get more out of its data. So where should you start your journey, and with which tool?

The most important tools in Data Science (based on their popularity and my personal opinion) are:

  • Excel
  • Python or R
  • SQL
  • Tableau
  • Scikit-Learn

Ideally, a data scientist has to be someone proficient in Data Warehousing and Database management, Data Visualization, Data Manipulation, ETL (Extract, Transform, Load) processes, statistics, Machine Learning, etc.

That sure is a lot of things to be good at for one person.

If you want to identify exactly where to start and where you should be at any point in your journey, it’s best to refer to the roadmaps and guided paths that are available online.

For starters, you can look for a roadmap like this one, which I particularly like for the depth and simplicity it offers.

However difficult it may seem, these skills are all more or less equally important in everything that Data Scientists do in their work and projects.

The answer to the question is fairly simple once you know where you stand in your Data Science journey. If you have never used a computer and still want to start, that’s alright too. We’ll get to when you should start with each of these tools soon.

Excel

Working with Excel is simple and to the point if your data is mostly made up of rows and columns. Data can be cleaned, filtered, and manipulated down to the individual cell using Excel. Having been around since the 1980s, it remains one of the most widely used tools for working with data.

If you are someone who is new to working with data or if you have just started using data in your projects or your workplace, then Excel is where you start.

Find out more about it here: What is Excel?

Python or R

Python and R are two of the most used programming languages in Data Science, and much of the Data Science world today runs on one or the other. The range of work you can do with them is hard to match: programming languages are inherently flexible and can help with ETL work, machine learning, process automation, and data visualization.

If you are someone who has used Excel, Data viz tools, etc., and want to get to the next level, then Python or R is the way to go.
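
To give a feel for the ETL work mentioned above, here is a minimal sketch in Python that uses only the standard library. The sales data, the column names, and the three function names are made up for illustration; a real pipeline would read from files or a database and would likely use a library such as pandas.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sales data; in practice this would come from a file or database.
RAW_CSV = """region,units,price
North,10,2.50
South,,3.00
North,5,2.50
East,8,abc
South,4,3.00
"""

def extract(text):
    """Read CSV text into a list of dictionaries (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows with missing or non-numeric values and compute revenue."""
    clean = []
    for row in rows:
        try:
            units = int(row["units"])
            price = float(row["price"])
        except ValueError:
            continue  # skip rows that cannot be parsed
        clean.append({"region": row["region"], "revenue": units * price})
    return clean

def load(rows):
    """Aggregate revenue per region (a stand-in for writing to a database)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["revenue"]
    return dict(totals)

revenue_by_region = load(transform(extract(RAW_CSV)))
print(revenue_by_region)
```

The same cleaning and summing could be done by hand in a spreadsheet, but once it is written as code it can be rerun on new data without any manual steps.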

Find out more about it here: What is Python?

SQL

SQL is arguably the most used database query language today and is one of the most used tools in companies that work across multiple departments and store a lot of data. Even big data frameworks like Hadoop and Spark offer SQL interfaces (for example, Hive and Spark SQL) to extract and work with data.

If you have been working with Excel and understand how to work with data in tabular form, SQL should be your choice for learning how to work with more data than Excel can store.

Almost every company relies on a database management system (DBMS) that is queried with SQL, making it a must-learn for anyone.
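
For readers coming from Excel, a small sketch may help show what querying tabular data with SQL looks like. It uses Python’s built-in sqlite3 module with an in-memory database; the orders table, its columns, and the values are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, made up for illustration.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (department, amount) VALUES (?, ?)",
    [("Sales", 120.0), ("Sales", 80.0), ("Marketing", 50.0), ("Support", 30.0)],
)

# Roughly what a pivot table with a filter does in Excel:
# total amount per department, keeping only totals above 40.
query = """
    SELECT department, SUM(amount) AS total
    FROM orders
    GROUP BY department
    HAVING SUM(amount) > 40
    ORDER BY total DESC
"""
department_totals = conn.execute(query).fetchall()
print(department_totals)

conn.close()
```

The query itself is plain SQL, so the same statement would carry over with little or no change to most other relational databases.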

Find out more about it here: What is SQL?

Tableau

Tableau has been around for a while and is a tough competitor to Power BI. It can be used to do a lot of work with data visuals and to tell data stories. This is a lifesaver when most of your stakeholders don’t understand the back-end work on the data, the formulas, and how they work. If your stakeholders are only interested in understanding what the data is telling them, then Tableau is an amazing way to do just that.

Tableau takes a dataset from a source such as an Excel file, a SQL database, or a JSON file and, using a simple drag-and-drop system, delivers high-quality and accurate visualizations that can show your stakeholders everything they need to know from a given dataset.

You can use Tableau if you are comfortable with Excel, understand how to make data visuals, and don’t wish to proceed with something as vast as a programming language.

Find out more about it here: What is Tableau?

Scikit-Learn

Scikit-Learn is one of the most popular Machine Learning libraries available for Python. Python is one of the most used programming languages for machine learning, and people use it extensively even in competitive coding. Machine Learning can be applied to any problem statement that needs some predictive analytics.

With Python’s Scikit-Learn library, you can build a machine learning model in as few as 5 lines of code, or go well beyond that with highly specific settings.
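
As a rough illustration of how short such a model can be, the sketch below trains a classifier with scikit-learn. The choice of the built-in Iris dataset, logistic regression, the 25% test split, and the random seed are all assumptions made for this example, not a recommendation for any particular problem.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: flower measurements (X) and species labels (y).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Train the model, then measure the share of correct predictions on the test set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The "highly specific settings" are the arguments passed to the model, such as max_iter here. Swapping LogisticRegression for another scikit-learn estimator leaves the rest of the code unchanged, because estimators share the same fit and score methods.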

If you have been into Computer Science for a while, understand how coding works, and know your way around the basics of Python, start with Scikit-Learn! It’s simple and can be used to make predictive models without a lot of hassle.

Find out more about it here: What is Scikit Learn?

While these are important, do check out these free tutorials that can help you get started in the best way possible on your journey to becoming a Data Scientist:

  1. Excel tutorial
  2. Python tutorial
  3. R tutorial
  4. SQL tutorial
  5. Tableau tutorial
  6. Scikit Learn tutorial

For more such articles, stay tuned with us as we chart out paths on understanding data and coding and demystify other concepts related to Data Science. Please leave a review down in the comments.

Check out my other articles at:

Do connect with me on LinkedIn if you want to discuss it further!

Yash Gupta
Data Science Simplified

Lead Analyst at Lognormal Analytics and self-taught Data Scientist! Connect with me at - https://www.linkedin.com/in/yash-gupta-dss