What Tools Do You Need to Become a Data Scientist?

--

There are lots of definitions of a data scientist floating around. In my opinion, Josh Wills put it best: a data scientist is a “person who is better at statistics than any software engineer and better at software engineering than any statistician”.

The Venn diagram below makes the picture clearer:

Source: Google Images (by Drew Conway)

Drew Conway’s Venn diagram is widely accepted across the data science industry. It shows that a data scientist is a fusion of three skill sets: statistics, substantive expertise (domain knowledge), and hacking skills (programming). The danger zone is the intersection of domain expertise and programming without statistical backing. Work done there can produce wrong or even disastrous insights, lead to bad decisions, and cost businesses a huge amount of money.

Recommended Tools to Break into Data Science

I hope you now have an idea of what a data scientist is. Now let’s talk about the tools you should master to become one, starting with:

Microsoft Excel: You might be thinking, Excel… really? Absolutely. Excel is fundamental in data science. In my experience, Excel is still a good tool for exploratory analysis and reporting, and there are a lot of business users who still like to receive a report in Excel.

SQL: It’s becoming standard to use SQL in data science.
SQL proficiency is a basic requirement for many data science jobs, including data analyst, business intelligence developer, and programmer analyst. SQL also integrates with scripting languages. For example, you might pull the data with a query, pivot it in a particular way, and then create a nice data visualization.
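To make the "SQL integrates with scripting languages" point concrete, here is a minimal sketch in Python using only the standard library's sqlite3 module. The sales table, its columns, and the numbers in it are hypothetical values I made up for illustration. In real work you would connect to your own database and would probably reach for a dataframe library to do the pivot.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales rows (region, quarter, amount), invented for illustration.
ROWS = [
    ("North", "Q1", 100.0),
    ("North", "Q2", 150.0),
    ("South", "Q1", 80.0),
    ("South", "Q2", 120.0),
    ("North", "Q1", 50.0),
]


def pivot_sales(rows):
    """Aggregate with SQL, then pivot the result into {region: {quarter: total}}."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        # SQL does the grouping and summing.
        cursor = conn.execute(
            "SELECT region, quarter, SUM(amount) FROM sales "
            "GROUP BY region, quarter ORDER BY region, quarter"
        )
        # The script reshapes the long result into a wide, pivot-style table.
        table = defaultdict(dict)
        for region, quarter, total in cursor:
            table[region][quarter] = total
        return dict(table)
    finally:
        conn.close()


if __name__ == "__main__":
    for region, by_quarter in pivot_sales(ROWS).items():
        print(region, by_quarter)
```

The split of work is the point here: the database handles the aggregation, and the script handles the reshaping that feeds a chart or a report.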

Python/R: This debate is endless, but I think that at this point Python has a slight advantage over R if you are applying artificial intelligence algorithms. For basic data analysis, both are equally good. If you are just starting out, you can begin with Python and learn R later if required.

Machine Learning: If you are just starting out, I would recommend getting at least a good sense of these four algorithms: linear regression, logistic regression, decision trees, and random forests. By a good sense, I mean knowing the math or logic behind each algorithm and what exactly is going on behind the scenes when you apply it.
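As a small taste of what "knowing the math behind the algorithm" can look like, here is a sketch of simple (one-feature) linear regression fitted with the ordinary least squares formulas, written in plain Python. The function name and the sample numbers are my own illustration, not taken from any library.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data that lies exactly on the line y = 2x + 1.
    slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
    print(f"slope={slope}, intercept={intercept}")
```

Once these few lines make sense, a library's linear regression stops being a black box. Working through the other three algorithms by hand in the same way pays off too.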

Tableau/Alteryx/PowerBI: Data visualization is the representation of data or information in a graph, chart, or other visual format. This comes in handy when you want to showcase your work to the business.

These are the tools I feel are crucial for getting into this domain. Beyond them, statistics is equally important: you need both descriptive and inferential statistics to draw correct insights.
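To illustrate the difference between the two, here is a rough sketch using Python's built-in statistics module. Descriptive statistics summarize the sample you have, while inferential statistics say something about the population it came from. The sample values are made up, and the confidence interval uses a normal (z) approximation, which is an assumption on my part. For small samples a t-based interval would usually be the better choice.

```python
from statistics import NormalDist, mean, median, stdev

# Hypothetical sample, e.g. daily order counts; invented for illustration.
SAMPLE = [12, 15, 11, 14, 13, 15, 12, 14]


def describe(sample):
    """Descriptive statistics: summarize the data we actually observed."""
    return {"mean": mean(sample), "median": median(sample), "stdev": stdev(sample)}


def mean_confidence_interval(sample, confidence=0.95):
    """Inferential statistics: approximate interval for the population mean.

    Uses a normal (z) approximation, which is only a rough guide for small samples.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least 2 observations")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(sample) / n ** 0.5  # z times the standard error
    centre = mean(sample)
    return centre - margin, centre + margin


if __name__ == "__main__":
    print(describe(SAMPLE))
    print(mean_confidence_interval(SAMPLE))
```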

To sum up, the tools I recommend are as follows: Microsoft Excel, SQL or any other database tool, Python/R, a data visualization tool (Tableau/Alteryx/PowerBI), and machine learning. If you master these… voilà! You are a data scientist.

Happy Learning!

--

A data scientist by profession and an actor by passion. I love to read articles about data science, books, and short stories in my free time.