Swaroop
4 min read · Jan 3, 2018

Statistics For Aspiring Data Scientists

Statistics is a fundamental skill for every aspiring Data Scientist. Whether you are planning to pursue a Master's in Data analytics or a PhD in Data science, you need to know statistics. In fact, according to the Internet,

Data scientists are better at statistics than a software engineer and better at software engineering than a statistician.

Learning statistics is the first step toward becoming a Data Scientist. I was planning on starting a Data science series, but I started with this statistics series instead for that very reason.

[Image: Statistics and Data science word cloud]

This series is going to be different from other such series and videos online because we are going to focus more on the practical aspects of statistics than on theory alone. Instead of just reading about what a correlation coefficient is or what a hypothesis test is, we'll actually calculate those values and understand them as we use them. Also, many of the courses online use the R language. But a lot of people who want to become Data Scientists have a programming background and are familiar with Python, so all the coding examples in this series will be in Python. In fact, this series of articles is aimed at programmers who already have some experience in Python (or a similar language) and are looking to get into Data science and Data analytics.

Statistics is all about making sense of data. For this lesson, we’ll get started with some basic terms related to the various types of data.

Types Of Data:

At the top level, there are two types of data: NUMERICAL (or QUANTITATIVE) and CATEGORICAL (or QUALITATIVE). Each of these can be divided into two more categories: numerical data is either discrete or continuous, and categorical data is either nominal or ordinal.

[Figure: Data Types in Statistics]

Let’s look at each of these Data types one by one.

Discrete Data:

Numerical data that can take only a countable set of distinct values. In other words, something that you can count.

Example:

  • Number of people in a room: it can be 10 or 100 or 1000, but it is always a whole number.
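To make this concrete for Python programmers, here is a minimal sketch of discrete data. The room head counts are made-up values used only for illustration; the point is that every observation is a whole number, so counting how often each value occurs comes naturally.

```python
from collections import Counter

# Hypothetical head counts for five rooms (made-up values for illustration).
people_per_room = [10, 100, 1000, 25, 10]

# Discrete data: every observation here is a whole number.
all_whole = all(isinstance(n, int) for n in people_per_room)

# Because the values are countable, a frequency table is a natural summary.
frequency = Counter(people_per_room)

print(all_whole)      # True
print(frequency[10])  # 2
```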

Continuous Data:

Numerical Data that can take any value in a given interval. In other words, something that you can measure.

Example:

  • Volume of a room: it can be 1000 m³ or 1001 m³ or even 1000.8 m³, so it is a continuous variable.
  • Height of a Tower
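A small Python sketch of the same idea, using made-up room volumes as the assumed measurements. Continuous values are usually stored as floats, and between any two measurements there is always room for another valid value.

```python
import statistics

# Hypothetical room volumes in cubic metres (made-up measurements).
room_volumes_m3 = [1000.0, 1001.0, 1000.8, 999.5]

# Averages are meaningful for continuous data.
mean_volume = statistics.mean(room_volumes_m3)

# Between any two measurements there is another possible value.
low, high = 1000.0, 1001.0
midpoint = (low + high) / 2

print(round(mean_volume, 3))
print(midpoint)  # 1000.5
```

In practice, the precision you see is limited by the measuring instrument, not by the quantity itself.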

Nominal Data:

Data with categories that don’t have any inherent order.

Example:

  • Colors: White, Black, and Blue are categories, but there is no order among them. One is not greater or lesser, or better or worse, than another.
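As a rough illustration in Python, nominal data is often just a list of labels. The colors observed below are made-up values. Since the categories have no order, about the only sensible thing to do with them is count how often each one occurs.

```python
from collections import Counter

# Hypothetical colors observed in a sample (made-up values).
colors = ["White", "Black", "Blue", "White", "Black", "White"]

# Counting occurrences is meaningful for nominal data.
counts = Counter(colors)
most_common_color, occurrences = counts.most_common(1)[0]

print(most_common_color, occurrences)  # White 3
```

Note that sorting these strings would only give alphabetical order, which says nothing about the colors themselves.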

Ordinal Data:

Data with categories that have some inherent order.

Example:

  • Rankings in a game: First, Second, and Third have a built-in order that makes one better than another.
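One possible way to sketch ordinal data in Python is with `IntEnum` from the standard library. The `Rank` class and the results list below are hypothetical names and values chosen for this example. The numbers only encode the order; the gap between First and Second is not assumed to be the same as the gap between Second and Third.

```python
from enum import IntEnum


class Rank(IntEnum):
    """Hypothetical ranking scale; a lower number means a better finish."""
    FIRST = 1
    SECOND = 2
    THIRD = 3


# Made-up results for three players.
results = [Rank.THIRD, Rank.FIRST, Rank.SECOND]

# Ordinal categories can be compared and sorted.
ordered = sorted(results)
best = min(results)

print([r.name for r in ordered])  # ['FIRST', 'SECOND', 'THIRD']
print(Rank.FIRST < Rank.SECOND)   # True
```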

Since this is just the first article, I am going to leave you with that. See you in the next article.

For more programming articles, check out Freblogg.

If you want some Python programming tutorials, check out Freblogg/Python.

Some articles on automation:

  • Web Scraping For Beginners with Python
  • My semi automated workflow for blogging
  • Publish articles to Blogger automatically
  • Publish articles to Medium automatically

This is the 12th article as part of my Twitter challenge #30DaysOfBlogging. Eighteen more articles on various topics, including but not limited to Java, Git, Vim, Software Development, and Python, are still to come.

If you are interested in this, make sure to follow me on Twitter @durgaswaroop. While you’re at it, go ahead and subscribe here on Medium and on my other blog as well.

If you are interested in contributing to open source projects and haven’t found the right project, or if you are unsure how to begin, I would like to suggest my own project, Delorean, which is a distributed version control system built from scratch in Scala. You can contribute not only code, but also usage documentation and bug reports.

Thanks for reading. See you again in the next article.