Data Visualization

Let's Plot :)

Dawar Rohan
Published in Analytics Vidhya
4 min read, Mar 22, 2020

Plotting Numerical Variables.

Plotting data helps us understand it quickly and reveals patterns that are not visible in ordinary tabular analysis.

Let's take a quick refresher on the seaborn library to make some great plots.

In this section, we will see how to:

  • Visualize univariate distributions.
  • Visualize bivariate distributions.

Seaborn is a Python library built on top of matplotlib. Seaborn creates more attractive plots, often with more concise code. Ready for action?

Visualizing Univariate Distributions:

First and foremost, import the libraries and read the data.
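The article's data file is not shown here, so the snippet below is only a sketch. It reads a tiny inline table whose column names (Order_ID, Sales, Profit) and values are made up for illustration. With a real file you would pass its path to pd.read_csv() instead.

```python
import io

import pandas as pd
import matplotlib.pyplot as plt  # seaborn draws onto matplotlib figures
import seaborn as sns

sns.set_theme()  # apply seaborn's default styling to all later plots

# Hypothetical stand-in for the real data file (made-up rows and columns).
csv_text = """Order_ID,Sales,Profit
1,261.96,41.91
2,731.94,219.58
3,14.62,6.87
4,957.58,-383.03
5,22.37,2.52
"""

# With a real file this would be: df = pd.read_csv("path/to/your_file.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows, to check the columns loaded as expected
print(df.describe())  # quick numeric summary of each column
```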

Histogram

Histograms and density plots show the frequency (or density) of a variable along the y-axis. The sns.distplot() function plots a histogram with a density curve drawn over it.

You can also plot what is known as a rug plot, which draws the actual data points as small vertical bars along the axis. The rug plot is simply specified as an argument of distplot() (rug=True).

A simple density plot (without the histogram bars) can be created by specifying hist=False.

Seaborn uses matplotlib behind the scenes, so most matplotlib functions still apply.

Next, let's look at subplots; they work the same way as in plain matplotlib.

BoxPlot

Boxplots are a great way to visualize univariate data because they show the key percentiles (25th, 50th, 75th) and the IQR (interquartile range).
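To connect the picture to the numbers, the sketch below computes the quartiles and the IQR with pandas and then draws the boxplot. The Profit column is synthetic.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data, for illustration only.
rng = np.random.default_rng(4)
df = pd.DataFrame({"Profit": rng.normal(loc=50, scale=20, size=500)})

# The box spans the 25th to 75th percentile; the line inside is the median.
q1, median, q3 = df["Profit"].quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # interquartile range = height of the box
print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")

fig, ax = plt.subplots()
sns.boxplot(data=df, y="Profit", ax=ax)
```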

Visualizing Bivariate Distributions:

A bivariate plot can be thought of as two univariate distributions placed on the x and y axes respectively. It helps us observe the relationship between two variables.

These plots are also referred to as joint plots. They are created with sns.jointplot(); let's check the code.

Notice that both distributions are heavily skewed and all the points seem to be concentrated in one region. That is because of some extreme values of Profit and Sales, which matplotlib is trying to accommodate in the limited space of the plot.

Let's apply some conditions to the data to make the plot cleaner.

We can also adjust the arguments to jointplot() to make the plot more readable. For example, specifying kind='hex' will create a 'hexbin' plot.

The bottom-right region of the plot represents orders where the Sales is high but the Profit is low, i.e. even when the store is getting high revenue, the orders are still making losses. These are the kind of orders a business would want to avoid.

Plotting Pairwise Relationships

Sometimes it's really helpful to plot pairwise relationships between multiple numeric variables. To illustrate, let's take the prices of some cryptocurrencies such as bitcoin, ethereum, monero, neo, quantum and ripple.

Now, crypto enthusiasts would know that the prices of these currencies vary with each other. If bitcoin goes up, the others will likely follow suit, etc.

Now, say you want to trade in some currencies. Given a set of cryptocurrencies, how will you decide when and which one to buy/sell? It will be helpful to analyse past data and identify some trends in these currencies.

Merging the files:
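The original files are not shown here, so this is only a sketch of one way to merge them. It assumes each currency has its own table with a Date column and a Close price column (both names are hypothetical), and it builds small synthetic tables in place of reading the files.

```python
from functools import reduce

import numpy as np
import pandas as pd

rng = np.random.default_rng(8)
dates = pd.date_range("2020-01-01", periods=5, freq="D")
names = ["bitcoin", "ethereum", "monero"]

# Stand-ins for pd.read_csv(...) on one file per currency (made-up prices).
frames = []
for name in names:
    one = pd.DataFrame({"Date": dates,
                        "Close": rng.uniform(1, 100, size=len(dates))})
    # Rename the price column to the currency so the columns stay distinct.
    frames.append(one.rename(columns={"Close": name}))

# Merge the tables one after another on the shared Date column.
merged = reduce(lambda left, right: pd.merge(left, right, on="Date"), frames)
print(merged)
```

The result has one row per date and one price column per currency, which is the shape the pairwise plots need.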

Now, since we have lots of numeric variables, let's view them as pairwise scatter plots.

We can also check the correlation between the currencies.

The above dataframe is called a correlation matrix. It helps us identify the correlation between pairs of variables; for instance, quantum and ethereum are highly correlated (0.791).
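With pandas this is a single call to DataFrame.corr(), which returns the pairwise Pearson correlations by default. The prices below are synthetic, so the numbers will not match the article's values.

```python
import numpy as np
import pandas as pd

# Synthetic prices sharing a common trend (illustration only).
rng = np.random.default_rng(10)
market = rng.normal(size=200).cumsum()
prices = pd.DataFrame({
    "bitcoin": 100 + market + rng.normal(scale=0.5, size=200),
    "ethereum": 50 + 0.8 * market + rng.normal(scale=1.0, size=200),
    "monero": 20 + 0.5 * market + rng.normal(scale=2.0, size=200),
})

# Square table: one row and one column per currency, values in [-1, 1].
corr = prices.corr()
print(corr.round(3))
```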

Heatmaps

Heatmaps are a great way to visualize the correlation matrix.
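A short sketch with sns.heatmap() on a correlation matrix computed from synthetic prices. The colour map and the fixed -1 to 1 colour range are choices made here for readability, not something taken from the original analysis.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic prices sharing a common trend (illustration only).
rng = np.random.default_rng(11)
market = rng.normal(size=200).cumsum()
prices = pd.DataFrame({
    "bitcoin": 100 + market + rng.normal(scale=0.5, size=200),
    "ethereum": 50 + 0.8 * market + rng.normal(scale=1.0, size=200),
    "monero": 20 + 0.5 * market + rng.normal(scale=2.0, size=200),
})
corr = prices.corr()

fig, ax = plt.subplots()
# annot=True writes each correlation value inside its cell.
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation between currencies")
```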

As learning is a continuous curve, we must never stop!

See you in the next one!

Dawar Rohan
Analytics Vidhya

Data Scientist | Machine Learning Enthusiast | Learning through Sharing | Learning is fun! Connect with or follow me on LinkedIn: www.linkedin.com/in/rohandawar