Bin there, Done that!

..a peek under the hood of histograms

Balaji Sundararaman
Analytics Vidhya
Dec 4, 2020

Photo by Paweł Czerwiński on Unsplash

When we want to check the distribution of a numeric feature in a dataset, the first thing that comes to mind is a histogram. A histogram plots the number of observations that fall within each range of the feature's values. These ranges are called bins. The shape of the plotted distribution depends on the width of the bins, and therefore on where the bin edges fall.

In this blog we will look at:

  • How to access the bin edges of a histogram
  • How to set custom bin widths instead of the default

We will use the familiar iris dataset, which can be loaded through seaborn's load_dataset function.

Let's see what the histogram of one of the features, sepal_length, looks like:
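The original code cell is not reproduced here, so the following is a minimal sketch of such a plot. It assumes seaborn's histplot function is used and that no bins argument is passed, so seaborn chooses the bins on its own. The plot title is purely illustrative.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# load_dataset fetches iris from seaborn's online data repository
# (it is cached locally after the first download)
iris = sns.load_dataset("iris")

fig, ax = plt.subplots()
# Assumption: histplot with default settings, so seaborn picks the bins
sns.histplot(data=iris, x="sepal_length", ax=ax)
ax.set_title("Distribution of sepal_length")  # illustrative title
```

In a notebook the figure is displayed automatically; in a plain script, add plt.show() at the end.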

Accessing the Bin Edges

Now, how can we find the exact bin edges used by seaborn? They can be recovered from the patches attribute of the plot, that is, of the Axes object that seaborn returns. This attribute holds a list of the drawn elements, one rectangle per bar, from which we can read the left edge and width of each bin as well as the height of each bar.

We can access all the bin edges with a single line of code:
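As a hedged sketch of how this might look (the variable names bars, bin_edges and bar_heights are my own, and histplot is assumed as the plotting function): each patch is a matplotlib Rectangle, so get_x() gives the left edge of a bin, get_width() its width and get_height() the bar height. The right edge of the last bar closes the list of edges.

```python
import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")
fig, ax = plt.subplots()
sns.histplot(data=iris, x="sepal_length", ax=ax)

# Sort the bars left to right so the edges come out in increasing order
bars = sorted(ax.patches, key=lambda p: p.get_x())

# Left edge of every bar, plus the right edge of the last bar
bin_edges = [p.get_x() for p in bars] + [bars[-1].get_x() + bars[-1].get_width()]

# Height of each bar, i.e. the number of observations in each bin
bar_heights = [p.get_height() for p in bars]

print(bin_edges)
print(bar_heights)
```

There is always one more edge than there are bars, since every bin needs both a left and a right boundary.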

With Matplotlib histogram plots, it is much more straightforward to retrieve the bin edges and the count of observations in each bin. The plt.hist() function returns the bin counts, the bin edges and the patches.
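A minimal sketch of the Matplotlib route, using the Axes method ax.hist, which returns the same three values as plt.hist(). The variable names are illustrative, and no bins argument is passed, so Matplotlib falls back to its default of 10 equal-width bins.

```python
import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")  # seaborn is used only to load the data

fig, ax = plt.subplots()
# hist returns the counts per bin, the bin edges and the bar container
counts, bin_edges, patches = ax.hist(iris["sepal_length"])

print(counts)
print(bin_edges)
```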

Note that the bin widths can differ slightly between functions or packages, depending on which binning strategy is used. For a detailed discussion of the various binning approaches, you can refer to the link below:

If you are not interested in the accompanying plots and just want the numbers, well, look no further than numpy.
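One way to do this, sketched under the assumption that np.histogram is the function meant here: it returns the counts and the bin edges without drawing anything. The second call, to np.histogram_bin_edges with bins="auto", is an addition of mine to show that a different binning strategy can produce different edges for the same data.

```python
import numpy as np
import seaborn as sns

sepal_length = sns.load_dataset("iris")["sepal_length"]

# Default: 10 equal-width bins between the minimum and maximum value
counts, bin_edges = np.histogram(sepal_length)
print(counts)
print(bin_edges)

# Edges only, using numpy's "auto" strategy instead of a fixed bin count
auto_edges = np.histogram_bin_edges(sepal_length, bins="auto")
print(auto_edges)
```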

If, for some reason, you are not happy with the default bin edges, setting them yourself is easy. You can either specify the number of bins as an integer in the bins parameter or pass a list of numbers to use as the bin edges.
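Both options can be sketched as follows, again assuming seaborn's histplot. The bin count of 5 and the edges 4 to 8 are arbitrary values chosen for illustration; they cover the full range of sepal_length, so no observation is dropped.

```python
import matplotlib.pyplot as plt
import seaborn as sns

iris = sns.load_dataset("iris")
fig, (ax_count, ax_edges) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: an integer asks for that many equal-width bins
sns.histplot(data=iris, x="sepal_length", bins=5, ax=ax_count)
ax_count.set_title("bins=5")

# Option 2: a list of numbers is used directly as the bin edges
custom_edges = [4, 5, 6, 7, 8]  # illustrative edges, giving 4 bins
sns.histplot(data=iris, x="sepal_length", bins=custom_edges, ax=ax_edges)
ax_edges.set_title("bins=[4, 5, 6, 7, 8]")
```

A list of n edges always produces n - 1 bins, and the bins do not have to be of equal width.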

Hope you liked this article.

Thanks for reading. It would be great to hear your comments and feedback at bala@python4u.in. The code in this article is available at https://github.com/bala-srm/histogram_bins/blob/main/hist_bins.ipynb

Balaji Sundararaman

Passionate about Data Analytics, Visualization and Machine Learning, with extensive experience across functions in India’s emerging Fintech vertical.