What is a Histogram?

Kritika Joshi
Published in WiCDS
2 min read · Jan 24, 2021

A histogram is one of the most frequently used data visualization techniques in machine learning. It represents the distribution of a continuous variable over a given interval or period of time. Histograms plot the data by dividing it into intervals called ‘bins’.

https://www.math-only-math.com/problems-on-histogram.html

It also shows how many data points fall into each bin: the vertical axis on the left-hand side shows the count, and the height of each bar is based on that count.

It shows numerical values grouped by range.

Its function is to summarize the distribution of a univariate dataset.

It provides a visual interpretation of the data.

It summarizes discrete or continuous data.

It is useful for large datasets.
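
As a minimal sketch of the counting described above, the Python snippet below assigns each value to an equal-width bin and prints one text bar per bin. The `ages` list, the bin width of 10 and the helper name `bin_counts` are illustrative assumptions, not part of any real dataset or library.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall into each equal-width bin.

    Each bin is the half-open interval [start, start + width) and is
    identified by its start value. Empty bins are omitted in this sketch.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical sample data, made up for illustration only.
ages = [21, 23, 25, 28, 31, 32, 34, 35, 36, 38, 41, 44, 47, 52, 58]

for start, count in bin_counts(ages, width=10).items():
    # The length of the text bar plays the role of the bar height in a chart.
    print(f"{start}-{start + 9} | {'#' * count} ({count})")
```

In a plotted histogram the same counts would be read off the left-hand axis.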

We can also compute the mean, mode, median and standard deviation, but each of these summary statistics has limitations on its own:

Suppose we have a set of numbers: 1, 23, 24, 25, 25, 25, 26, 27, 30, 32, 999

The mean value (112.45) is very sensitive to outliers. Almost all real-world data has outliers, so the mean value can be very misleading.

The median value (25) on its own tells you very little about the shape of the distribution.

The full range (1–999) just shows the outliers.

The sample standard deviation (294.1436) can be hard to interpret without a statistical background.

The sample variance (86520.47) can also be hard to interpret without a statistical background.

The interquartile range (IQR) (24.5–28.5) covers the central 50% of your values and does not tell you anything about the other 50%.
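
The figures quoted above can be reproduced with Python's standard `statistics` module. This is only a sketch: the choice of the `"inclusive"` quartile method is an assumption made because it matches the 24.5 and 28.5 quoted here, and other quartile conventions give slightly different values.

```python
import statistics

data = [1, 23, 24, 25, 25, 25, 26, 27, 30, 32, 999]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
stdev = statistics.stdev(data)        # sample standard deviation (n - 1)
variance = statistics.variance(data)  # sample variance (n - 1)

# Assumption: the "inclusive" method is used to match the quoted quartiles.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(f"mean     = {mean:.2f}")
print(f"median   = {median}")
print(f"mode     = {mode}")
print(f"stdev    = {stdev:.4f}")
print(f"variance = {variance:.2f}")
print(f"IQR      = {q1} to {q3}")
```

The single outlier 999 pulls the mean far away from the median, which is exactly the kind of effect a histogram makes visible at a glance.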

Histograms are column charts in which each column represents a range of values, and the height of a column corresponds to how many values fall in that range.

The wider the range (bin width) you use, the fewer columns (bins) you will have.
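
The relationship between bin width and number of bins can be checked with a few lines of Python. This is a rough sketch: the helper name `bin_count` is hypothetical, and it assumes bin edges are aligned to multiples of the width, applied to the 1–999 range from the example above.

```python
import math

def bin_count(lowest, highest, width):
    """Number of equal-width bins needed to cover lowest..highest.

    Assumes bin edges sit at multiples of `width`, so a width of 10
    gives the half-open bins [0, 10), [10, 20), and so on.
    """
    first_edge = math.floor(lowest / width) * width
    return math.floor((highest - first_edge) / width) + 1

for width in (10, 100, 500):
    print(f"width {width}: {bin_count(1, 999, width)} bins")
```

Under these assumptions, a width of 10 needs 100 bins to cover the data, a width of 100 needs 10, and a width of 500 needs only 2.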

Bins that are too wide can hide important details about the distribution, while bins that are too narrow can cause a lot of noise and hide important information about the distribution as well. The width of the bins should be equal, and you should prefer round values like 1, 2, 5, 10, 20, 25, 50, 100, and so on to make it easier for the viewer to interpret the data.
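
To see the effect of bin width, the sketch below bins the same values three times with widths of 1, 5 and 20. The values are the example numbers from earlier with the two extremes (1 and 999) left out, an assumption made purely to keep the output short. The helper name `counts_per_bin` is hypothetical.

```python
def counts_per_bin(values, width):
    """Return {bin_start: count} for equal-width bins, including empty ones.

    Each bin is the half-open interval [bin_start, bin_start + width).
    """
    first = (min(values) // width) * width
    last = (max(values) // width) * width
    counts = {start: 0 for start in range(first, last + width, width)}
    for v in values:
        counts[(v // width) * width] += 1
    return counts

# Example values from the article, without the extremes 1 and 999.
values = [23, 24, 25, 25, 25, 26, 27, 30, 32]

for width in (1, 5, 20):
    print(f"bin width {width}:")
    for start, count in counts_per_bin(values, width).items():
        # The upper bound shown is exclusive.
        print(f"  {start}-{start + width} | {'#' * count}")
```

With a width of 1 the bars are scattered and separated by empty bins, with a width of 20 everything collapses into a single bar, and a width of 5 shows the peak around 25 most clearly.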

TIPS

If you have a small amount of data, use wider bins to reduce noise. If you have a lot of data, use narrower bins, because the histogram will be less noisy.
