All You Need to Know About Bar Graphs

Basics, tips, and pitfalls to avoid when it comes to this omnipresent chart type

Darío Weitz
Nightingale
7 min read · Mar 19, 2020

Image from Pixabay

AKA: Bar Charts, Column Charts, Column Graphs, Horizontal Bar Charts, Vertical Bar Charts.

Why: bar graphs (BGs) are used to make comparisons between different items or categories. Standard BGs (unlike stacked, clustered or overlapping bar charts) compare only one numerical variable per item or category. They are also useful for showing trends over time. A further use is to focus on a particular data value and compare it against the other elements of the data set. Standard BGs should not be used for relationship, composition or distribution analysis. Bar graphs are also used for ranking: the following chart shows that, among Data Science methods, logistic regression, neural networks, and random forests were the most used by a group of Data Scientists (Dutta, D., 2020).

Source: Diksha Dutta (#1)

When it comes to categories, remember that they refer to qualitative elements such as methods, surnames, cities, companies, brands, dates, years, etc. A variable is called categorical if its observations can be assigned to categories that do not overlap. Categories can be nominal or ordinal: methods, companies and brands are nominal, whereas ordinal categorical variables have categories that can be ranked, such as [low, medium, medium high, high] or [sparse, sufficient, abundant, excessive]. A trend, on the other hand, is the general direction in which the measured numerical values move over time.

With reference to the nature of the message, a BG is very useful for taking a data set and making comparisons. It tries to answer the question: “How many are in each category?” A bar chart can also show time series when the main objective is to examine and compare individual values. These individual values may be only loosely connected, without constituting a clear trend or pattern.

How: bar charts are two-dimensional, with two axes: one axis shows categories, the other shows numerical values. The category axis does not have a scale, which highlights that it refers to discrete (mutually exclusive) groups. The numerical axis must have a scale with the corresponding measurement units. The quantity of each category is shown by the length of a horizontal bar or the height of a vertical bar, and that length or height is proportional to the numerical value being displayed. Each bar represents a single category, and some space is usually left between bars.
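The proportionality rule can be sketched in a few lines of plain Python. This is only an illustrative text rendering, not tied to any plotting library: the function names `bar_lengths` and `render_bars`, the `max_width` parameter, and the sales figures are all assumptions made up for the example, and the sketch assumes non-negative values with the baseline at 0.

```python
def bar_lengths(values, max_width=40):
    """Scale each value to a bar length proportional to it (baseline at 0)."""
    largest = max(values.values())
    if largest <= 0:
        raise ValueError("need at least one positive value")
    # The largest value gets the full width; every other bar is scaled to it.
    return {cat: round(v / largest * max_width) for cat, v in values.items()}


def render_bars(values, max_width=40):
    """Return a horizontal text bar chart, one line per category."""
    lengths = bar_lengths(values, max_width)
    label_width = max(len(cat) for cat in values)
    lines = []
    for cat, v in values.items():
        # Category label on the left (no scale), bar drawn to the right.
        lines.append(f"{cat.ljust(label_width)} | {'#' * lengths[cat]} {v}")
    return "\n".join(lines)


# Hypothetical data: units sold per product line.
sales = {"Laptops": 120, "Tablets": 60, "Phones": 30}
print(render_bars(sales, max_width=20))
```

With these made-up numbers the bars come out 20, 10 and 5 characters long, so a category with half the value gets a bar half as long.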

Schematic Diagram:

To illustrate the use of bar charts for comparison and ranking, take a look at a recent Kaggle survey of nearly 24,000 data professionals. It clearly revealed that Python, SQL, and R are the most widely used programming languages. The most popular was Python (83% used it). Additionally, 3 out of 4 data professionals recommended that aspiring data scientists learn Python first. The percentages total more than 100% because respondents reported more than one language: data professionals used a median of 3 languages in 2018. The following BG highlights the preeminence of Python amongst other leading programming languages (Hayes, B., 2019).
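Why such percentages can add up to more than 100% is easy to see with a small sketch. The four responses below are invented for illustration and have nothing to do with the actual Kaggle data; `percent_using` is a hypothetical helper name.

```python
from collections import Counter


def percent_using(responses):
    """Percentage of respondents who selected each option (multi-select question)."""
    # dict.fromkeys removes duplicates within one answer while keeping order.
    counts = Counter(
        lang for answer in responses for lang in dict.fromkeys(answer)
    )
    n = len(responses)
    # most_common() orders options from most to least used, ready for ranking.
    return {lang: 100 * c / n for lang, c in counts.most_common()}


# Hypothetical answers from four respondents; each may pick several languages.
responses = [
    ["Python", "SQL", "R"],
    ["Python", "SQL"],
    ["Python"],
    ["R", "SQL"],
]
shares = percent_using(responses)
for lang, pct in shares.items():
    print(f"{lang}: {pct:.0f}%")
print(f"Sum of percentages: {sum(shares.values()):.0f}%")
```

Each percentage is computed over the number of respondents, not over the number of selections, so the bars are compared against each other and are not parts of a whole.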

Source: Hayes, B. (#2)

Storytelling: standard bar charts are used to make numerical comparisons amongst categories. We compare numerical values by comparing lengths or heights. We can see at a glance the highs and lows among many different elements, and we can see them very easily because rectangles are “heavy” visual markers. You can also follow a trend over a period of time through the heights of the rectangles.

On the other hand, BGs are not suitable for relationship or distribution analysis. Composition analysis is only possible with stacked bar graphs. Nor are BGs the best tool for showing the rate of change in a trend.

Bar charts are based on the natural ability of human beings to evaluate distances. We can easily determine which of the elements in front of us is the longest, which is the shortest, and how the other elements are ordered. Strictly speaking, what we do in a BG is replace each numerical value with a vertical or horizontal rectangle. What really matters is where the rectangle ends, since that end point encodes the value to be displayed. A no less important aspect is that audiences are usually very familiar with bar charts, so they can focus on the message without wasting time studying the diagram.

Tips for bar graphs

BGs encode the data as length: short bars mean smaller quantities, long bars mean larger quantities;

Start the numerical axis at 0: if the bars are truncated, the actual values are not properly reflected. Remember that our eyes are very sensitive to differences in length when comparing data, so moving the baseline inevitably distorts the visual;

When it is essential to show details that are difficult to notice in the standard view, two strategies can be used: a) zoom in on the chart; b) “break” the chart. Bear in mind that either strategy may hide the true ratio between larger and smaller values;

Vertical orientation (column charts) is recommended when chronological data (time series) or negative numerical values are present. Horizontal orientation (bar charts) is preferable when graphing numerous categories, particularly those with very long labels;

Use colors judiciously. Hint: there are four ways to give meaning through color:

· by Category: show each category with a different color

· by Intensity: a single color graduated from least to greatest

· Diverging: two colors graduated away from a critical midpoint value

· Change of color: to make values above or below a threshold stand out, or to highlight a particular bar

Source: Yellowfin (#3)

Sort the categories for better storytelling:

· alphabetically to facilitate the search

· ascending to follow the trend

· descending in order to improve readability

Use annotations inside the bars when it is essential to show exact numerical values for each category;

Use grids when it is important to identify threshold values that give sense to the story. But do not overuse them, because bars are already very dominant visual markers;

Even though rectangles are very heavy visual markers, don’t hesitate to include in the graph any additional information that might improve the storytelling.
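The tip about starting the axis at 0 can be checked with simple arithmetic. The sketch below compares the ratio a reader sees (the ratio of drawn bar lengths) with the true ratio of the values; `visual_ratio` is a hypothetical helper and the two values are invented for the example.

```python
def visual_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    # A bar is drawn from the baseline up to its value, so its length is value - baseline.
    return (a - baseline) / (b - baseline)


# Hypothetical values: two categories that differ by 5%.
a, b = 105.0, 100.0
print(f"true ratio:           {a / b:.2f}")
print(f"axis starting at 0:   {visual_ratio(a, b):.2f}")
print(f"axis starting at 95:  {visual_ratio(a, b, baseline=95):.2f}")
```

With the axis starting at 95, the first bar is drawn twice as long as the second even though the underlying values differ by only 5%.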

Warnings

All bars or columns must have the same width. Histograms and Variable Width Bar Graphs (Bar Mekko Charts) may have bars with different widths but they are used to convey a different kind of message. There are no spaces between the bars of a histogram or a Mekko chart;

If one of the variables is time (years, months, days, hours), always set it on the horizontal axis. By convention, time runs from left to right and never from top to bottom. Only use BGs when the data points are at equal intervals, so as not to distort the visualization;

Keep in mind that ordinal categories have an intrinsic order that must not be changed;

Avoid all 3D effects. Although they may be aesthetically pleasing, they go against the rules of good Data Visualization;

Also avoid using rounded edges instead of sharp rectangles; despite being more aesthetically pleasing, they make it harder to compare the lengths of the bars, a key objective of a BG;

When you have to compare categories using more than one BG, use vertical BGs to compare up to two charts and horizontal BGs for more than two charts;

Be very cautious with the use of double vertical axes. Even when they are properly labeled and the whole chart is well designed, the audience might be confused. Audiences usually cannot rapidly process complex visualizations and understand the key trends and relationships that are shown;

Always keep in mind that up to 10% of the male audience might have a color vision deficiency. Try not to use red and green bars in the same chart. Also remember that a red font on a blue background, or vice versa, creates a distracting illusion of depth, since the eye has difficulty focusing on both colors simultaneously;

Horizontal bar charts are preferred when there is a large number of categories or when the categories have very long names. Avoid tilted labels because they are very hard to read.

Source: Marisa Krystian (#4)
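The warning about equal intervals for time data can be verified before plotting. The following is a minimal sketch under the assumption that the observations are plain calendar dates already sorted in time; `has_equal_intervals` and the sample dates are hypothetical.

```python
from datetime import date


def has_equal_intervals(dates):
    """True if consecutive dates (in order) are separated by the same number of days."""
    # Collect the distinct gaps between neighbours; one distinct gap means even spacing.
    gaps = {(later - earlier).days for earlier, later in zip(dates, dates[1:])}
    return len(gaps) <= 1


# Hypothetical observation dates.
weekly = [date(2020, 3, 2), date(2020, 3, 9), date(2020, 3, 16)]
irregular = [date(2020, 3, 2), date(2020, 3, 9), date(2020, 3, 30)]
print(has_equal_intervals(weekly))
print(has_equal_intervals(irregular))
```

Note that this day-based check treats calendar months as unequal (28 to 31 days), so monthly or yearly series would need to be compared by period rather than by day count.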

Column charts combine very well with line charts. When using them together, keep the bars at the back of the chart and the dots and lines at the front of the display.

Stacked Bar Graphs, Clustered Bar Graphs and Overlapping Bar Graphs make it possible to show more complex relationships than standard bar graphs can. They will be described in a following article.

Finally, be aware that Histograms, Population Pyramids, Pareto Charts and Radial Column Charts look like BGs but convey a different message.

If you find this article of interest, please read my previous one: Scatter Plots, Why & How, Storytelling, Tips & Warnings. https://medium.com/analytics-vidhya/search?q=weitz

References:

#1: Dutta, D. (2020), “What’s happening with Data Scientist Jobs and Salaries in Europe?”, Dataconomy, https://dataconomy.com/2020/02/whats-happening-with-data-science-jobs-and-salaries-in-europe/

#2: Hayes, B. (2019), https://www.forbes.com/sites/evamurray/2019/04/04/why-do-bar-charts-work/#283d064e43e8

#3: Yellowfin, “Data Visualization Best Practices Guide”, https://www.yellowfinbi.com/

#4: Krystian, M. (2018), “Do This, Not That: Bar Charts”, https://infogram.com/blog/do-this-not-that-bar-charts/

+3400K Views Engineer, Ms. Sc., Former Associate Professor at Ing. en Sist. de Inf., Fac. Reg. Rosario, Univ. Tecnol. Nacional, Argentina. Data Viz Consultant.