Published in Analytics Vidhya

Photo by Anthony Shkraba from pexels

Statistical Graphs and Where to Make Them

“Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.” - Edward Tufte

Written March 2021

Data presenters and data seekers alike face a challenge now that short-form content is preferred in an age of mindless scrolling. This is where comprehending graphs becomes magical. Used and understood appropriately, graphs can deliver complicated stories with ease because they systematically emphasize the notable parts of the data. Reading them is also a timely skill to learn in a pandemic, as we wrap our heads around the extent of the health crisis through the overwhelming statistics sporadically thrown at our screens.

Furthermore, a study suggests that our brains process images 60,000 times faster than a table or text. The same study found that subjects retained 65% of visual information, compared with only 10 to 20% of written or spoken information, 3 days after exposure to the data. Thus, if you have a message to convey in this century, or even if you are just a regular media consumer, familiarizing yourself with statistical graphs and knowing when each type is appropriate is a must. So prepare your note-taking system: this read will take you through the logic of graphs, the fundamentals of the major graph formats, and suggested data visualization tools for non-programmer students (apart from Excel and SPSS), with their best uses and drawbacks.

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” — H.G. Wells

HOW DO YOU KNOW IF YOU’RE CHOOSING THE RIGHT ONE?

With graphs, luckily, there is such a thing as “the right one.” Before going into the nitty-gritty of graph formats, one must first understand the two kinds of data and their measurement types, because these determine which graph is appropriate for the data being featured. They are the following:

I. QUANTITATIVE DATA

As the root word suggests, these are data that describe results or trends using numeric values. They are broken down into the following:

a. Discrete: data made up of whole numbers (0, 1, 2, 3…), e.g. the number of students in the Communication Research department.

b. Continuous: data that can take any value within an interval, e.g. the time it took to answer an exam (23–51 minutes).

Quantitative measurements are for data relationships such as seeing correlation, deviation, series over time, and distribution.

II. QUALITATIVE DATA

In contrast, qualitative data are based on non-numeric characteristics and can be:

a. Categorical: data that merely classify subjects without following a fixed order, e.g. gender.

b. Ordinal: as the word suggests, data that follow a logical order or sequence, e.g. socioeconomic status (high, middle, low).

Qualitative measurements are used for ranking, nominal comparisons, and part-to-whole relationships (the percentage of a specific category).
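For readers who do dabble in code, the four measurement types above can be sketched as a small Python helper. This is only an illustrative heuristic under stated assumptions: the function name `measurement_type` and the `ordered_levels` parameter are made up for this example, Python integers are treated as discrete, any other numbers as continuous, and non-numeric values count as ordinal only when you supply the order yourself.

```python
from numbers import Real


def measurement_type(values, ordered_levels=None):
    """Guess which of the four measurement types a column of values belongs to."""
    # Quantitative data: every value is a number (booleans are excluded on purpose).
    if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
        # Whole numbers are treated as discrete, anything else as continuous.
        if all(isinstance(v, int) for v in values):
            return "discrete"
        return "continuous"
    # Qualitative data: ordinal only if the caller supplies a logical order.
    if ordered_levels is not None and set(values) <= set(ordered_levels):
        return "ordinal"
    return "categorical"


print(measurement_type([31, 28, 40]))        # head counts
print(measurement_type([23.5, 47.2, 51.0]))  # minutes spent on an exam
print(measurement_type(["high", "low"], ordered_levels=["low", "middle", "high"]))
print(measurement_type(["female", "male"]))
```

The point of the sketch is simply that the type of data comes first, and the graph follows from it.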

It is essential to start there, as different data visualizations serve different purposes. Now, here are 8 common formats that can turn your data into information, along with their kinds, advantages, and applicable data relationships:

STATISTICAL GRAPHS

1. BAR CHART

This graph is best for showing the highs and lows of categorical data, with the height of each bar corresponding to the value it represents on the y-axis. It is often confused with the histogram (more on this type later) because both use rectangular bars. However, besides being designed for categorical and discrete data, bar charts also require a space between categories of around one-fourth of the bars’ width. The bar chart is best for letting viewers see comparisons at a glance and for highlighting the magnitude of data sets (represented by the length of the rectangle). It also comes in numerous types, each with its own strong suits.

TYPES: Simple, grouped, subdivided, 100% subdivided, horizontal, net deviation, connected column chart.

GROUPED BAR CHART — Best if there is a grouping variable beyond the categorical variable.

Source: Centers for Disease Control and Prevention. (2020). Severe Outcomes Among Patients with Coronavirus Disease 2019 (COVID-19) — United States, February 12-March 16, 2020. https://www.cdc.gov/mmwr/volumes/69/wr/mm6912e2.htm

SUBDIVIDED OR COMPONENT BAR CHART — Good for showing the contribution of each kind within the total magnitude of a category

Source: https://www.internetgeography.net/divided-bar-charts-in-geography/

100% STACKED OR SUBDIVIDED BAR CHART: Like the previous subdivided bar chart, but it shows the relative percentage of each category, with every bar adding up to 100%. It resembles a pie chart (more on this later) but can also show how the proportions change over time:

Source: https://exceljet.net/chart/project-goal-attainment
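The step that turns a subdivided bar into a 100% stacked bar is just a conversion from raw counts to percentage shares. Here is a minimal Python sketch of that conversion; the helper name `to_percent_shares` and the quarterly figures are hypothetical and exist only for illustration.

```python
def to_percent_shares(counts):
    """Convert the raw counts inside one bar into shares that add up to 100%."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute shares when the total is zero")
    return {label: 100 * value / total for label, value in counts.items()}


# Hypothetical data: goals met and missed in two quarters.
quarters = {
    "Q1": {"met": 30, "missed": 10},
    "Q2": {"met": 45, "missed": 5},
}
shares = {quarter: to_percent_shares(counts) for quarter, counts in quarters.items()}
print(shares["Q1"])  # {'met': 75.0, 'missed': 25.0}
print(shares["Q2"])  # {'met': 90.0, 'missed': 10.0}
```

Because every bar now has the same height, the eye compares proportions rather than totals.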

HORIZONTAL BAR GRAPH: Similar to a simple bar graph, but for cases where the category labels are too long to fit along the x-axis.

Source: https://www.mekkographics.com/covid-19-cases-by-country/

** PICTOGRAPHS follow the same concept as bar graphs but use pictures, symbols, or icons instead of bars to better represent concepts or ideas:

Source: United Nations. (2020). Nations United: Urgent Solutions for Urgent Times. [Video File]. Access: https://www.youtube.com/watch?v=xVWHuJOmaEk&t=723s

2. PIE CHARTS

As you probably know, a pie chart is a circle split into categories, with the size of each slice representing its share of the total as a percentage. Statisticians and data scientists maintain that a pie chart should NOT be subdivided into more than five or six groups, to avoid crowding and misleading visuals. The slices should be plotted in order of magnitude, with the biggest one beginning at 12 o’clock.
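These pie chart rules can be sketched in a few lines of Python: sort the categories from largest to smallest, fold anything beyond the limit into one extra slice, and start the first slice at 12 o’clock. The function name `pie_slices`, the `max_slices` default of six, and the "Other" label are assumptions made for this illustration, not a standard.

```python
def pie_slices(counts, max_slices=6):
    """Return (label, start_angle, sweep) tuples, in degrees clockwise from 12 o'clock."""
    # Largest category first, as recommended above.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Fold the smallest categories into a single "Other" slice if there are too many.
    if len(ordered) > max_slices:
        head, tail = ordered[:max_slices - 1], ordered[max_slices - 1:]
        ordered = head + [("Other", sum(value for _, value in tail))]
    total = sum(value for _, value in ordered)
    slices = []
    start = 0.0
    for label, value in ordered:
        sweep = 360 * value / total  # the slice's share of the full circle
        slices.append((label, start, sweep))
        start += sweep
    return slices


print(pie_slices({"B": 25, "A": 50, "C": 25}))
# [('A', 0.0, 180.0), ('B', 180.0, 90.0), ('C', 270.0, 90.0)]
```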

EXPLODED PIE — For highlighting a specific sector.

A pie chart of greenhouse gas emissions from the 2007 IPCC report (Climate Change 2007), made using Python. Access here: https://scipython.com/book/chapter-7-matplotlib/examples/a-pie-chart-of-greenhouse-gas-emissions/

DONUT CHART — This variation is used to display additional data in the center.

Source: Doing Data. How to Create a Donut Chart in Tableau. Access: https://www.doingdata.org/blog/how-to-create-a-donut-chart-in-tableau

PIE OF PIE — This variation expands a group of values through another pie attached to it to represent more categories without congesting the graph. This is your best bet if you have more than 6 categories.

Source: Formplus (2020). Pie Charts: Types, Question Examples + [Excel Guide]. Access: https://www.formpl.us/resources/graph-chart/pie/

3. PARETO DIAGRAM

This graph brings out the best of both bar and pie charts. It shows the frequency distribution through bars sorted from highest to lowest, and the cumulative percentage through a line read against the percent scale on the right.

Pareto Diagram made in MS Excel by Lark Gabrielle Rogan
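The numbers behind a Pareto diagram are easy to compute by hand or in code: sort the categories in descending order for the bars, then keep a running total for the line, which is conventionally drawn as a cumulative percentage. A minimal Python sketch, with a hypothetical function name and made-up complaint counts:

```python
def pareto_table(counts):
    """Return (label, count, cumulative_percent) rows, sorted from highest to lowest count."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    total = sum(value for _, value in ordered)
    rows = []
    running = 0
    for label, value in ordered:
        running += value  # running total feeds the cumulative line
        rows.append((label, value, 100 * running / total))
    return rows


# Hypothetical data: customer complaints by cause.
complaints = {"wrong item": 15, "late delivery": 50, "other": 5, "damaged": 30}
for row in pareto_table(complaints):
    print(row)
# ('late delivery', 50, 50.0)
# ('damaged', 30, 80.0)
# ('wrong item', 15, 95.0)
# ('other', 5, 100.0)
```

Read this way, the table says that the top two causes account for 80% of the complaints.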

4. HISTOGRAM

The histogram’s main difference from a bar graph is that it can present continuous variables. The base of each bar is proportional to the width of the interval it represents, because a bar counts not only the values at the labels on the x-axis (e.g. “100,” “95,” “90”…) but every data point that falls within its interval (e.g. 100–104.99, 95–99.99, 90–94.99). Hence, it needs no spaces between the bars and does not apply to categorical variables. The histogram provides an overview of how the sample or population is distributed on the characteristic being studied.

Histogram made in DataWrapper by Marvin Talan.
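The counting that a histogram does for each interval can be sketched in Python as follows. The function name `histogram_counts`, its parameters, and the sample scores are hypothetical; the intervals mirror the 90–94.99, 95–99.99, and 100–104.99 example above, and values outside them are simply ignored.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    counts = [0] * n_bins
    for value in values:
        # Index of the interval [start + i*width, start + (i+1)*width) holding the value.
        index = int((value - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical scores, binned into 90–94.99, 95–99.99, and 100–104.99.
scores = [90, 92.5, 94.99, 95, 99.99, 100, 104.99]
print(histogram_counts(scores, start=90, width=5, n_bins=3))  # [3, 2, 2]
```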

5. LINE CHART

By plotting a series of data points in time order, line charts show how quantities change over time. They are especially useful for identifying relationships, acceleration, and volatility in the data.

Access: https://www.nature.com/articles/d41586-020-01136-8

AREA CHART: An extension of the line chart, the area chart not only represents how a series behaves over time but also emphasizes the data’s volume by filling in the area under the line. Like a bar chart, it can be regular, stacked, or 100% stacked. The difference is that bar charts are for comparing values, while area charts are for showing how values develop over time.

Read more: https://blog.datawrapper.de/area-charts/

6. SCATTER PLOT or X-Y graph

Scatter plots graph pairs of numerical values as dots, with one variable on each of the two axes. They are best for illustrating relationships or the lack thereof. One major advantage of scatter plots is that they can point out outliers, the data points that go against the pattern of the whole data set.

Our World in Data. (2020). The scale of testing compared to the scale of the outbreak. Access: https://ourworldindata.org/coronavirus/country/philippines
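The relationship that a scatter plot shows visually is often summarized with a single number, the Pearson correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The sketch below computes it in plain Python; using Pearson's r here is a choice made for this example, and the function name and data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired numeric values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables vary together, and how much each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_variation / sqrt(variation_x * variation_y)


hours_studied = [1, 2, 3, 4]
exam_scores = [2, 4, 6, 8]
print(pearson_r(hours_studied, exam_scores))  # 1.0
```

A value near zero corresponds to a scatter plot with no visible straight-line pattern.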

7. BUBBLE CHART

Bubble charts extend the abilities of scatter plots by letting the size of each point vary according to the magnitude of a further variable, which accentuates the data dispersed in the diagram. The bubbles can simply be plotted on axes or incorporated into a bubble map if applicable.

The Late Sir Hans Rosling enthusiastically presenting his Global Population Growth Bubble Chart. Source: TED (2010). Hans Rosling: Global population growth, box by box. [Video File]. Access here if you want data as therapy: https://www.youtube.com/watch?v=fTznEIZRkLg
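One practical detail when sizing bubbles: a common design recommendation (not something stated in the sources above) is to make the bubble's area, rather than its radius, proportional to the value, so that a value twice as large does not look four times as big. A minimal Python sketch of that scaling, with a hypothetical function name and an arbitrary `max_radius`:

```python
from math import sqrt


def bubble_radii(values, max_radius=40.0):
    """Scale non-negative values to radii so that bubble AREA is proportional to the value."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("need at least one positive value")
    # Area grows with the square of the radius, so take the square root of the ratio.
    return [max_radius * sqrt(value / largest) for value in values]


print(bubble_radii([100, 25]))  # [40.0, 20.0]
```

Here the second value is a quarter of the first, so its bubble gets half the radius and a quarter of the area.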

8. CHOROPLETH MAP

Choropleths are well-known thematic maps that give a simple visualization of how the variable being studied varies across the chosen geographic area. They require continuous data and use color and shade to indicate magnitude: the darker the shade, the higher the magnitude.

Source: World Bank. (2019). GDP per capita (current US$). Access: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD?view=map
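Behind the shading of a choropleth is a simple classification step: each region's value is assigned to a class, and each class gets a darker shade than the one before. The Python sketch below uses equal-width classes, which is only one of several possible schemes; the function name, the number of classes, and the regional values are all hypothetical.

```python
def shade_classes(values_by_region, n_classes=4):
    """Assign each region a class from 0 (lightest) to n_classes - 1 (darkest)."""
    low = min(values_by_region.values())
    high = max(values_by_region.values())
    width = (high - low) / n_classes  # equal-interval classification
    classes = {}
    for region, value in values_by_region.items():
        if width == 0:
            classes[region] = 0  # every region has the same value
        else:
            # The maximum value would land one class too high, so cap it.
            classes[region] = min(int((value - low) / width), n_classes - 1)
    return classes


# Hypothetical values for four regions.
print(shade_classes({"North": 0, "East": 25, "South": 50, "West": 100}))
# {'North': 0, 'East': 1, 'South': 2, 'West': 3}
```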

Other graphs include box-and-whisker plots (best for showing the dispersion of a data set and detecting skewness), funnel charts (practical for showing progress through each stage of a process), stem-and-leaf plots (somewhat similar to bar graphs and best for organizing discrete and continuous data), treemaps (applicable for grouping data in a hierarchical or tree-based manner), Gantt charts (handy for planning and scheduling projects), waterfall charts (for seeing, in sequence, the accumulated effect of positive and negative values on a variable), heatmaps, time series, radar charts, and the list goes on!

WHERE TO MAKE THEM:

As a communication researcher, I have only used Excel and SPSS for most of my work. Recently, however, I was fascinated to discover that there are hundreds of applications and scripts available that can handle large data sets and create interactive visualizations with them! Beginners in data visualization typically start with STATIC VISUALIZATIONS, which are common in social media infographics or printed handouts, where users cannot go beyond the view presented. INTERACTIVE VISUALIZATIONS, on the other hand, require an application or modern data analysis software that lets users explore specific data points and manipulate the graph according to the visual story they prefer.

INTERACTIVE VISUALIZATION EXAMPLE:

SOURCE: Johns Hopkins Coronavirus Resource Center COVID-19 Global map dashboard. Access here: https://coronavirus.jhu.edu/map.html

Thus, if like me, you are looking for some at-home exploration during this pandemic, here are a few student-friendly (and non-programmer) Data Visualization Tools you can access and explore for FREE:

1. TABLEAU PUBLIC

Tableau Public is the free version of Tableau Desktop and a good option for anyone starting to grow into the fancy, interactive side of data viz. It can connect to Excel, Google Sheets, SAS, SPSS, and R, among others, but cannot connect to major databases like MySQL, Amazon Redshift, Google BigQuery, etc. Its opening interface offers, among other things, text tables, heatmaps, highlight tables, symbol maps, pie charts, treemaps, different kinds of bar charts, circle views, different kinds of line charts, area charts, histograms, box-and-whisker plots, scatter plots, bullet graphs, Gantt charts, and packed bubbles, all of which users can control by color, size, text, and level of detail.

BEST FOR: Interactive data visualization practice

MAJOR CONS: You should NOT use Tableau Public if you are processing sensitive or personal information because, as its name suggests, your work must be shared publicly with the community. You also cannot save your work locally, which would be handy when you want to access it without an internet connection. Plus, your dashboard will be accessible to anyone on the internet.

DOWNLOAD HERE: https://public.tableau.com/en-us/s/

THEIR INTRODUCTORY TUTORIAL: https://www.youtube.com/watch?v=iT1iHLGawIM

PREVIEW OF ITS CLEAN AND EASY TO USE INTERFACE:

SAMPLE DATA VISUALIZATION:

Source: Adams, N. (2020). UK Hospital Youth Admissions for Mental Health Conditions. Tableau Public. Access here: https://public.tableau.com/en-us/gallery/uk-hospital-youth-admissions-mental-health-conditions?tab=viz-of-the-day&type=viz-of-the-day
Interactive visualization of highest-grossing actors of all time. Source: Chapman, C. Toptal. Access: https://www.toptal.com/designers/data-visualization/data-visualization-tools

2. CANVA

This website is literally the artsy college kid’s best friend. Canva is not only for your aesthetic Instagram stories or imaginative presentations but also for your graph-making needs, with premade templates that are simple and need no installation! Canva can even help you choose the graph you need if you are unsure which type best suits the data relationship you are going to present. The site has templates for balanced scorecards, simple bar graphs, bubble maps, comparison charts, concept maps, cycle diagrams, decision trees, donut charts, ecomaps, fishbone diagrams, flowcharts, Gantt charts, line graphs, mind maps, etc.

BEST FOR: Small or simple data, school or work reports, or time-constrained projects that still need artistic appeal.

MAJOR CONS: Limited graph type options and limited control over their elements. Additionally, it is mainly designed for presenting minimal data, as users cannot connect it to databases. Canva also cannot produce interactive visualizations.

ACCESS HERE: https://www.canva.com/graphs/

PREVIEW OF ITS SIMPLE AND HUSTLER-FRIENDLY INTERFACE:

SAMPLE DATA VISUALIZATION:

3. DATAWRAPPER

Ever wonder how The New York Times, The Guardian, Wired, Vox, or Fortune magazine create their professional-looking data visualizations? This free, no-sign-up tool is what augments their compelling stories! Datawrapper was designed for directly embedding interactive charts, maps, and tables on news websites. Its hassle-free interface only requires importing data, and charts are created upon clicking “proceed.” Its viz options include 19 chart types, 3 maps, and data tables (bar charts, line graphs, election donuts, area charts, choropleth maps, scatter plots, locator maps, and suchlike), all interactive and responsive!

BEST FOR: Small news websites such as school publications, which can use the free plan with no code or design skills needed!

MAJOR CONS: Limited data sources (the main method is copying and pasting data on the site).

ACCESS HERE: https://www.datawrapper.de/

WORKFLOW PREVIEW:

SAMPLE DATA VISUALIZATIONS:

There are other data visualization tools, including Google Charts, Polymaps, D3.js, Chart.js, Sigma.js, and Chartist.js, but they require some knowledge of coding and programming (some only very basic). Nevertheless, it is not necessary to know every data visualization tool; consider instead which one works best, is most convenient, and offers the properties needed for your specific type of work.

I hope you now have a better appreciation and understanding of data visualization!

REFERENCES:

[1] Chapman, C. (2018). A Complete Overview of the Best Data Visualization Tools. Retrieved from: https://www.toptal.com/designers/data-visualization/data-visualization-tools

[2] Matias, M. (2020). Visualize It! A Comprehensive Guide to Data Visualization. Netquest.

[3] Elizon, K. (2019). Type of statistical charts [Lecture]. Retrieved from Polytechnic University of the Philippines Statistics Applied in Communication Research.
