Crash Course in Data Quiz — Journey Through the Data Visualization Zoo: Test Your Knowledge of Statistical and Hierarchical Data Visualization

Cibaca Khandelwal
Published in AI Skunks
6 min read · Mar 27, 2023

This section contains 10 quiz questions on “A Tour through the Visualization Zoo”, covering visualizations for statistical and hierarchical data.

Q.1. In which year was the term “Visualization Zoo” introduced in the published paper by Prof. Jeffrey Heer?

  • A. 2000
  • B. 2005
  • C. 2010
  • D. 2015

Ans — C. 2010

Q.2. Can we use a Q-Q plot for distributions other than the normal distribution?

Ans.

Yes. A Q-Q plot can compare a sample against any reference distribution, not just the normal distribution; a beta distribution is one example.

To create a Q-Q plot against a beta distribution, sort the sample, compute the theoretical quantiles of the beta distribution at the matching probabilities, and plot the sample quantiles against the theoretical quantiles on a scatter plot.

If the points fall close to a straight line, the beta distribution is a plausible fit for the sample. Systematic deviation from the line, such as curvature in the tails, indicates a poor fit.

Beta distributions take very different shapes depending on the parameters alpha and beta, so it can be useful to draw several Q-Q plots with different parameter values and see which reference fits the sample best.
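As a minimal sketch of the procedure described above, the following draws a sample from a beta distribution and compares it with two references using `scipy.stats.probplot`. The shape parameters (2, 8), the sample size, and the seed are illustrative assumptions, not values from the quiz.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)        # fixed seed so the sketch is repeatable
a, b = 2.0, 8.0                       # illustrative shape parameters (assumed)
sample = rng.beta(a, b, size=500)

# probplot pairs the sorted sample with the theoretical quantiles of `dist`
# and fits a least-squares line through the points; r measures straightness.
(theoretical_q, ordered_sample), (slope, intercept, r) = stats.probplot(
    sample, dist=stats.beta, sparams=(a, b)
)
print(f"beta reference:   r = {r:.4f}")

# The same sample against a normal reference, for comparison.
_, (_, _, r_normal) = stats.probplot(sample, dist="norm")
print(f"normal reference: r = {r_normal:.4f}")
```

An r value close to 1 means the points lie near a straight line. To draw the plot itself, `probplot` also accepts a `plot` argument, for example `plot=plt` after `import matplotlib.pyplot as plt`.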

Q.3. List the use cases of a Q-Q plot.

Ans.

  1. Testing for normality
  2. Comparing distributions
  3. Identifying outliers
  4. Checking for linearity
  5. Assessing model assumptions

Q.4. Name the parameter that selects which columns are drawn as axes in a Parallel Coordinates plot.

Ans. — dimensions

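import plotly.express as px

df_iris = px.data.iris()  # sample data bundled with Plotly; includes a numeric species_id column
fig = px.parallel_coordinates(
    df_iris,
    color="species_id",
    dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],  # one axis per column
    color_continuous_scale=px.colors.diverging.Tealrose,
    color_continuous_midpoint=2,
)
fig.show()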

Q.5. Explain visualizations for hierarchical datasets.

While some data is just a flat collection of numbers, most can be organized into natural hierarchies.

Consider spatial units such as counties, states and countries; corporate and government command structures; software packages and phylogenetic trees. Even for data without an obvious hierarchy, statistical methods (e.g., k-means clustering) can be used to organize the data empirically.

Special visualization techniques take advantage of this hierarchical structure and allow rapid multiscale inferences: micro-observations of individual elements and macro-observations of large groups.
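To make the micro and macro readings concrete, here is a minimal sketch that groups flat records into a country, state, county hierarchy and then reads it at two scales. The place names and population figures are made up for illustration and do not come from the source.

```python
from collections import defaultdict

# Hypothetical flat records: (country, state, county, population)
records = [
    ("Freedonia", "North", "Alder", 120),
    ("Freedonia", "North", "Birch", 80),
    ("Freedonia", "South", "Cedar", 200),
    ("Sylvania", "East", "Dogwood", 50),
    ("Sylvania", "East", "Elm", 150),
]

# Build a nested mapping: country -> state -> county -> population
tree = defaultdict(lambda: defaultdict(dict))
for country, state, county, population in records:
    tree[country][state][county] = population

def total(node):
    """Sum every leaf value underneath a node of the hierarchy."""
    if isinstance(node, dict):
        return sum(total(child) for child in node.values())
    return node

# Micro-observation: a single leaf element
micro = tree["Freedonia"]["North"]["Alder"]

# Macro-observation: one aggregate per top-level group
macro = {country: total(states) for country, states in tree.items()}

print("Alder county:", micro)
print("Totals by country:", macro)
```

Hierarchical visualizations such as node-link, adjacency, and enclosure diagrams draw this kind of nested structure so that both scales are visible at once.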

Q.5. Prepare an ordered stem and leaf plot for the data. Briefly comment on what the analysis shows.

The weights (to the nearest tenth of a kilogram) of 30 students were measured and recorded as follows:

59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7, 61.6, 56.3, 61.9, 65.7, 60.4, 58.9, 59.0, 61.2, 62.1, 61.4, 58.4, 60.8, 60.2, 62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2

! pip install stemgraphic
# importing the module
import stemgraphic

data = [59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7, 61.6, 56.3, 61.9, 65.7, 60.4, 58.9, 59.0, 61.2, 62.1, 61.4, 58.4, 60.8, 60.2, 62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2]

# calling stem_graphic with required parameters,
# data and scale
stemgraphic.stem_graphic(data, scale = 1)
[Figure: ordered stem-and-leaf plot of the 30 student weights]

We use a scale of 1 so that the decimal digits are plotted as the leaves.

In this case, the stems will be the whole number values and the leaves will be the decimal values. The data range from 56.3 to 65.7, so the stems should start at 56 and finish at 65.

In this example, it was not necessary to split stems because the leaves are not crowded on too few stems; nor was it necessary to round the values, since the range of values is not large. This stem and leaf plot reveals that the group with the highest number of observations recorded is the 61.0 to 61.9 group.
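The same ordered plot can also be built by hand, which makes the split into stems and leaves explicit. This is a minimal sketch in plain Python, independent of stemgraphic; the helper name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical.

```python
from collections import defaultdict

weights = [59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7,
           61.6, 56.3, 61.9, 65.7, 60.4, 58.9, 59.0, 61.2, 62.1, 61.4,
           58.4, 60.8, 60.2, 62.7, 60.0, 59.3, 61.9, 61.7, 58.4, 62.2]

def stem_and_leaf(values, leaf_unit):
    """Map each stem to its sorted leaves, where value = (10 * stem + leaf) * leaf_unit."""
    rows = defaultdict(list)
    for value in values:
        # Work in whole leaf units to avoid floating-point surprises.
        stem, leaf = divmod(round(value / leaf_unit), 10)
        rows[stem].append(leaf)
    # Include empty stems so gaps in the data stay visible.
    return {s: sorted(rows.get(s, [])) for s in range(min(rows), max(rows) + 1)}

plot = stem_and_leaf(weights, leaf_unit=0.1)
for stem, leaves in plot.items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")

# The stem holding the most observations
busiest = max(plot, key=lambda s: len(plot[s]))
print("Stem with the most leaves:", busiest)
```

The stems run from 56 to 65, and stem 61 holds the most leaves (nine), which matches the comment above about the 61.0 to 61.9 group.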

Q.6.

Britney is a swimmer training for a competition. The number of 50-metre laps she swam each day for 30 days is as follows:

22, 21, 24, 19, 27, 28, 24, 25, 29, 28, 26, 31, 28, 27, 22, 39, 20, 10, 26, 24, 27, 28, 26, 28, 18, 32, 29, 25, 31, 27

Prepare an ordered stem and leaf plot.

Ans. Laps swum by Britney in 30 days:

# importing the module
import stemgraphic

data = [22, 21, 24, 19, 27, 28, 24, 25, 29, 28, 26, 31, 28, 27, 22, 39, 20, 10, 26, 24, 27, 28, 26, 28, 18, 32, 29, 25, 31, 27]
# calling stem_graphic with required parameters,
# data and scale
stemgraphic.stem_graphic(data, scale = 10)
[Figure: ordered stem-and-leaf plot of the laps swum by Britney over 30 days]
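In the laps data, 23 of the 30 values share the stem 2, so the leaves are crowded on a single stem. One common remedy is to split each stem into a lower half (leaves 0 to 4) and an upper half (leaves 5 to 9). The following is a minimal sketch in plain Python, independent of stemgraphic, and the split shown is an illustration rather than part of the original answer.

```python
from collections import defaultdict

laps = [22, 21, 24, 19, 27, 28, 24, 25, 29, 28, 26, 31, 28, 27, 22,
        39, 20, 10, 26, 24, 27, 28, 26, 28, 18, 32, 29, 25, 31, 27]

# Key each row by (stem, half): half 0 holds leaves 0-4, half 1 holds leaves 5-9.
rows = defaultdict(list)
for value in laps:
    stem, leaf = divmod(value, 10)
    half = 0 if leaf < 5 else 1
    rows[(stem, half)].append(leaf)

# Print the rows in order, with the leaves of each row sorted.
for stem, half in sorted(rows):
    leaves = sorted(rows[(stem, half)])
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each stem now appears on up to two rows, which spreads the 20 to 29 values across two lines and makes it easier to see that most days fall between 25 and 29 laps.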

Q.7. Which method shows hierarchical data in a nested format?

  • A. Adjacency Diagrams
  • B. Scatter Plot
  • C. Q-Q Plot
  • D. Parallel Coordinates

Ans — A. Adjacency Diagrams

Q.8. Which chart is used for displaying the pairwise relationships among multiple variables?

Ans. — Scatter plot Matrix

Q.9. Which chart displays numerical data while keeping each individual value visible?

Ans. — Stem and Leaf Plot

Q.10. Explain enclosure diagrams and their use in machine learning, with an example.

Ans.

An enclosure diagram is a space-filling diagram that uses containment to represent groups or classes within a dataset: each group is drawn as a shape, and its members are drawn inside it. The shapes are often color-coded or labeled to indicate the class they represent, which helps reveal patterns or clusters of data points that belong to the same class.

In machine learning, enclosure diagrams can be useful for visualizing the output of clustering algorithms such as k-means or hierarchical clustering. These algorithms group similar data points into clusters, exposing structure in large datasets. An enclosure diagram then shows which points fall inside which cluster, making the result easier to interpret and outliers or anomalies easier to spot.

Overall, enclosure diagrams are a valuable tool for visualizing complex datasets, especially in unsupervised learning tasks such as clustering.

! pip install gapminder
# libraries
import matplotlib.pyplot as plt
import seaborn as sns
from gapminder import gapminder # data set

# data
data = gapminder.loc[gapminder.year == 2007]

# use the scatterplot function to build the bubble map
sns.scatterplot(data=data, x="gdpPercap", y="lifeExp", size="pop", legend=False, sizes=(20, 2000))

# show the graph
plt.show()
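[Figure: bubble chart of life expectancy against GDP per capita for 2007, with bubble size showing population]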
