DATA STORIES | DATA VISUALIZATION | KNIME ANALYTICS PLATFORM

Recreating Minard’s greatest ever Chart with KNIME’s K-AI and Python

Reconstruct Minard’s iconic data visualization of Napoleon’s Russian campaign of 1812 with the help of K-AI, KNIME’s Python code generation engine

Dennis Ganzaroli
Low Code for Data Science
9 min read · Jan 22, 2024


Fig. 1: K-AI is creating Minard’s Chart (image by author)

Introduction

Charles Joseph Minard’s map of Napoleon’s lost campaign in Russia in 1812 is probably one of the best charts ever created and it is considered one of the most influential examples of data visualization in history.

Minard created this diagram of Napoleon’s Russian campaign in 1869, revolutionizing data visualization over 150 years ago. Unlike traditional charts that focus solely on one data dimension, Minard’s diagram masterfully combines three elements — geography, time and statistics — into a coherent and fascinating narrative.

His map is easy to read and interpret, even for people who are not familiar with data visualization.

Fig. 2: Minard’s original chart (image from Wikipedia).

The thick band shows the size of the army at each position. The advance starts at Kovno in Lithuania, on the left side of the image, where the army still numbered 422,000 soldiers. The dark lower band traces Napoleon’s retreat from Moscow (on the right side of the image) through the bitterly cold winter; at its end it represents the 10,000 men who returned. The retreat band is also linked to the temperature and time scales in the lower part of the picture.

Even though Minard’s chart is over 150 years old, creating it with common Viz tools such as Tableau or Power BI is not easy. And to create the chart with Python, you need to be very familiar with the Matplotlib library, among other things. But with the help of KNIME’s K-AI, we should still be successful. So let’s get started.

If you are new to KNIME, the “Getting Started Guide” is a good place to begin. There you can also download the open-source software for free.

The visual programming language of KNIME is self-explanatory and therefore easy to learn.

Fig. 3: Getting Set Up with KNIME Analytics Platform (image from KNIME).

The KNIME workflows with all the examples can be found on my KNIME Community Hub space.

We have already seen in my last article how powerful KNIME’s new K-AI is at generating Python code from prompts:

We are therefore confident this time too, although the following challenge is not as easy as it may seem at first sight.

The dataset

Before we can start creating the graph, we first need to understand the dataset. It is available for various platforms. A list of the different formats can be found here.

Fig. 4: Minard’s dataset (image by author).

The data set is clearly not a simple standard CSV file. Rather, it has a special structure. As we can see, it’s in an uneven form. It essentially consists of 3 separate tables:

  • Column 1–3: longitude, latitude and names of cities
  • Column 4–8: longitude, temperature, number of days on the retreat march and dates (during the march home only)
  • Column 9–13: longitude, latitude, number of surviving troops, direction of travel (A=towards the attack/R=return journey) and division of army

The first thing to notice about Minard’s map is that it is a combination of three layers — the top part with the name of the cities, the middle part with the march details as lines, and the bottom half with the temperature information.

Therefore, we will first divide the data set into three parts, create the graphics and finally combine everything into one view.

Please note that you need to have a free account on the KNIME Hub for K-AI to work.

K-AI will create this for us with the following prompt:

“The data is from Minard’s chart. Divide the data in three parts: in a “Cities data” part, “Temperature” part and “Troops” part and drop the na’s.”

Fig. 5: Dividing the dataset in three parts with the help of K-AI in the “Python Script” node (image by author).

After a while we get the following code:

import knime.scripting.io as knio
import pandas as pd

# Load the data
df = knio.input_tables[0].to_pandas()

# Divide the data into three parts: "Cities data", "Temperature" and "Troops"
cities_data = df[["LONC", "LATC", "CITY"]].dropna()
temperature_data = df[["LONT", "TEMP", "DAYS", "MON", "DAY"]].dropna()
troops_data = df[["LONP", "LATP", "SURV", "DIR", "DIV"]].dropna()

# Output the tables
knio.output_tables[0] = knio.Table.from_pandas(cities_data)
knio.output_tables[1] = knio.Table.from_pandas(temperature_data)
knio.output_tables[2] = knio.Table.from_pandas(troops_data)

When we execute this node in KNIME, we get an error. We must first add three output ports to the “Python Script” node so that the three tables created in the script can be output. After executing the node again, we get three output tables in the lower part.

Fig. 5: Adding three output ports (image by author).
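To see why the column-wise dropna is enough to separate the three blocks, the same split can be tried outside KNIME. The following is only a minimal sketch on a toy table: the values are illustrative and not the real Minard data, and only a subset of the columns is used.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the ragged Minard table (illustrative values, NOT the real data).
# Shorter blocks are padded with missing values, as happens when three
# tables of different lengths sit side by side in one sheet.
df = pd.DataFrame({
    "LONC": [24.0, 25.3, np.nan],
    "LATC": [55.0, 54.7, np.nan],
    "CITY": ["Kowno", "Wilna", None],
    "LONT": [37.6, np.nan, np.nan],
    "TEMP": [0.0, np.nan, np.nan],
    "LONP": [24.0, 24.5, 25.5],
    "SURV": [340000, 340000, 320000],
})

# Select each block of columns and drop the padding rows of that block only
cities = df[["LONC", "LATC", "CITY"]].dropna()
temperature = df[["LONT", "TEMP"]].dropna()
troops = df[["LONP", "SURV"]].dropna()

print(len(cities), len(temperature), len(troops))
```

Each block keeps only its own filled rows, so the three resulting tables have different lengths.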

Creating the Subplots

City Map

So let’s create the first plot of the cities using their latitude and longitude. This time we use the Python View node, as we want a view instead of a table output. We prompt K-AI with the following instruction:

“LONC is the x-axis value. LATC is the y-axis value. Draw me plot with the names of the cities in the column CITY.”

We get the following code:

import knime.scripting.io as knio
import matplotlib.pyplot as plt

# Load the data
df = knio.input_tables[0].to_pandas()

# Extract the required columns
lonc = df["LONC"]
latc = df["LATC"]
city = df["CITY"]

# Create the plot
fig, ax = plt.subplots()

# Scatter plot
ax.scatter(lonc, latc)

# Add city names as labels (positional access, independent of the row index)
for i, txt in enumerate(city):
    ax.annotate(txt, (lonc.iat[i], latc.iat[i]))

# Assign the figure to the output_view variable
knio.output_view = knio.view(fig)

The Python View plots the cities according to their longitude and latitude values. The generated Python code creates a scatterplot with these coordinates.

Fig. 6: Plot of the cities (image by author).

However, longitude and latitude are not the same as plain x and y values: at these latitudes, one degree of longitude covers a shorter distance than one degree of latitude. For this example, we will be satisfied with an approximation.
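If a closer approximation is wanted, a common trick is to stretch the y-axis by 1/cos(latitude). The sketch below is not part of the K-AI output; the function name lonlat_aspect and the mid-latitude of 55°N are assumptions chosen for illustration.

```python
import math

def lonlat_aspect(mean_lat_deg: float) -> float:
    """Factor by which one degree of latitude should be drawn longer
    than one degree of longitude at the given latitude."""
    return 1.0 / math.cos(math.radians(mean_lat_deg))

# Assumption: the campaign area lies roughly around 55 degrees north
aspect = lonlat_aspect(55.0)
print(round(aspect, 2))
```

In the Python View node, the value could then be applied with ax.set_aspect(aspect) before the figure is assigned to the output view.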

Troops’ Route of March

Now we draw the troops’ route of march. The width of the line should describe the number of surviving soldiers.

Let’s prompt:

“LONP is the x-axis value, LATP is the y-axis value and SURV is the number of survived. Draw me a chart that connects the point with a line. The width of the line is SURV.”

This is the code that follows:

import knime.scripting.io as knio
import matplotlib.pyplot as plt

# Load the data
df = knio.input_tables[0].to_pandas()

# Extract the required columns
lonp = df["LONP"]
latp = df["LATP"]
surv = df["SURV"]

# Create a new figure
fig, ax = plt.subplots()

# Plot the data: one line segment per pair of consecutive points
for i in range(len(lonp) - 1):
    ax.plot([lonp.iat[i], lonp.iat[i+1]], [latp.iat[i], latp.iat[i+1]], linewidth=surv.iat[i])

# Assign the figure to the output_view variable
knio.output_view = knio.view(fig)

and this is the plot that is generated. No line is visible. Only a purple area is displayed.

Fig. 7: Plot of the march of the troops? (image by author).

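The reason is that Matplotlib measures linewidth in points, so passing the raw troop counts produces lines far wider than the figure itself. The width must therefore be rescaled. We divide it by 10,000 and obtain the following chart with this adjustment of the Python code.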

    ax.plot([lonp.iat[i], lonp.iat[i+1]], [latp.iat[i], latp.iat[i+1]], linewidth=surv.iat[i]/10000)

Fig. 8: Plot of the march of the troops (image by author).
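Instead of a fixed divisor, the width can also be scaled relative to the largest troop count. This is only a sketch of that idea: the helper scale_linewidth and its default widths are hypothetical and not something K-AI produced.

```python
def scale_linewidth(survivors, max_survivors, max_width=30.0, min_width=0.5):
    """Map a troop count to a line width in points, proportional to the count.

    max_width is the width used for the largest count; min_width keeps
    very small detachments visible.
    """
    width = max_width * survivors / max_survivors
    return max(width, min_width)

# Example with an assumed maximum of 340,000 men
for count in (340_000, 100_000, 4_000):
    print(count, scale_linewidth(count, 340_000))
```

The largest band then always gets max_width points, regardless of the absolute numbers in the SURV column.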

Now the chart looks better, but we would like to visualize the direction of the troops (advance and retreat) with different colors.

We ask K-AI to extend the current chart with the following prompt. (We use the color “tan”, as this is the original color of the advancing troops in Minard’s map.)

“Color the line with the column DIR. If DIR is equal to „A“ tan else black.”

import knime.scripting.io as knio
import matplotlib.pyplot as plt

# Load the data
df = knio.input_tables[0].to_pandas()

# Extract the required columns
lonp = df["LONP"]
latp = df["LATP"]
surv = df["SURV"]
direction = df["DIR"]

# Create a new figure
fig, ax = plt.subplots()

# Plot the data
for i in range(len(lonp) - 1):
    # Scale the line width down by 10,000 and color the segment by direction
    color = 'tan' if direction.iat[i] == 'A' else 'black'
    ax.plot([lonp.iat[i], lonp.iat[i+1]], [latp.iat[i], latp.iat[i+1]], linewidth=surv.iat[i]/10000, color=color)

# Assign the figure to the output_view variable
knio.output_view = knio.view(fig)

Now it looks nearly like the original.

Fig. 9: March of the troops (tan=advance, black=retreat) (image by author).
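One detail the generated loop ignores is the DIV column: it connects every row to the next one, even where one division ends and the next begins. A possible refinement, sketched here under the assumption that the rows are ordered along the route within each division, is to build the segments per division. The function march_segments and the toy values are hypothetical.

```python
import pandas as pd

def march_segments(troops: pd.DataFrame):
    """Yield (x_pair, y_pair, survivors, direction) for consecutive
    points that belong to the same division."""
    for _, part in troops.groupby("DIV", sort=False):
        part = part.reset_index(drop=True)
        for i in range(len(part) - 1):
            a, b = part.iloc[i], part.iloc[i + 1]
            yield (a["LONP"], b["LONP"]), (a["LATP"], b["LATP"]), a["SURV"], a["DIR"]

# Toy data with two divisions of two points each (illustrative values only)
toy = pd.DataFrame({
    "LONP": [24.0, 25.0, 24.0, 24.5],
    "LATP": [54.9, 55.0, 55.1, 55.7],
    "SURV": [340000, 320000, 60000, 40000],
    "DIR": ["A", "A", "A", "A"],
    "DIV": [1, 1, 2, 2],
})

segments = list(march_segments(toy))
print(len(segments))  # one segment per division, none across divisions
```

Each yielded tuple can be passed to ax.plot in the same way as in the loop above, with the width and color derived from the survivors and direction values.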

Temperature Chart

It’s time to create the subplot of the temperature.

“Draw me a dashed black line chart. LONT is on the x-axis, TEMP is on the y-axis. TEMP and DAY are integer values.
Label the points with TEMP + MON + DAY. Add the degree symbol behind TEMP in the label.”

We get the following code…

import knime.scripting.io as knio
import numpy as np
import matplotlib.pyplot as plt

# Load the data
df = knio.input_tables[0].to_pandas()

# Convert TEMP and DAY to integer
df['TEMP'] = df['TEMP'].astype(int)
df['DAY'] = df['DAY'].astype(int)

# Create a new column for the labels
df['label'] = df['TEMP'].astype(str) + '° ' + df['MON'] + ' ' + df['DAY'].astype(str) # Added degree symbol

# Plot the dashed black line chart
fig, ax = plt.subplots()
ax.plot(df['LONT'], df['TEMP'], linestyle='dashed', color='black') # Added color='black'

# Label the points
for i, txt in enumerate(df['label']):
    ax.annotate(txt, (df['LONT'].iat[i], df['TEMP'].iat[i]))

# Assign the figure to the output_view variable
knio.output_view = knio.view(fig)

and the following chart:

Fig. 10: Temperature in degrees of Réaumur (image by author).

The original Minard chart has three different annotations for the temperature: degrees Réaumur, Celsius and Fahrenheit.

Our chart shows the temperature in degrees Réaumur. So -30 degrees Réaumur corresponds to about -37.5 degrees Celsius or -35.5 degrees Fahrenheit.
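The conversion follows from the definition of the Réaumur scale, on which water freezes at 0 and boils at 80 degrees. A small sketch (the function names are my own):

```python
def reaumur_to_celsius(t_re: float) -> float:
    """80 Réaumur degrees span the same range as 100 Celsius degrees."""
    return t_re * 5 / 4

def reaumur_to_fahrenheit(t_re: float) -> float:
    """80 Réaumur degrees span 180 Fahrenheit degrees, offset by 32."""
    return t_re * 9 / 4 + 32

print(reaumur_to_celsius(-30))     # -37.5
print(reaumur_to_fahrenheit(-30))  # -35.5
```

These helpers could be used to add a second axis or extra labels to the temperature subplot.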

This is what our KNIME workflow currently looks like. We have loaded the Minard datafile with an Excel Reader node, split the file into three tables with a Python Script node and created the three different layers of the diagram.

Fig. 11: KNIME Workflow with Python nodes (image by author)

Now let’s put it all together.

Putting it all together

We now combine all prompts in a single Python View node and proceed step by step, entering the following prompts into the K-AI prompt box one by one:

- “The data is from Minard’s chart. Divide the data in three parts: in a “Cities data” part, “Temperature” part and “Troops” part and drop the na’s.”

- “Add a Chart:
LONC is the x-axis value. LATC is the y-axis value. Draw me plot with the names of the cities in the column CITY.”

- “Add another Layer to the chart:
LONP is the x-axis value, LATP is the y-axis value and SURV is the number of survived. Draw me a chart that connects the point with a line. The width of the line is SURV. Color the line with the column DIR. If DIR is equal „A“ tan else black.”

- “Add a Subplot half the height of the chart above:
Draw me a dashed black line chart. LONT is on the same x-axis like LONT, TEMP is on the y-axis. TEMP and DAY are integer values. Label the points with TEMP + MON + DAY. Add the degree symbol behind TEMP in the label.”

- “Adjust the x-axis so that LONC, LONP and LONT are on the same scale.”

- “Add in the temperature plot a horizontal grid”

- “Add the chart Title: “Napolean’s Russian Campaign” with fontsize 30 and the subtitle “Figurative map of successive losses in men of the French army in the Russian campaign 1812” with fontsize 10"

After a few manual adjustments to the code, I ended up with the following chart:

Fig. 12: The final chart done with K-AI and Python (image by author).
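For orientation, the layout that the prompts aim for can be sketched with plain Matplotlib. This is not the code K-AI generated and not my final script; it is a minimal sketch with toy values that only shows the two stacked axes, the shared x-axis and the 2:1 height ratio.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Toy values standing in for the three tables (illustrative only)
city_lon, city_lat, city_name = [24.0, 37.6], [55.0, 55.8], ["Kowno", "Moscou"]
troop_lon, troop_lat = [24.0, 30.0, 37.6], [54.9, 54.5, 55.8]
troop_surv = [340000, 175000, 100000]
temp_lon, temp_val = [37.6, 30.0, 26.0], [0, -20, -30]

# Map on top, temperature below at half the height, both on the same x-axis
fig, (ax_map, ax_temp) = plt.subplots(
    2, 1, sharex=True, figsize=(12, 6),
    gridspec_kw={"height_ratios": [2, 1]},
)

# Troop band: one segment per pair of consecutive points
for i in range(len(troop_lon) - 1):
    ax_map.plot(troop_lon[i:i + 2], troop_lat[i:i + 2],
                linewidth=troop_surv[i] / 10000, color="tan")

# City labels on the same axes as the troop band
for x, y, name in zip(city_lon, city_lat, city_name):
    ax_map.annotate(name, (x, y))

# Temperature as a dashed black line with a horizontal grid
ax_temp.plot(temp_lon, temp_val, linestyle="dashed", color="black")
ax_temp.grid(axis="y")

fig.suptitle("Napoleon's Russian Campaign", fontsize=30)
ax_map.set_title("Figurative map of successive losses in men of the French "
                 "army in the Russian campaign 1812", fontsize=10)
```

Inside a Python View node, the figure would be handed over with knio.output_view = knio.view(fig), as in the earlier listings.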

Conclusion

Not bad! I was able to create the chart mostly by prompting my requests.
Prompting is not rocket science, but it is not entirely effortless either. You have to interact with K-AI and experiment a bit to achieve the desired result. It’s definitely a fun way to get your work done, though. Try it out!

Material for this project:

Thanks for reading and may the Data Force be with you!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn or Twitter and follow my Facebook Group “Data Science with Yodime”.


Dennis Ganzaroli
Low Code for Data Science

Data Scientist with over 20 years of experience. Degree in Psychology and Computer Science. KNIME COTM 2021 and Winner of KNIME Best blog post 2020.