Live Blog: 18A — Visualisation: Innovation and practices in the visualisation of official statistics
This is a quick live blog from the session at the NTTS conference at Eurostat. I have tried to quote the presenters as much as possible, but due to the speed, I will probably have failed in some parts. Errors are mine, not the speakers’! I’ll be refining this article during the conference; if you spot errors and typos, please let me know.
Four interesting sessions: here we go!
The Compositional dot map: a visualisation of spatial data
presented by Martijn Tennekes
Below is a well-known dot map.
Below is a compositional dot map. This was the inspiration for the work of Tennekes and his colleagues.
Now Tennekes’ work: Here you see density and where people come from. Every person is displayed by a dot. The dots are randomly distributed per neighbourhood to protect privacy.
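The random placement of dots within a neighbourhood could look roughly like the sketch below. It is not Tennekes’ implementation (that was done in R); the function names, the polygon representation and the rejection-sampling approach are assumptions made for illustration.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; the last vertex connects to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_dots(polygon, n_dots, rng):
    """Place n_dots uniformly at random inside the polygon (rejection sampling).

    Each dot stands for one person; because the position is random within the
    neighbourhood, a dot does not reveal where that person actually lives.
    """
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    dots = []
    while len(dots) < n_dots:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots

# Hypothetical L-shaped neighbourhood with 50 inhabitants.
neighbourhood = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
dots = random_dots(neighbourhood, 50, random.Random(1))
```

Rejection sampling is simple but gets slow for thin or very irregular shapes, where most candidate points fall outside the polygon.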
What I find good is that it is also mobile friendly.
The hard thing is when you zoom out, as there are not enough pixels. So Tennekes falls back to areas with different density.
For coloring, Tennekes used the HCL color space model, known for its perceptual qualities:
- they used hue for composition
- luminance and chroma for density
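As a rough sketch of this mapping: hue can be taken as a weighted circular mean of one fixed hue per category, while luminance and chroma are interpolated from the density. The category hues, the numeric ranges and the choice that denser areas are darker and more saturated are assumptions, not values from the talk. The function only returns HCL coordinates; converting them to screen colors is left to a color library.

```python
import math

# Hypothetical hues (degrees) for three categories.
CATEGORY_HUES = (0.0, 120.0, 240.0)

def hcl_for_cell(shares, density, max_density,
                 lum_range=(90.0, 40.0), chroma_range=(20.0, 80.0)):
    """Return (hue, chroma, luminance) for one area.

    shares      -- counts or proportions per category (composition)
    density     -- population density of the area
    max_density -- density that maps to the darkest, most saturated color
    """
    if sum(shares) <= 0 or max_density <= 0:
        raise ValueError("shares and max_density must be positive")
    # Composition -> hue: weighted mean of angles on the color wheel.
    # Note: a perfectly even mix of all hues has no well-defined mean angle.
    sx = sum(s * math.cos(math.radians(h)) for s, h in zip(shares, CATEGORY_HUES))
    sy = sum(s * math.sin(math.radians(h)) for s, h in zip(shares, CATEGORY_HUES))
    hue = math.degrees(math.atan2(sy, sx)) % 360.0
    # Density -> luminance and chroma: linear interpolation, clamped to [0, 1].
    t = min(max(density / max_density, 0.0), 1.0)
    lum = lum_range[0] + t * (lum_range[1] - lum_range[0])
    chroma = chroma_range[0] + t * (chroma_range[1] - chroma_range[0])
    return hue, chroma, lum

# An area split evenly between the first two categories, at maximum density.
print(hcl_for_cell((1, 1, 0), density=100, max_density=100))
```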
The legend is dynamic, so it changes when you switch from density to individual pixels.
How was it created?
- Tile Server (small png images that are coupled, 1.3 images)
- created with R (tmap, png, raster, doParallel)
- tmap, leaflet
Open questions and next steps:
- How do users evaluate the compositional dot map?
- Dots are now distributed by land use; the plan is to use buildings instead, so that street patterns become visible in rural areas
- create dot maps for other data (journalists, office, government)
- how to visualise more than 3 categories?
- how to visualise more than 2 variables?
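Going back to the tile server mentioned under “How was it created?”: a web map such as Leaflet typically requests small PNG tiles by zoom level and tile column/row. The sketch below uses the common Web Mercator “slippy map” tile scheme. That Tennekes’ tiles follow exactly this scheme is an assumption, and the URL layout is hypothetical.

```python
import math

def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the tile containing a WGS84 coordinate.

    Uses the standard Web Mercator tiling: 2**zoom tiles per axis,
    x counted from the west, y counted from the north.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

def tile_path(lon, lat, zoom):
    """Hypothetical relative path of the PNG tile on the tile server."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return f"{zoom}/{x}/{y}.png"

print(tile_path(4.9, 52.37, 10))  # a tile covering part of Amsterdam
```

The number of tiles quadruples with every zoom level, which is why rendering them in parallel (as the doParallel package in the list suggests) pays off.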
Understanding Grid-Based Census Results
presented by Michael Neutze
Before grid based results:
These graphics can still be useful for decision making. But there are also some limitations:
- You see a country like Norway dominate, purely because of its land mass
- political boundaries may shift
- Some political problems don’t care about country boundaries (climate and air pollution for example)
An alternative is to use grid-based visualisations. This is nice: all squares have the same area and the same shape (1x1 km).
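To make the idea concrete, here is a minimal sketch of how point data can be aggregated into such a grid. It assumes coordinates in a projected system measured in metres; the function names and sample points are made up.

```python
import math
from collections import Counter

CELL_SIZE_M = 1000  # 1x1 km cells

def cell_of(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))

def count_per_cell(points, cell_size=CELL_SIZE_M):
    """Count how many points fall into each grid cell."""
    return Counter(cell_of(x, y, cell_size) for x, y in points)

# Three hypothetical residents, two of them in the same square kilometre.
counts = count_per_cell([(1500.0, 2999.9), (1999.0, 2000.0), (-0.1, 10.0)])
print(dict(counts))
```

Because the cells do not depend on administrative boundaries, the same cell index stays comparable over time even when municipalities merge or borders shift.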
Now you can’t see where you are, so you need a topographic background, which Neutze included. But Neutze then painted a bleak outlook: he plotted random data as well, and as you see, that looks quite similar.
Neutze: “So then I thought: okay, now how to spend the rest of the 10 minutes of my talk?”
But Neutze spent time looking at it, and it turns out there are patterns. Here is an example. You see a couple of areas that show high vacancy, which is unlikely to happen by chance.
Digging in, he found this was caused by a large mining operation that keeps being extended, so people leave the area and houses are left vacant.
Now show the same data on a municipal map. At the same zoom level, the data just averages out.
All this data is hard to make sense of in tabular form. So people normally load data into GIS, but not everyone has access to GIS software. So they integrated this into their public facing atlas of the German census.
The live version can be found at atlas.zensus2011.de
Second use case
The second use case Neutze described is when you store census data on a 100 m grid. You might then be able to use this, for example, for zoning purposes and ask: how many people live south of the yellow line? The slide below shows that you can now use these grids to make these calculations easily.
Neutze and his team created a public facing tool that lets you do these visualisations. For any shape that you draw, the population count from the census will be calculated.
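The “south of the line” question can be sketched as follows, assuming each grid cell is represented by its centre point and population. This is an illustration only, not the implementation behind the census atlas; the data and names are invented. An arbitrary drawn shape would need a point-in-polygon test instead of the side-of-line test used here.

```python
def population_right_of_line(cells, a, b):
    """Sum the population of cells whose centre lies to the right of the line a -> b.

    cells -- iterable of (x, y, population), with x pointing east and y north
    a, b  -- two (x, y) points defining the directed line

    For a line drawn from west to east, "right of the line" means south of it.
    """
    ax, ay = a
    bx, by = b
    total = 0
    for x, y, population in cells:
        # The sign of the 2D cross product tells on which side the point lies.
        cross = (bx - ax) * (y - ay) - (by - ay) * (x - ax)
        if cross < 0:
            total += population
    return total

# Hypothetical 100 m cells (centre coordinates in metres, population).
cells = [(50, 450, 10), (50, 550, 20), (150, 100, 5)]
print(population_right_of_line(cells, a=(0, 500), b=(1000, 500)))  # -> 15
```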
They used the tool to create some lists to share on social media. For example, they read that soccer fans often travel up to 100 km to their favorite club. So using the tool they counted how many people live within a 100 km radius of the different soccer teams.
This attracted quite some attention as it shifted the rankings, so some people were very happy to see their team at the top of the list.
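A catchment count like this can be approximated with a great-circle distance check per grid cell. The sketch below assumes cell centres given as latitude/longitude and uses the haversine formula; the coordinates and populations are invented, and the real tool may well work differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def population_within(cells, centre, radius_km=100.0):
    """Sum the population of all cells whose centre is within radius_km of centre."""
    lat0, lon0 = centre
    return sum(pop for lat, lon, pop in cells
               if haversine_km(lat, lon, lat0, lon0) <= radius_km)

# Hypothetical cells: one near Berlin, one near Munich.
cells = [(52.50, 13.40, 100), (48.14, 11.58, 200)]
print(population_within(cells, centre=(52.52, 13.40)))  # -> 100
```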
Neutze concluded with the following points:
An evaluation of data visualisation practices of statistical institutes
presented by Jorge Camoes
Jorge Camoes helps organizations better communicate insights from data.
Camoes states that there was a time when we could focus on individual data points. Today we have too many data points, so we should focus on relationships. When we talk about design, what often happens is that design is treated as make-up for existing charts. Camoes wants a shift in perspective from illustration to communication.
Camoes has a few basic rules. According to Jorge, these are golden rules of Data Viz:
- talk about data viz
- break rules
- make new ones
Camoes took real examples. He just started at Eurostat as that was convenient. There are examples of countries as well. Camoes does not focus on tools. Everything can be done using Excel.
Then Camoes started off his high paced talk through real-life good and bad examples.
He asked questions: how relevant is the information you present in charts?
Some things are not acceptable: 3D effects come at the top of the list.
People don’t get dual axes:
Too much data in your graph:
First chart is too busy, most readers will not read it.
Better to turn it into a small multiples chart.
I liked the point on not providing large graphs that do not communicate a lot of data:
He coined the term swimming pool effect: the chart is too wide. He has no specific guidelines, but a chart should not be stretched beyond a 2:1 ratio.
Color: readers will interpret color as something of importance. Color may also cause unwanted grouping. Color can better be used to make certain groups stand out.
A better method is to use tints, for example darker tint shows newer data.
Our fascination with all things circular
What are your intentions? A good example Jorge pointed out is the stacked bar chart. According to Jorge, there are too many colors and it is hard to compare the subgroups. A better chart is the one below, where you can see the differences between the groups.
Should be used more:
Camoes boomering chart experiment:
Sharing visualisation tools between National Statistical Institutes: a successful experience
presented by Chris Laevaert (Eurostat)
Chris Laevaert works at Eurostat. In 2016, a big project called DIGICOM kicked off. Its aim is to modernise the communication and dissemination of European statistics.
The focus of today was on Work Package 2: sharing innovative tools. Countries can participate on a voluntary basis. They share interactive graphics, which are embedded on the websites of the statistical bureaus of the different countries.
Why interactive graphics?
- nice to look at
- reach out to new audiences: they were targeted at people who do not normally visit statistical websites
Here is another example, called Young Europeans. In this interactive graphic you select your sex and your age, followed by “me and my family” and “me and my free time”. The user is asked a number of questions and can then compare the answers with the country average, the European average and the average of the other sex.
Last example: Quality of life. It takes a subjective indicator and couples it with an objective indicator.
- Keep it simple
- Infrastructure neutral development
- Self contained deployment packages
- keep translation files separate for easy editing: no hard coding when translation has to be done
- Fonts have to support all European languages
- data layer based on web services to fetch data dynamically
- limited technical expertise required: technical work was only needed for languages with longer words, because the design was no longer properly aligned there
- “Quick result”
- Need for support from Eurostat. Countries want to have a quick reaction when they run into an issue.
- Apart from Eurostat tools, it should be envisaged to share national tools among the ESS members
- Sharing will be continued
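The point about keeping translation files separate can be illustrated with a small sketch: labels live in one file per language, and the graphic looks them up by key instead of hard coding text. The keys, the file contents and the fallback rule are assumptions; the Eurostat tools may be organised differently. JSON strings stand in for the separate files here so that the example is self-contained.

```python
import json

# Stand-ins for separate files such as en.json and de.json (hypothetical names).
EN_JSON = '{"title": "Quality of life", "compare": "Compare with the EU average"}'
DE_JSON = '{"title": "Lebensqualität"}'

TRANSLATIONS = {
    "en": json.loads(EN_JSON),
    "de": json.loads(DE_JSON),
}

def label(key, lang, translations=TRANSLATIONS, fallback="en"):
    """Return the label for key in lang, falling back to English if it is missing."""
    text = translations.get(lang, {}).get(key)
    if text is None:
        text = translations[fallback][key]
    return text

print(label("title", "de"))    # translated label
print(label("compare", "de"))  # not translated yet, falls back to English
```

With this layout a national office only has to edit its own language file, which fits the “limited technical expertise required” point in the list.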
Last example: flows of waste exports to different countries. It is in preparation now.
Laevaert and his colleagues are now busy working on a digital publication on women and men in the EU. This is a bigger challenge, as it has to be translated by all countries.
- Mentioned 150+ times on the web in the first hour after the launch
- Now 150 mentions per day
- high press coverage
- users can give feedback
Very nice tool: great entertainment at our brunch (Austria)
Here are the consultations on Eurostat website
Questions and Answers
Q to Camoes: Why did you not include interactive visualisations?
A: I did not look for them. They will be necessary in the future, but they must be cost-effective. As the NYT has been telling us, they stopped creating dynamic graphics because they are not cost-effective. (my note: link to interesting blog end of interactive visualisations) That could be different for statistical offices, but there must be a business case for it.
Q to Tennekes: is raw data publicly available?
A: Yes, publicly available. 20.000 neighbourhoods in the Netherlands. Age, gender, education, income class. If you are interested, contact Martijn.
Q to Tennekes: what is confidence threshold of your data?
A: Not sure how to answer this question about confidence interval. Data is from different reliable sources and on the map it is just dots, so no real confidence threshold.
Q: Do statisticians have to learn GIS?
A: Neutze: yes, they should! Good open source packages are available, such as QGIS. An ordinary laptop is sufficient and tutorials are available. About half of the D3.js community is GIS based. If you are responsible for regional data, there is no excuse for not looking into GIS. It is not much of a burden; SAS or R are more difficult to learn.
Q to Laevaert: how do you define usability?
A: we are preparing a digital publication. It’s a bigger project within the work package. It will be shared by all member states. We are now in the process of defining visualisations. We just shared an alpha version to get feedback from the countries, so that it can be improved.
Q to Laevaert: why is not every country using these examples?
A: maybe we should ask them… It is not obvious that this works. It is hard to share ideas and then put them into practice. We only did a pilot phase, so for now it is only the countries involved. Now we see the success: other countries have contacted us. It is an ongoing process. Finland wants to participate; they have that problem with words that are 1 km long. So it was either a lack of interest, the use of other tools, or technical issues. Some countries do not have to translate. It was quite a successful experience: the ones who did it found that it works.