Tables All the Way Down

The ubiquitous data table is the data worker’s best friend.

Published in VisUMD · 4 min read · Nov 9, 2022

Businessman inside a bar chart (image by MidJourney v4).

Many professional analysts and data scientists see data tables as an unsophisticated form of data analysis, and see the people who use them for design or research as unskilled, often dubbing them “data workers”. But perhaps this perception of these people as unskilled and their tools as “unsophisticated” is both unfair and unwarranted? This is what new research by Lyn Bartram, Michael Correll, and Melanie Tory seems to say.

Data workers treat spreadsheets as their quintessential table tool, one that lets them interact with their data in ways that more complex tools hide or abstract away. In their 2021 paper in the IEEE TVCG journal, Bartram et al. discuss why tables are crucial to data workers and how the researchers went from asking “how do these data workers clean and analyze data?” to “how do tables, as visual and analytical objects, contribute to sensemaking and understanding?”

To answer their question, the researchers conducted a study with people who work extensively with data but don’t identify as data scientists or advanced analysts. The study was run over Zoom: the researchers asked participants about the methodologies and tools they use, and participants brought in work they were currently doing and showed how they conducted their analysis. The researchers organized their findings around three areas:

The physical architecture of the tables;
The actions participants took when using these tables; and
How participants reflected on their data work.

The participants worked with something called base data: the lowest level of detail available to them. A few participants preferred to work directly with this base data, since they felt it helped them understand the data better and build better filters and summaries. The trait shared by almost all participants was that they created new spreadsheets for pivot tables, summaries, and so on, and never altered the base data itself, because they felt some kind of ownership of it.

Each participant had a different perspective on how to arrange rows and columns. The common pattern was to arrange them so that participants could visually associate or discriminate data both within and across tables. This way of arranging rows and columns is called spatial organization. Spatial organization let these participants create tables that could be understood by people who don’t identify as data scientists or advanced analysts.
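The habit of leaving base data untouched and building summaries in a separate sheet can be sketched in a few lines of Python. This is only an illustration, not something taken from the paper: the sales rows, the column names, and the `summarize_units` helper are all made up for the example.

```python
from collections import defaultdict

# Hypothetical base data: one row per sale, the lowest level of detail.
# A tuple is used so the collection itself cannot be appended to or reordered.
BASE_DATA = (
    {"region": "East", "product": "Widget", "units": 4},
    {"region": "East", "product": "Gadget", "units": 1},
    {"region": "West", "product": "Widget", "units": 7},
    {"region": "West", "product": "Widget", "units": 2},
)


def summarize_units(rows, key):
    """Build a new summary table (like a pivot sheet) without modifying rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["units"]
    # Return fresh rows, sorted by key so the layout is stable.
    return [{key: k, "total_units": v} for k, v in sorted(totals.items())]


# The summary lives in its own table; BASE_DATA is left exactly as it was.
by_region = summarize_units(BASE_DATA, "region")
for row in by_region:
    print(row)
```

The summary is a separate object, much like a separate sheet in a workbook, so it can be discarded or rebuilt at any time without risk to the original rows.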

The columns, summaries, and comments created by the participants are called marginalia. Marginalia are designed for human readers and add context or additional details to otherwise unadorned base data. Participants also highlighted cells when they wanted others to check their work or add something to the table. These highlights are called annotations, and participants used different techniques to flag what they needed in the spreadsheet. The researchers also observed in-cell commenting, which sits at the intersection of marginalia and annotation. Many participants avoided in-cell commenting, since comments might get lost while transferring data between tools.

Most of the participants were comfortable using Excel since it could easily export tables. Participants mentioned that this made things easier when their managers asked for tables in PowerPoint: they would just have to export them. Though the participants used other data-cleaning tools, they always returned to spreadsheets.

When it came to verifying the data, each participant had a different opinion. A few said data can never be perfect, and participants explained that errors can be made at any level of analysis. Here are the techniques that caught my eye. One participant said, “I don’t want to put garbage in: I want to have the best garbage possible.” Some participants said that, whatever the error may be, they first dive into the base data and check for patterns or errors. Another participant said that at the end of each day he would check the changes he had made that day.

When it came to manipulating the data, most of the participants did not alter the base table. It was common for them to append new columns as marginalia. The most common spatial organization strategies used filtering and sorting to group associated rows. While a few participants did not re-order their data, some did: they wanted to see the most important data first.
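As a rough sketch of that pattern, the Python below appends a note column to a copy of the rows and then filters and sorts the copy so the most important rows come first. The customer rows, the `note` column, and both helper functions are hypothetical and chosen only for illustration.

```python
# Hypothetical base table: the rows themselves are never edited.
base_rows = [
    {"customer": "Acme", "balance": 1200},
    {"customer": "Birch", "balance": -50},
    {"customer": "Cedar", "balance": 300},
]


def with_marginalia(rows):
    """Return copies of the rows with an extra 'note' column appended."""
    annotated = []
    for row in rows:
        note = "check: negative balance" if row["balance"] < 0 else ""
        annotated.append({**row, "note": note})
    return annotated


def most_important_first(rows, minimum=0):
    """Keep rows with balance >= minimum, then sort largest balance first."""
    kept = [row for row in rows if row["balance"] >= minimum]
    return sorted(kept, key=lambda row: row["balance"], reverse=True)


working_copy = with_marginalia(base_rows)
view = most_important_first(working_copy, minimum=0)
for row in view:
    print(row)
```

Both helpers return new lists, so the filtered and sorted view can be thrown away at any point while `base_rows` keeps its original order and columns.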

Tables give users a chance to experiment with data in a consequence-free space. For data workers to integrate other tools into their workflow, they need to trust those tools as much as they trust spreadsheets. The most crucial finding in the paper is that if data workers can’t get direct access to the base table in a tool, they will find another tool where they can.

After these observations, it’s clear that data workers need better ways to improve their craft. Furthermore, spreadsheets and tables should not be disparaged. The table has affordances that are unmatched even by complex analytical systems. Instead, we should strive to implement the table in visual analytics systems and create new interaction methods to support data workers.

In summary, data tables are not unsophisticated, and neither are the workers who use them. In fact, in real-world data analysis, it is often tables all the way down.

References

Lyn Bartram, Michael Correll, Melanie Tory. Untidy Data: The Unreasonable Effectiveness of Tables. IEEE Transactions on Visualization and Computer Graphics, 2021.

Written by Arush Pamulapati