How to report data in a way that readers need to know

Avoid bogus over-precision and un-needed complexity, but give readers full data labels

For general readers, one of the recurring problems of academic books and articles, PhDs and research reports is a syndrome of overly precise reporting of results, often paradoxically allied with an under-description of charts and tables. Often this reflects an effort to cram into one table or chart all the information that might be needed by multiple different readers — here every table, chart or exhibit takes on the function of ‘reading into the record’ complete data. It can also show a failure by authors to think seriously about what readers need to know and then to deliver information in appropriate formats. In a previous post I have outlined four general principles to follow in designing exhibits.

But combating over-precision more generally can significantly improve the readability of academic texts. So here are seven suggestions for improving the accessibility of your data reporting, without sacrificing academic respectability:

1. Separate out main text tables and charts from large-scale results and data presentation in Annexes. Exhibits within the main text should fulfill a single purpose, being carefully chosen to support that part of the core argument, but not being burdened with more detail than is needed. Annex tables, by contrast, can be far more detailed, encompass multiple purposes, play a ‘version of record’ role, and be the main place where you demonstrate the accuracy and replicability of your work.

For most academic journals, online Web annexes have removed many of the previous restrictive limits on presenting full sets of results in a retrievable mode. For books, depositing a Web annex or your full dataset in machine readable format with your university repository (reachable via permanent URLs) can fulfill the same function.

2. Quote numbers that are only as detailed as most readers need. Researchers, PhDers and most students working in a university environment correctly value accuracy and precision. But this does not mean that when you have to report something, the only professional response is to go all out for precision. In main text exhibits ask ‘What do (most) readers need to know here?’ Avoid providing a level of detail that is of no conceivable interest or use to the vast majority of readers.

So if the number of unemployed people drawing welfare benefits is shown in official statistics as 2,816,013, the correct number to include in a text table might be 2.82 million. If a coefficient comes out of an R regression run with six decimal places (for example, 0.527339) then the appropriate number to report might have just three decimal places (0.527) or two (0.53).
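The rounding in these two examples can be sketched in a few lines of Python. This is only an illustration: the helper names `to_millions` and `round_coefficient` are hypothetical, invented here, and the inputs are simply the two figures quoted above.

```python
def to_millions(count: int, decimals: int = 2) -> str:
    """Express a raw count in millions, rounded for a main-text table."""
    return f"{count / 1_000_000:.{decimals}f} million"


def round_coefficient(value: float, decimals: int = 3) -> str:
    """Format a regression coefficient to a fixed number of decimal places."""
    return f"{value:.{decimals}f}"


print(to_millions(2_816_013))          # 2.82 million
print(round_coefficient(0.527339))     # 0.527
print(round_coefficient(0.527339, 2))  # 0.53
```

The full-precision values still belong in an Annex or a deposited dataset; only the main-text exhibit uses the rounded forms.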

3. Only quote numbers at a level of accuracy that is credible. If a government reports 2,816,013 unemployed people, what credence can we attach to each digit here? Putting the complete number implies that you believe the total is correct to plus or minus one person, which is highly unlikely to be the case. Putting 2,816 in a table where the units are 000s implies only a confidence that the recording system is accurate to plus or minus 1,000, which in a large country like the UK or France is more realistic.
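As a minimal sketch of reporting in units of 000s, the hypothetical helper below (the name `in_thousands` is an assumption made for illustration) rounds a raw count to the nearest thousand, so that the table claims no more precision than the recording system can plausibly support.

```python
def in_thousands(count: int) -> str:
    """Round a raw count to the nearest thousand, for a table whose units are 000s."""
    return f"{round(count / 1000):,}"


print(in_thousands(2_816_013))  # 2,816
```

The column or table heading would then need to state the unit explicitly, for example ‘(000s)’.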

4. Quote only as many numbers as readers need. The great vehicle for poor academic communication has been the giant table, spread across whole pages with dozens of numbers. These are usually ‘dead on arrival’ data that no one else will ever use or cite for any purpose, and of course they are listed in a near-random or chaotic order. This is especially true of regression and other multi-variate analysis outputs. Here academic papers often do a Salome-esque ‘dance of the seven veils’, in which one set after another of inadequate or incomplete regression models is presented, before a more adequate set of models finally materializes.

Many of us have also no doubt enjoyed those conference or seminar presentations where the speaker has scanned a typescript page of regression results into PowerPoint. (Often the speaker apologizes diffidently to the audience that it’s ‘not readable in the back of the room’, when in fact it’s not readable a metre from the podium.) Such bad practice examples in academic work are particularly corrosive for students, who can unreflectively imitate what they have seen others doing. I’ve supervised teams of advanced students doing group projects for ‘real life’ clients (such as government agencies or management consultants) whose first drafts include Stata outputs with dozens of regression coefficients, all reproduced direct from the printout to seven decimal places.

5. Avoid using unnecessarily complex numbers, ones that most people (usually including the author) find hard to understand. The numbers that people understand most widely and intuitively are those in the range from 0 to 10, followed by those in the range from 1 to 100. With percentages, you should always use the full form (65.1%) and never the decimal form (0.651). When official or scientific data present numbers in other ranges, especially those that are very large or very small (e.g. 0.26 × 10^-6), it is always best to try to re-base the numbers to get them back into more tractable formats, even if this means going against common conventions.
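Re-basing of this kind is simple arithmetic, and a short Python sketch may make it concrete. The function names `as_percentage` and `per_million` are hypothetical, and the very small value is an assumed illustrative proportion of 0.26 in a million, not a figure taken from any real dataset.

```python
def as_percentage(proportion: float, decimals: int = 1) -> str:
    """Convert a decimal proportion (0.651) to the full percentage form (65.1%)."""
    return f"{proportion * 100:.{decimals}f}%"


def per_million(proportion: float, decimals: int = 2) -> str:
    """Re-base a very small proportion so that it reads as a rate per million."""
    return f"{proportion * 1_000_000:.{decimals}f} per million"


print(as_percentage(0.651))  # 65.1%
print(per_million(0.26e-6))  # 0.26 per million
```

Whatever base is chosen, it needs to appear in the row, column or axis label so that readers can see what the re-based number is measured against.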

6. Wherever feasible, arrange the rows or columns in tables, or the sequence of bars in a bar chart, or the set of series in a line chart, so that there is a simple and easily recognizable numerical progression in the data. This is best achieved by putting the largest numbers at the top and left of a table, or at the top of a bar chart, with subsequent rows and bars arranged in a descending order of size. Ideally in a larger table numbers should also decrease from left to right across columns. If there is a pattern in the data, then readers can easily cope with an unfamiliar sequencing of labels, because they can ‘see what’s where’.

Why does this still happen so rarely in academic work at present? Usually it reflects authors reproducing information arranged in an alphabetical order (perhaps from an official source), or in some other customary or historical sequence, without trying to achieve any thought-through logic to how the table or chart is presented. The resulting numbers are consequently randomly jumbled, so that even the most expert readers cannot see a pattern in them.
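Re-ordering a table by size rather than alphabetically usually takes a single sort. The sketch below uses invented rows (the ‘Region’ labels and their values are purely illustrative, not real data) and a hypothetical helper, `sort_descending`, to put the largest value at the top.

```python
# Invented illustrative rows of (label, value), in alphabetical order; not real data.
rows = [
    ("Region A", 12.4),
    ("Region B", 31.0),
    ("Region C", 7.9),
    ("Region D", 18.6),
]


def sort_descending(table):
    """Return the rows ordered from the largest value to the smallest."""
    return sorted(table, key=lambda row: row[1], reverse=True)


ordered = sort_descending(rows)
for label, value in ordered:
    # Left-align the label and right-align the value to one decimal place.
    print(f"{label:<10}{value:>6.1f}")
```

The same ordered list can be passed to a horizontal bar chart so that the longest bar sits at the top.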

7. The counterpart of avoiding bogus over-precision or complexity in data is to always fully describe what numbers are being shown. This means accurately labelling the x and y axes of every chart, and the columns and rows in all tables, whether in the main text or Annexes. Wherever feasible labels should give the full, readable name of the precise variable that is being shown, and include the unit of measurement used. Take care when showing percentages that readers know exactly what they are percentages of. In change data, make sure you say ‘percentage point change’ where needed, and not just ‘per cent’. Table formats should allow for full row and column labels wherever possible, and explanatory notes below each table should explain any unavoidable abbreviations. Using horizontal bar charts (instead of vertical bars) is also always much better for accurate labelling.

A lot of academics and PhDs do not take their obligations to accurately inform readers seriously enough here. It is still common to find academic authors presenting results without any axis labels at all (often just relying on the table or chart heading to explain what is shown). In formal fields authors can choose gnomic labels that are completely obscure, such as single word labels or, worse still, algebraic symbols or whole formulae or function equations, with no measurement units specified. Often the author seems to assume that readers will process linearly the entire text of their paper, chapter or book in a careful way, so that all the variables, indices or statistics presented in tables and charts are already thoroughly understood. This is actually very rarely the case.
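The difference between a ‘percentage point change’ and a ‘per cent’ change, mentioned in point 7, is easy to blur, so a small worked sketch may help. The function names and the shares of 40% and 50% are assumptions chosen only for illustration.

```python
def percentage_point_change(old_pct: float, new_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change_pct(old_pct: float, new_pct: float) -> float:
    """Change expressed as a percentage of the starting value."""
    return (new_pct - old_pct) / old_pct * 100


old, new = 40.0, 50.0  # hypothetical shares, in per cent
print(f"{percentage_point_change(old, new):.1f} percentage points")  # 10.0 percentage points
print(f"{relative_change_pct(old, new):.1f} per cent")               # 25.0 per cent
```

A rise from 40% to 50% is therefore a 10 percentage point change but a 25 per cent increase, and the label needs to say which of the two is being shown.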

To follow up these ideas with examples of each point, see my book: Patrick Dunleavy, ‘Authoring a PhD’ (Palgrave, 2003), where Chapter 7 covers ‘Handling attention points: data, charts and graphics’.

See also useful material on the LSE’s Impact blog and on Twitter @Write4Research.