My 10 key takeaways from the new data journalism handbook

Next year will be my 8th year working as a data journalist with @AJLabs. Back in 2011, people like me were known as “journo-coders”, “hackers for hire”, “computer assisted reporters” and even “space geeks”.

A year later, the first edition of the data journalism handbook was published.

It quickly became the go-to textbook for many aspiring data journalists all over the world. In 2015, we helped translate the Arabic edition and have used it ever since to train Arab journalists as part of the Al Jazeera Media Institute’s data journalism and infographics course. We’re currently working on writing our own short guidebook to document our own experiences with data journalism in the Arab world.

After spending a couple of days reading through the online beta preview of the new data journalism handbook, I thought it would be worthwhile to share my notes and key takeaways as we head into 2019.

For me, the handbook is an absolute must-read for all data journalists, novices and veterans alike.

To keep things manageable, I’ve broken down my notes into three parts per chapter:

  1. One sentence that summarises the chapter to me.
  2. My brief commentary, notes and interpretation.
  3. My top “tweetable” takeaways (quotes and paraphrases).

TL;DR — 10 Lessons from the book:

  1. Data journalism does not happen in a vacuum.
  2. Data journalism is about more than just numbers.
  3. Data exists — accessing it depends on how persistent you are.
  4. Build your statistical toolbox and follow a reproducible approach (also, publish your code).
  5. #Dataviz: Confidence to deliver the story and confidence to consume it.
  6. We’re living among bots and cyborgs — surely we should learn how their decisions affect our lives.
  7. Remember what happened to your fancy Adobe Flash project?
  8. Data journalism is seen as a means to empowerment.
  9. So how do you measure the impact of data journalism projects?
  10. Data journalism is still just journalism, it too has its problems.

1. Introduction:

Data journalism does not happen in a vacuum.

The second edition of the book really helps make the point that data journalism is a collaborative process which brings together journalists, technologists, designers and product managers. For data journalism to succeed there needs to be a culture of curiosity, persistence, creativity and experimentation. For it to fail, remove one of those ingredients.

My top takeaways from this chapter:

  • “Data does not just provide neutral and straightforward representations of the world, but is rather entangled with politics and culture, money and power.” @jwyg and @bb_liliana
  • “Data journalism can be viewed not just in terms of how things are represented, but in terms of how it organises relations.” — @jwyg and @bb_liliana

2. Doing issues with data:

Data journalism is about more than just numbers.

It’s a great honour to be included in the opening chapter of the new data journalism handbook. Our project, Home Demolitions in Occupied East Jerusalem, highlights the point that “data stories” can really be “human stories” if you focus on the people behind the numbers. Over the course of a year we tracked every single home demolition in East Jerusalem to understand and provide context to an ongoing issue.

We used an iPhone to take all the 360-degree images and capture the specific GPS coordinates.

The next two case studies really help you appreciate how resilient and patient journalists can be to tell the story right. Discovering Trees in Bogota was faced with the all too common dilemma of telling a “data story” without data. Rather than giving up, they followed the data all the way back to the source’s source and then open sourced it :-).

Similarly, From Coffee to Colonialism undertook the mammoth task of tracking the journey of everyday food items from the original plantations to their desks. Just like with Home Demolitions in Occupied East Jerusalem, it took them one year to complete their story and publish it on multiple platforms.

It makes you really think: are you really trying hard enough to tell your story?

My top takeaways from this chapter:

  • Data stories don’t need to be technical or expensive. Sometimes just counting an event over time can tell you a lot about the scale of a problem. — @haddadme
  • When you can’t get data from the source, go to the source’s source(s) (and then open source it). — @marisamagar
  • The way that users read and interact with a story is as important as the story itself. — @raulsanchezglez
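To make the first takeaway concrete, here is a minimal sketch of "just counting an event over time" in Python, using only the standard library. The dates are made-up placeholders, not figures from any of the projects above, and `count_by_month` is a hypothetical helper name.

```python
from collections import Counter
from datetime import date

# Hypothetical event log: one date per recorded incident (made-up values).
events = [
    date(2018, 1, 5),
    date(2018, 1, 19),
    date(2018, 2, 2),
    date(2018, 2, 14),
    date(2018, 2, 27),
    date(2018, 4, 9),
]


def count_by_month(dates):
    """Return a dict mapping (year, month) to the number of events."""
    counts = Counter((d.year, d.month) for d in dates)
    return dict(sorted(counts.items()))


monthly = count_by_month(events)
for (year, month), n in monthly.items():
    print(f"{year}-{month:02d}: {n}")
```

Months with no events simply do not appear in the result, which is worth remembering before charting it: a gap in the data is not the same as a zero.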

3. How journalists assemble data:

Data exists — accessing it depends on how persistent you are.

The third chapter dives deeper into the “no data” debate by looking at four case studies from India, China, Cuba and Australia.

Combined, India and China make up over a third of the world’s population. This means that in addition to having so many news consumers, these two countries are also producing huge amounts of personal data, albeit mostly locked away in the private sector.

There are more people living inside this circle than outside of it. Map created by reddit user valeriepieris

All of the case studies talk about the real concerns with data ownership, access, credibility and transparency. One very helpful tip for journalists struggling to find local datasets is to search through international sources.

Chapter three really helps solidify the idea that just because data is not available in a nicely formatted spreadsheet, that doesn’t mean it doesn’t exist. With enough persistence, time and good old fashioned journalism, many newsrooms can create their own data.

My top takeaways from this chapter:

  • “There was no such database of ongoing land conflicts in India. So we decided to build one.” — @Kum_Sambhav and @Ankur_pali
  • We’ve witnessed stories in China that rather than being data-driven have led to the production of new datasets - such as air pollution. — @MaJinxin
  • Journalists must always be creative in looking for meaningful stories hidden in data. “It is only possible if we continue to try.” — @Saimita24, @yudivian, @ErnestoGuerra21
  • Numbers have special abilities beyond their symbols. Get to know what these numbers truly represent and ask them hard questions. — Helen Verran

4. Different ways of working with data:

Build your statistical toolbox and follow a reproducible approach (also, publish your code).

As of this writing, chapter four only includes one case study by investigative data journalist Sam Leon. Sam talks about how data comes in all shapes and sizes and it is up to us as diligent data journalists to hone our craft by understanding the fundamentals of statistics and data manipulation to prevent introducing errors into our stories.

The author then builds a solid case as to why we should embrace reproducible data processes and tools such as Jupyter notebooks and R Markdown. I myself have been using R Markdown for most of my data projects ever since taking Andrew Ba Tran’s amazing R For Journalists course earlier this year.
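As a rough illustration of what a reproducible approach can look like in practice (my own sketch, not something taken from the handbook), the Python snippet below fingerprints the raw input and derives every published figure from it in code. The CSV contents, column names and function names are all hypothetical.

```python
import csv
import hashlib
import io
import statistics

# Hypothetical raw data, inlined so the sketch is self-contained.
RAW_CSV = "district,incidents\nNorth,12\nSouth,7\nEast,9\n"


def fingerprint(text):
    """SHA-256 of the raw input, so readers can check they have the same file."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def summarise(text):
    """Parse the CSV and return the figures that would be quoted in the story."""
    rows = list(csv.DictReader(io.StringIO(text)))
    values = [int(row["incidents"]) for row in rows]
    return {
        "rows": len(values),
        "total": sum(values),
        "median": statistics.median(values),
    }


# Publishing the hash alongside the numbers lets anyone rerun the analysis
# and confirm they started from identical data.
report = {"input_sha256": fingerprint(RAW_CSV), **summarise(RAW_CSV)}
print(report)
```

The same cell-by-cell logic fits naturally into a Jupyter notebook or an R Markdown document; the point is that no number in the story is typed by hand.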

My top takeaways from this chapter:

  • “Machine learning, text analysis and some of the other techniques explored elsewhere in this book are increasingly being deployed in the service of the scoop.” — @noeL_maS
In your next data project, try using Jupyter Notebooks.

5. Different ways to experience data:

#Dataviz: Confidence to deliver the story and confidence to consume it.

Chapter five begins with a fascinating insight into how people consume data visualisations. Whenever discussions about data visualisations come up I immediately think of Alberto Cairo’s famous saying — infographics should clarify not simplify.

One of the key ideas from this chapter is that infographics and data visualisations are not a “one size fits all” storytelling format.

Through my own experience, I’ve seen how the exact same message in a written article may get lost between the words, while a compact infographic, whose job it is to be concise and deliver a single message, may spark very active discussion.

Having confidence in your infographic’s single message is ultimately, I think, what separates presenting facts to readers from explaining those facts to them.

This is why producing a good infographic is usually more time-consuming than whipping out an 800-word article. The problem is: as long as infographics are treated as visual afterthoughts to your written story, they won’t be as effective.

My top takeaways from this chapter:

  • “Audiences need to feel that they have the necessary skills to decode visualizations, and many participants indicated a lack of confidence in this regard.” @hmtk, Martin Engebretsen, @RosemaryLHill, @visualisingdata, and Wibke Weber.
  • Remember - No piece of data is objective — “a database contains many human decisions; what was collected and what was left out; how it was categorised, sorted, or analysed.” @zararah and @stefanwehrmeyer

6. Emerging storytelling approaches:

We’re living among bots and cyborgs — surely we should get to know how their decisions affect our lives.

Chapter six showcases three case studies about the power of data.

Former Al Jazeera America data reporter Lam Thuy Vo opens the chapter with a helpful reminder that people (not machines) produce most of the world’s data. She then introduces some very interesting case studies of how analysing social media data can reveal some eye-opening insights into how we as people communicate in our online worlds.

The next two case studies written by Nick Diakopoulos and Christina Elmer talk about algorithmic accountability — a field that I’ve personally been very interested in ever since I heard Diakopoulos speak about it at NICAR 2014.

Computer systems and their algorithms control so much of our everyday lives: our search results, our credit scores, which job we might land and even whether or not we’re seen as a security risk.

Both authors recommend focusing on the impact algorithms have on people and societies rather than on the mechanics of the algorithms.

My top takeaways from this chapter:

  • “We have become the largest producers of data in history.” — @lamthuyvo
  • “Algorithms, animated by piles of data, are a potent new way of wielding power in society”. — @ndiakopoulos
  • Algorithmic accountability reporting has to grow rapidly to meet the challenges of an increasingly digitized world. — @ChElm
ProPublica’s Machine Bias is a great example of algorithmic accountability reporting.

7. Organising data journalism:

Remember what happened to your fancy Adobe Flash project?

December 11, 2018, was a big day for many data journalists. On that day Google announced that one of our favourite old-school dataviz applications, Google Fusion Tables, would be deprecated in 2019. This reignited the endless debate over whether to rely on (free) tools or to hand-code projects using well-established programming languages and libraries.

For anyone who’s ever had to deal with migrating servers or “your interactive is broken” emails, chapter seven is for you.

Data journalism professor Meredith Broussard opens the chapter with an anatomy of how data projects break. She points out that while dynamic web projects do come with additional technical baggage, the main reason why links break comes down to human decisions around how they are archived, migrated and stored.

Bespoke interactive projects should not be treated as a collection of HTML, CSS and JavaScript pages. Rather, these files collectively make up valuable content which, when you really think about it, is a publisher’s currency in the media landscape.

My top takeaways from this chapter:

  • “Data projects are more fragile than “plain” text-and-images stories that are published in the print edition of a newspaper or magazine.” — @merbroussard
  • Data journalists should embrace the blurry boundaries between other fields (such as civic technology). This gives them a better chance to work in the broader technological, cultural and economic transformations. — @tweetbaack
Yeah, we’ve all seen this before.

8. Training data journalists around the world:

Data journalism is seen as a means to empowerment.

Sometimes when I come across a really awesome interactive built using the latest JavaScript libraries — only to discover that it’s about fast-food prices in San Francisco (no disrespect here San Francisco), it helps to know that in many parts of the world “data journalism is seen as a means to empowerment”.

Chapter eight provides some refreshing insights into the experiences of Eva Constantaras as she works to develop data journalism skills in marginalised communities in Kenya, Afghanistan and Pakistan.

Constantaras provides many great examples of how “journalists have embraced data as a means to influence policy, mobilize citizens and combat propaganda.”

My top takeaways from this chapter:

  • “By attending to different aspects of injustice, inequality and discrimination, and their broader consequences on the lives of marginalised communities, we render them visible, measurable and maybe even solvable.” — @EvaConstantaras
  • “Most of these problems were invisible before and will become invisible again if journalists stop counting.” — @EvaConstantaras

9. Measuring the impact of data journalism projects:

So how do you measure the impact of data journalism projects?

The penultimate chapter places data journalism within the context of journalism as a whole and looks at how it stacks up against other content on the web.

It opens with a writeup by author C.W. Anderson where he encourages data journalists to practice self-reflection in order to improve how their work is perceived and understood.

A good data project can make people care about a topic by explaining what the data means (and what it does not mean) rather than just presenting it.

The next author, Wiebke Loosen, talks about what makes an award-worthy data journalism project. Every year, since 2013, the number of data journalism projects submitted to the annual data journalism awards (which are now open for 2019) has increased. This diversity has really helped create a benchmark for a lot of data journalists including ourselves at AJ Labs.

The next two sections by data journalism professor Paul Bradshaw and Dr. Lindsay Green-Barber speak at length about how and why we measure impact in data journalism projects. It’s a fascinating insight which weaves together advertising metrics, business models and the influence of commercial and cultural metrics.

My top takeaways from this chapter:

  • “Data journalism may be the most powerful form of collective journalistic sense making in the world today.” — @Chanders
  • “Data journalism appears to be an increasingly global phenomenon as the number of countries represented by the [Data Journalism Award] nominees grew with each year, amounting to 33 countries from all five continents in 2016.” — @WLoosen
  • “The ability to measure impact on a story-by-story basis has meant it is no longer editors that are held responsible for audience impact, but journalists too.” @paulbradshaw
  • “Different types of journalism are better equipped for different types of impact.” — Lindsay Green-Barber

10. Reflections, challenges and future directions:

Data journalism is still just journalism, it too has its problems.

The last chapter concludes with an important reality check by Prof. Nikki Usher. She provides her take on why data journalism has largely failed to live up to its original vision of holding power to account using the assortment of computational skills that data journalists bring to the table.

Data journalism has always been about disruption and telling stories that would otherwise not be told. To truly unleash data journalism’s “revolutionary potential” I think a lot more needs to be done to cultivate experimentation and collaboration within newsrooms.

Without editorial vision and the right combination of data-savvy editors, story-driven developers, flexible UX designers and subject-specialist journalists, news articles will continue to be walls of text, longform stories will remain as cookie-cutter “scrollables” and map co-ordinates (whether or not it makes sense to map them) will remain on maps.

My top takeaways from this chapter:

  • “Data journalists need to own up to their hacker inspiration and hack the newsroom as they once promised to do; they need to move past a focus on profit and professionalism within their newsrooms.” — @nikkiusher

Where next?

At the end of the introductory chapter, editors Jonathan Gray and Liliana Bounegru present twelve challenges for critical data practice. To make these actionable, I’ve summarised them as single questions that you can ask yourself before embarking on your next data project:

  1. How does your data project bridge the gap between people and technology?
  2. How does your story provide the right context for the larger issue at hand?
  3. How are you using your data to tell relatable stories?
  4. Are you clarifying your data or simplifying it?
  5. Are you applying the foundations of effective visual communications and are you experimenting with new approaches?
  6. Is your audience involved in the production process? If not, have you tried to tap into their knowledge-base to gather data?
  7. Are you actually telling a story to your audience or are you just presenting some facts?
  8. If you removed the main technology behind your story, would it fall apart? Are there other formats in which your story could continue to live on?
  9. Are you the vehicle for the story or the focus of the story? (Be honest)
  10. Did you speak to the actual people affected rather than only draw conclusions about them using their data?
  11. What value does your project have to you, your boss, your company, the world?
  12. We know your dataviz is cool but how is it getting people to rethink their beliefs and assumptions?

Thanks for reading.

I hope to update this post with a summary of the remaining chapters once they become available. Thanks to the whole team at the European Journalism Centre and Google News Initiative for making such a valuable book available to everyone for free. You can read the book online here:

Produced by European Journalism Centre and Google News Initiative

