An early map created by Sarah Gonzalez of WNYC as part of the Dirty Little Secrets investigation showing all the known contaminated sites in New Jersey. Data Source: http://www.nj.gov/dep/srp/kcsnj/.

Basic Data Tools for Local Journalists

Data is all the rage these days, but most local newsrooms simply can’t afford the cost of hiring a data journalist or maintaining a full-time data team. Luckily, there are plenty of free online alternatives out there that local, entrepreneurial, and freelance journalists can use to spruce up their content and add context to their stories.

Below is a list of a few data tools and services that backpack journalists can use in a pinch — without breaking the bank. The list isn’t even close to exhaustive, but it should contain enough information to get you started. I’ve also included a list of helpful tutorials and online resources at the bottom to help you navigate your way around some of the tools and concepts mentioned in this post.

Cleaning the Data

Anyone who’s ever filed an OPRA request (New Jersey’s version of FOIA) or tried to work with a dataset that they didn’t create knows how messy other people’s data can be — especially when they come from local and municipal governments. Half the time you get a stack of unreadable PDFs, and the rest of the time you end up with a dataset that looks like someone fell asleep on the keyboard while entering and organizing the data. That’s why it’s important to have a repertoire of data-cleaning tools at your disposal, especially if you don’t happen to have a Master’s degree in Excel.

With that in mind, here are three tools to help you tidy up those messy datasets and hopefully preserve whatever’s left of your sanity.

1. Data Wrangler (now Trifacta Wrangler)

Data Wrangler is a free, web-based tool from the Stanford/Berkeley Wrangler research team that helps you make sense out of your messy datasets. It allows you to paste large amounts of text-based information from a CSV into its interface, which digests that information and allows you to edit it before exporting the new dataset for use in other programs.

The Stanford/Berkeley team recently completed their project, and they are no longer offering active support for the software. Instead, they’ve launched a commercial venture called Trifacta, which includes a free version of the tool called Trifacta Wrangler.

Click here for a tutorial on how to use Data Wrangler from the folks at Visual Cinnamon.

2. OpenRefine (formerly Google Refine)

OpenRefine is the open-source reincarnation of Google Refine. One of the biggest differences between OpenRefine and Data Wrangler is that OpenRefine keeps your data on your computer instead of requiring you to put it online. This is a very important distinction for those of you who are working with sensitive or secure datasets.

Not only does it clean your data, but OpenRefine also lets you know when it thinks one of your data points might be corrupted or incorrect. If it detects a missing decimal or comma, it flags that data point for you to review and correct if necessary. This can be very useful when analyzing data from forms that were submitted by readers or third-party organizations.

Both tools let you find and merge different representations of the same type of data, saving you time and frustration.
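To give you a feel for what “finding and merging different representations” means, here’s a minimal Python sketch of a key-based grouping approach. It is a simplified illustration in the spirit of that feature, not the actual code either tool runs, and the function names (`fingerprint`, `cluster`) and sample place names are made up for this example.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Reduce a string to a normalized key so near-duplicates collide."""
    # Trim whitespace, lowercase, and strip ASCII punctuation
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop duplicate words, and sort them
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: members for key, members in groups.items()
            if len(set(members)) > 1}


# Hypothetical messy column: three spellings of the same place
names = ["Newark, NJ", "newark nj", "NJ Newark ", "Jersey City", "Trenton"]
clusters = cluster(names)
print(clusters)
```

Each group that comes back is a candidate for merging into one spelling. You would still want to eyeball the groups before merging, since two genuinely different entries can occasionally share a key.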

Here’s a complete breakdown of why and how you might use OpenRefine, courtesy of Computerworld.

3. Tabula

Tabula is kind of like Adobe Acrobat Reader’s revolutionary OCR Text Recognition feature. But instead of reading a PDF and converting it to searchable words and text, it reads tables and sections within PDFs and turns them into readable CSV files. All you have to do is select a table or an area within a PDF, and Tabula does the rest. It extracts that table or area and spits out a preview and a brand new CSV file. Tabula is used by media organizations like ProPublica, The New York Times, Foreign Policy, the St. Paul Pioneer Press, and more.

As I mentioned earlier, anyone who’s filed an OPRA request with a local or state government knows what a painstaking nightmare it can be to have to go through each page and extract usable information. Even as the Open Data movement continues to pick up speed, most government entities are still churning out PDFs.
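Tables pulled out of multi-page PDFs often need a little tidying before they’re usable. Here’s a rough Python sketch of that follow-up step, assuming the exported CSV has blank rows and a header row repeated on each page. The sample data and the `clean_rows` helper are hypothetical, and your own export may need different fixes.

```python
import csv
import io


def clean_rows(rows):
    """Drop blank rows and repeated header rows from extracted table rows."""
    cleaned = []
    header = None
    for row in rows:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip rows where every cell is empty
        if header is None:
            header = cells  # treat the first non-empty row as the header
            cleaned.append(cells)
        elif cells != header:
            cleaned.append(cells)  # keep data rows, skip repeated headers
    return cleaned


# Hypothetical sample standing in for a CSV exported from a two-page PDF
sample = """Site ID,Municipality,Status
001, Newark ,Active
,,
Site ID,Municipality,Status
002,Trenton,Pending
"""

rows = clean_rows(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row)
```

With a real file, you would pass `csv.reader(open("export.csv", newline=""))` instead of the in-memory sample (the filename here is just a placeholder).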

Visualizing the Data

Spreadsheets might be what gets data journalists out of bed in the morning, but most media consumers are already fast asleep before they even finish reading the column headers — and what good is a dataset if no one bothers to read or try to understand it?

That’s where data visualization comes in.

Source: ThinkByNumbers.org, “Death and Dollars: Common Causes of Death vs. Prevention Funding”

We’ve all seen the fancy infographics and interactive timelines about drone strikes in the Middle East, police killings in the US, government threat spending, and other topics. Most local publications and independent journalists, however, simply don’t have the resources or the expertise to pull off something of that magnitude. But you can still use the three tools listed below to put together a perfectly acceptable data visualization of your own (for free).

1. Timeline.JS

If you ask me, the Publishers’ Toolbox by the Knight Lab at Northwestern University has some of the coolest free visualization tools out there. They’re really easy to use and they look great on any device.

This is a timeline I created to show how many times Condoleezza Rice misled the American public during the lead-up to, and in the wake of, the 2003 US invasion of Iraq. Source: http://www.muckgers.com/2014/03/why-dont-they-want-condoleezza-rice-to-speak-at-graduation/

Timeline.JS was the first Knight Lab tool I’d ever used. I was immediately hooked. It’s an open-source tool that allows you to create engaging, interactive timelines for free. All it takes is a Google spreadsheet, and they even have a special template set up so you can jump right in and just fill in the blanks with your content. At the same time, more experienced users can use their JSON skills to modify the timeline even further, while still retaining the tool’s core functionality.

You can add media from a variety of sources and display them on the timeline, including content from Twitter, Flickr, YouTube, Vimeo, Vine, Dailymotion, Google Maps, Wikipedia, SoundCloud, DocumentCloud and more. The timeline can be embedded on most sites (except Medium.com, of course), and is a great way to illustrate concepts that might have otherwise been less engaging to some readers.
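If you’d rather go the JSON route than the spreadsheet, here’s a small Python sketch that builds a timeline file. The field names (`title`, `events`, `start_date`, `text`, `headline`, `media`) reflect my understanding of the JSON format described in the Knight Lab documentation, so double-check them there before relying on this. The `make_event` helper, the dates, the headlines, and the example.com URL are all placeholders.

```python
import json


def make_event(year, month, day, headline, text, media_url=None):
    """Build one timeline event as a plain dictionary."""
    event = {
        "start_date": {"year": year, "month": month, "day": day},
        "text": {"headline": headline, "text": text},
    }
    if media_url:
        # Media is optional; only attach it when a URL is supplied
        event["media"] = {"url": media_url}
    return event


timeline = {
    "title": {"text": {"headline": "Sample Timeline",
                       "text": "Placeholder intro."}},
    "events": [
        make_event(2015, 1, 15, "First placeholder event", "What happened."),
        make_event(2015, 6, 1, "Second placeholder event",
                   "What happened next.",
                   media_url="https://example.com/photo.jpg"),
    ],
}

timeline_json = json.dumps(timeline, indent=2)
print(timeline_json)
```

Generating the file from code is mostly useful when your events already live in a database or a CSV. For a one-off story, the spreadsheet template is probably quicker.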

2. StoryMap.JS

Another fantastic tool from the Knight Lab at Northwestern. StoryMap.JS is very similar to Timeline.JS, and the setup is pretty much the same. The only difference is the mapping component.

This is a StoryMap I created to showcase the work of the Center for Cooperative Media over the last several years. Source: http://njnewscommons.org/grow-strengthen-a-look-at-building-up-the-nj-news-ecosystem/

The mapping component allows you to display an interactive map alongside each datapoint in the spreadsheet. It combines elements of Timeline.JS, allowing you to show not only where and when something happened, but also the order in which it happened.

All of the Knight Lab’s tools come with easy-to-follow instructions and Google spreadsheet templates to help you get started on your own project.

Be sure to check out the rest of the tools in the Publishers’ Toolbox.

3. Infogr.am (Basic)

Infogr.am is an online tool that lets you create charts and infographics in just a few steps. Once you’ve cleaned up your datasets, you’ll want to figure out how to showcase your data for all the world to see. If you decide that infographics are your best option, Infogr.am is a great place to start.

Source: Screengrab from the Infogr.am homepage.

The PRO plan starts at $19 a month, but they also offer a free BASIC plan that includes 10 infographics using 10 uploaded images.

The basic plan doesn’t allow private sharing, downloads, or live connections, but it’s a great way to familiarize yourself with the process before diving in. If you decide you can afford to fork over the extra cash for a PRO plan, you’ll get access to more features, including real-time data connections to your visualizations through services like Google Drive.

Mapping the Data

If fancy visualizations, infographics, and timelines aren’t really your style, or maybe they just don’t mesh well with the story you’re trying to tell, you can always rely on a good, old-fashioned map to get the job done.

I wanted to include a picture of Dora the Explorer here, but I guess a close-up of the NJ media ecosystem will have to do. Credit: Debbie Galant

1. Google Fusion Tables

Google Fusion Tables is a web service provided by Google that lets you turn your datasets into maps with that classic Google Maps feel to them. Fusion Tables has become the go-to data management tool for journalists and publishers in the US and beyond. It’s simple, clean, and relatively easy to use, so it’s not hard to see why so many media organizations consider Fusion Tables to be one of the best dollar-for-dollar data mapping tools out there.

Source: Tracking Per-Pupil Spending and Teacher Salaries Across NJ Districts by Colleen O’Dea of NJ Spotlight.

Whether you’re just starting out or you’ve been working with data for years, Fusion Tables is a great tool for gathering and displaying geographical data on the web. You can use your own data, or you can merge your datasets with someone else’s data about the same subject and view all the information in one place. You can even use Google Tables to search through thousands of public Fusion Tables and other datasets from around the world.

Once you have the data, you can use Fusion Tables to instantly map points, lines, polygons, addresses, places, countries, and more. Fusion Tables also automatically interprets location-based data, so you can create useful features like heat maps or display cards that appear next to their corresponding datapoint. You can even use Google Forms to gather the data and send it to a Fusion Table, where it will be automatically added and displayed on the map.

2. Open Heat Map

Open Heat Map is a great tool if you want to spend the least amount of effort possible to create something both newsworthy and visually appealing. Heat maps are great for showing how intense something is or how often something happens within a certain area. You can use them to show population densities, price ranges, average temperatures, and more.

“How the US unemployment rate has varied over the last five years, county-by-county and month-by-month” via the Open Heat Map gallery.

It used to be that if you wanted to make a heat map, you had to actually understand math and how to render and add map tiles to existing maps — not my strongest suits.

Open Heat Map lets you skip all of that nonsense and get right to the good stuff. All you need is a spreadsheet with some geo-located data. The data can include exact coordinates, addresses, or even just the name of a particular place. Upload the data and let Open Heat Map do the rest.

The resulting map isn’t going to blow anyone’s mind. Also, there’s no key, so you won’t be able to break down the data very much, but it only takes a few clicks and it’s a great way to create a simple, usable visual component for your story.
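If your raw data is one row per incident rather than one row per place, you’ll need to total it up before uploading. Here’s a minimal Python sketch of that step. The records and the column names (`county`, `value`) are assumptions for illustration, so check what headers the tool actually expects before you upload.

```python
import csv
import io
from collections import Counter

# Hypothetical records: one row per incident, tagged with a county name
incidents = [
    {"county": "Essex", "type": "spill"},
    {"county": "Bergen", "type": "spill"},
    {"county": "Essex", "type": "leak"},
    {"county": "Hudson", "type": "spill"},
]

# Count how often something happened in each area
counts = Counter(record["county"] for record in incidents)

# Write one row per place with its total, ready to save as a spreadsheet
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["county", "value"])
for county, total in sorted(counts.items()):
    writer.writerow([county, total])

csv_text = buffer.getvalue()
print(csv_text)
```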

3. Tableau Public (Free Version)

The professional edition gives you unlimited space and a whole host of additional features, but you’re reading this introductory post so I assume you’re not ready to go that deep just yet. Tableau Public is great for displaying different types of charts together in the same visualization.

From “50 Years of Crime” by Shine Pulikathara. Source: https://public.tableau.com/s/gallery/50-years-crime-us

You can also use it to explore other datasets on the web. It has a drag-and-drop interface that can suggest different visualization types based on the data you’ve entered.

One of the drawbacks is the need to format the data in a specific way in order to get the full value, but once you get the hang of it everything seems to work just fine. In the free version, your visualizations have to remain on the Tableau site, so you can’t save your progress without making it public. You could always cough up the $999 to upgrade to the single-user desktop version, but I guess that depends how deep your relationship with digital charts goes.
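I haven’t spelled out the format above, but one common case (and this is my assumption, not a rule from Tableau) is a “wide” spreadsheet with one column per year that charts more easily as a “long” table with one row per observation. Here’s a rough Python sketch of that reshaping. The `wide_to_long` helper, the town names, and the numbers are all made up.

```python
# Hypothetical wide table: one row per town, one column per year
wide = [
    {"town": "Montclair", "2013": 10, "2014": 12},
    {"town": "Newark", "2013": 30, "2014": 28},
]


def wide_to_long(rows, id_field, var_name, value_name):
    """Turn each non-id column into its own row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the id column is repeated on every output row
            long_rows.append({
                id_field: row[id_field],
                var_name: key,
                value_name: value,
            })
    return long_rows


long_rows = wide_to_long(wide, "town", "year", "count")
for row in long_rows:
    print(row)
```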

Other Tools, Tips, and Tutorials

I’m just over the 2,000-word mark with this one, so I’m gonna wrap things up here. I’ll leave you with this short list of helpful tips, articles, and tutorials to get on the right track. Some of them are a few years old, but still packed with useful information, nonetheless:

Articles & Lists

  • ComputerWorld | Chart and image gallery: 30+ free tools for data visualization and analysis
  • BigGroup | The Best Free Data Mapping Tools
  • Creative Bloq | 20 free data visualisation tools
  • Creative Bloq | 10 free tools for creating infographics
  • The Guardian | Data visualisation DIY: our top tools
  • iCrunchData News | Top Five Free Data Tools Available Today
  • Hongkiat | 20 Gorgeous Examples Of Timeline In Web Design For Comparison

Tools

  • Prezi | Best. Program. Ever. (My opinion.) It’s basically PowerPoint on steroids, but once you get the hang of it, you can use it for all sorts of projects and other cool stuff.
  • Statista | Statista is one of the world’s largest statistics portals.
  • Datamarket | Better-known as a data supplier, Datamarket is actually a pretty nifty tool for visualizing numbers too.
  • Charts.JS | Simple, clean and engaging charts for designers and developers.
  • Processing.JS | Processing.JS makes your data visualizations, digital art, interactive animations, educational graphs, video games, etc. work using web standards and without any plug-ins. You write code using the Processing language, include it in your web page, and Processing.JS does the rest.
  • Paper.JS | While Processing.JS has been around for a few years, Paper.JS is the new kid on the block. It’s worth keeping an eye on this library.
  • Target Map | The software will look for the file’s geo-codable data and then upload it straight into a map for you.
  • VizyDrop | Create and share charts for CSV, Excel and JSON files. Connect and visualize data from apps you are using. It takes no effort. One single place, the natural way, all for free.
  • Datamatic | The easiest way to publish beautiful, branded data visualizations.

Tutorials

This post is a response to the question, “What is the best app, site or software to map data?” The question was submitted via Hearken by Edward Correa of HechosLatinos.com, and will be featured on the Center for Cooperative Media’s Frequently Asked Questions (FAQ) page.

Do you have a question about the local news business? Use this Hearken module or click the image below to ask us a question. We might even turn your question into a video tutorial. Also, don’t forget to subscribe to our daily newsletter for news and updates from the New Jersey news ecosystem!