Data visualization has been around for some time. We remember being struck in the late ’90s by the pure awesomeness of the “XPLANATiONS” infographics published in Business 2.0 magazine. Not long after, we discovered through the great Edward Tufte the rich history of data visualization, going back perhaps to the earliest of times.
While data visualization is certainly not new, there’s no question that interest in it is on the rise. In fact, there has been a dramatic upswing in interest in this topic since 2007.
But this isn’t just media hype. Data visualizations are entering the everyday lives of academics, business people and the media alike. So why now?
In a way, data visualization is the result of software eating the world. While software consumes the world, it expels data as its exhaust. Indeed, where there is fertile ground of raw data, structured and organized synthesis of this data will grow.
All of this is goodness. As the world grows more complex and nearly drowns in data, we need quick and concise summaries of the information we need to know at exactly the time and place we need to know it. Increasingly, data visualizations are taking full advantage of new presentation and analysis toolsets, providing interactive layers of understanding never before thought possible.
But, with so many tools coming online and so many ways to visualize data, it’s harder than ever to do it right and derive meaning. The Nine Steps outline the approach and the core principles required to develop successful interactive visualizations.
Overview: The Nine Steps
We present here, in summary, the nine steps our team takes in building interactive visualizations for our clients. This serves as a framework for our team to repeatedly conceive, design and develop successful data visualization projects. Each of these nine steps can be boiled down into what IDMLOCO terms “AAA Visualizations”.
Each “A” represents one of three key attributes of data visualization greatness: Available, Accessible and Actionable. Any great visualization process will contain each of these attributes:
AVAILABLE — Is the source data to meet your goals for the visualization obtainable?
- Step 1: Identify Desired Goals
- Step 2: Understand Data Constraints
- Step 3: Design Conceptual Model
ACCESSIBLE — Can you present this data so that others can retrieve and make sense out of it?
- Step 4: Source & Model Data
- Step 5: Design the User Interface
- Step 6: Build Core Technology
ACTIONABLE — Does your visualization provide meaningful and relevant insights to the target audience?
- Step 7: User Test and Refine
- Step 8: Launch to Targeted Audiences
- Step 9: Stay Updated
This is not exhaustive of the process we follow nor is it a one-size-fits-all for others to strictly follow. Instead, we hope to provide a general idea of the thinking and steps to undertake when building interactive visualizations. To aid in the explanation of these steps, we’ve also included references to one of our favorite interactive visualization projects of all time: the California Center for Jobs and Economy (Center for Jobs).
Step 1: Identify Desired Goals
What is it you are trying to achieve? Don’t build the visualization without the answer to this question. This question is a necessary constraint that needs to be set at the very outset. Otherwise, the permutations of possible analysis can quickly mushroom out of control.
Frequently, when working with our clients to develop visualizations, the goal is to educate and persuade opinion leaders on important policy matters. Too often, these opinion leaders are exposed to rhetorical arguments from a variety of competing stakeholders, each with little substance to back them up. Or, a discussion of the facts is relegated to policy “wonks” whose obscure analyses are so inscrutable that they must be taken on faith. Such is the state of today’s aged and dying model of public affairs. Data visualization is a necessary change-agent driving us from the subjective to the objective.
In the case of our Center for Jobs client, the goal was simple: provide California’s leaders the information they need to make the best policy decisions possible. State and local government leaders make policy decisions with huge impact on our future, and no one plays from the same information because it’s all locked away in some PDF on a server who knows where. The Center for Jobs set out to change all of that.
To do this, the Center needed to do three things really well:
- Collect and store quality data about the California economy
- Present the data in a way that is accessible and usable
- Break down the data by state, county, region, and legislative district
Step 2: Understand Data Constraints
So, you’ve completed Step 1 and have your grand vision defined. Step 2 is about bounding that vision with the realistic constraints that exist around data. Constraints often manifest in the form of one or more of the following:
- No single source of all required data
- Time series gaps or mismatches
- Obscure or dated data file formats
- Competing sources of similar data
- Gatekeepers possessing or guarding non-public data
At this stage, constraints aren’t necessarily bad or good; they just represent challenges you will need to overcome if you are going to achieve your goal. Often, there are highly creative solutions that will allow you to navigate through these constraints.
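As one small illustration of checking for time series gaps, the sketch below lists the months missing from a monthly series. It is only a sketch under assumptions: the function name and the sample dates are hypothetical, and it assumes each observation is tagged with a date in the month it covers.

```python
from datetime import date


def missing_months(stamps):
    """Return first-of-month dates absent between the earliest and latest stamp."""
    present = {(d.year, d.month) for d in stamps}
    if not present:
        return []
    year, month = min(present)
    last = max(present)
    gaps = []
    # Walk month by month from the first observation to the last one.
    while (year, month) <= last:
        if (year, month) not in present:
            gaps.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return gaps


# Made-up sample: January 2013 is missing from the series.
observed = [date(2012, 11, 1), date(2012, 12, 1), date(2013, 2, 1)]
gaps = missing_months(observed)
```

Running a check like this on each source before modeling makes gaps visible early, when there is still time to find a second source or decide how to present the hole.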
No one organization possessed all of the information we needed to achieve Center for Jobs’ vision. From the California State Employment Development Department (EDD), American Community Survey, and other reports, the required data was stored across numerous disparate websites and databases, unconnected, unrelated, and unused.
For example, the California Employment Development Department compiles tons of information about California’s counties, including: employment rate, size of labor force, population, and industry sizes. However, the data is only available in long tables of numbers: it might as well be Greek to anyone who looks at it. The needed information is there — especially in government — but it just isn’t easily accessible.
Step 3: Design Conceptual Model
The human mind understands smaller numbers fairly well. We get that $98,234 is quite a bit more than $72,846. Throw in just a few more values and our working memory is tapped and it all becomes data soup. The solution is to leverage the other strengths of the human mind: it’s nearly impossible to know what the largest number is when you have 58 values, but it’s easy to see which line of 58 is the longest. In designing interactive visualizations, this is often our goal: any information that can be understood clearly through a combination of line length, relative position, size, and color should be presented that way. Doing so lightens the cognitive load and brings the data to life. You shouldn’t need a calculator and a math degree to see that Marin County is one of the best performing areas of the state.
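To make the line-length idea concrete, here is a minimal sketch that turns a handful of values into text bars scaled to the largest value. The function name, the county labels, and the third value are made up for illustration; the first two values are the ones quoted above.

```python
def text_bars(values, width=40):
    """Render each labeled value as a row of '#' characters scaled to the maximum."""
    top = max(values.values())
    lines = []
    for label, value in values.items():
        # Bar length is proportional to the value; the largest fills the width.
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<10} {bar}")
    return lines


# Hypothetical labels; 51120 is an invented value.
incomes = {"County A": 98234, "County B": 72846, "County C": 51120}
chart = text_bars(incomes)
print("\n".join(chart))
```

Even in plain text, the ranking is readable at a glance without comparing a single digit.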
And, so, with this in mind, we started to develop our conceptual model for how the Center for Jobs should present the available data in a way that would allow it to meet its goals. In fact, the early going was crude and rough:
But in this early stage, pretty or accurate were not the primary concerns. Creating an early straw model that we could quickly iterate and review was. Indeed, the finished product was nothing like this initial product. But we needed something, a crude scaffolding of sorts, to jump-start our team’s efforts.
Step 4: Source & Model Data
With a very early concept in place, Step 4 toggles back into the data. At this stage your focus is to obtain all required data sets and make them work together in beautiful harmony. We leverage data modeling to painstakingly document every piece of data and related metadata.
Working through the data model for the Center for Jobs was a major lift. Using an iterative process with our data team we landed on 143 data columns for 190 distinct geographic areas, updated every month. How do we turn that into a database structure that is manageable, easily updatable, and efficient enough for web pages? We created a unique data object for each geographic area, and then tied each set of 143 data columns to one of those geographies with a date stamp. We could then query data for a given geography, order it by the date stamp, and plot a time chart of unemployment values over a given time.
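The structure described here can be sketched roughly as follows. This is not the project’s actual schema: it uses Python’s built-in sqlite3 module instead of Django models so that it runs on its own, the table and column names are hypothetical, the values are made up, and only one of the 143 data columns is shown.

```python
import sqlite3

# Hypothetical schema: one row per geography, plus one dated row of
# data columns per geography per month (a single column shown here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE geography (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        kind TEXT NOT NULL
    );
    CREATE TABLE data_row (
        geography_id      INTEGER NOT NULL REFERENCES geography(id),
        date_stamp        TEXT NOT NULL,
        unemployment_rate REAL,
        UNIQUE (geography_id, date_stamp)
    );
""")

conn.execute("INSERT INTO geography VALUES (1, 'Example County', 'county')")
# Made-up values, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO data_row VALUES (?, ?, ?)",
    [(1, "2013-03-01", 9.0), (1, "2013-01-01", 9.4), (1, "2013-02-01", 9.2)],
)


def time_series(conn, geography_id):
    """Return (date_stamp, unemployment_rate) pairs for one geography, by date."""
    cursor = conn.execute(
        "SELECT date_stamp, unemployment_rate FROM data_row "
        "WHERE geography_id = ? ORDER BY date_stamp",
        (geography_id,),
    )
    return cursor.fetchall()


series = time_series(conn, 1)
```

Storing the date stamp as an ISO-formatted string keeps text ordering and chronological ordering identical, so a plain ORDER BY is enough to feed a time chart.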
Most of the data sets we could easily obtain were organized at the county level. But county-level data is only meaningful for parts of the state. For example, looking at county averages for Los Angeles is futile at best: how do you make the right economic choices for people in Compton when the available data includes Beverly Hills and Pasadena? For California to progress economically, we had to understand what is really happening, and a lot of elbow grease (this is a technical term) got us there.
Step 5: Design the User Interface
Only now, with the data fairly well defined and locked down, should you begin to design the interactive visualization’s interface. Similar to a web design process, you might expect to produce a few rounds of wireframes and mock-ups at this step. But it is most important to remember that user interface design does not happen in a vacuum. In fact, it is only done right with the input of many subject matter experts and potential users.
So, in the case of Center for Jobs, our client assembled a strong team of advisors and data experts. The client knew what they wanted to accomplish, the data consultants knew how to get the information, and it was our job to present it in a compelling way, both in information design and technical development. Each week we went through a full design cycle: brainstorm, design, review, and repeat. The team gave critique with each member valued for his or her particular area of expertise. There were far too many moving pieces — data, interactivity, visual design, copy, technical implementation — to orchestrate the whole thing in one shot. Working in passes allowed us to refine concepts as we went.
We spent as much time refining our work as was required to get it absolutely right…
…it’s a tedious and grueling process, but…
…on and on we iterated…until we ended up with what you can see today:
Step 6: Build Core Technology
The good news is that there is an ever-expanding universe of off-the-shelf options for interactive data visualization tools available:
- jQuery Visualize
- Modest Maps
The bad news is that comparing one tool to another is often an apples vs. oranges exercise. In fact, there are about as many apples, oranges, bananas and peaches listed above as you could expect to find in a fruit salad. So, picking the right tool is an important and highly technical decision. It is one not to be made lightly.
There wasn’t a base technology package that could do all the things that we needed the Center for Jobs to do. We needed a flexible database framework, a maintainable database administration system, a web-based visualization framework, and a content system for news, articles, and reports, and it all had to play well together.
The base choice for database and data management was the Django web framework. Django’s back-end and data modeling abilities have more than proved themselves by powering Instagram, Pinterest, and the Washington Post. That led us to a robust Django-based content management system called Mezzanine. The answer to the visualization question came indirectly from the client: “it would be great to have something like this site (NY Times)”. It was an interactive map of California highlighting demographic data. After some research, we found that the NY Times interactive team builds with D3.js, a visualization library created by Mike Bostock, who went on to join the NY Times graphics team, and that it is available for other projects under an open source license.
Once we had a data model, we could focus on pulling data for the site visualizations. We were smart in choosing the same graphics framework used by the NY Times. It’s well built, documented, and has a manageable learning curve. The programming challenge was to query the database, format the values to fit the front-end requirements, and serve them to the browser efficiently. Seemed clear when we started. If only. The data model was clear, but led to lots of processor overhead. To work out the inefficiencies, we developed a look-up/reference method: take everything you may need to look up, like names and labels, convert it to a simple data structure that is easy to look up, and then call out the details whenever you loop through. There are lots of moving parts, but we’re able to process, format, and display tons of data, and fast. Each profile makes 26 database calls, formats the information, and prepares the visualizations in approximately 1400 milliseconds, which makes for a perfectly acceptable page load time.
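A minimal sketch of that look-up idea, assuming the reference records (names and labels) are fetched once and indexed in a dictionary before looping over the data rows. The field names, function names, and sample values are hypothetical and not taken from the project’s code.

```python
# Hypothetical data rows as they might come back from the database: each one
# carries only an industry code, not the display name.
rows = [
    {"industry_code": "62", "jobs": 1200},
    {"industry_code": "51", "jobs": 300},
    {"industry_code": "62", "jobs": 1250},
]

# Hypothetical reference records, fetched once rather than once per row.
industries = [
    {"code": "51", "name": "Information"},
    {"code": "62", "name": "Health Care & Social Assistance"},
]


def build_lookup(records, key):
    """Index a list of records by one field for constant-time access."""
    return {record[key]: record for record in records}


def format_rows(rows, industry_lookup):
    """Attach display names from the prebuilt lookup, with no extra queries."""
    return [
        {"label": industry_lookup[row["industry_code"]]["name"], "jobs": row["jobs"]}
        for row in rows
    ]


lookup = build_lookup(industries, "code")
formatted = format_rows(rows, lookup)
```

The point of the design is that the cost of resolving a name no longer grows with the number of rows: one pass builds the dictionary, and every later reference is a dictionary access instead of another query.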
Step 7: User Test and Refine
Over the years we’ve learned not to launch any new technology without extensive user testing. We test pretty much everything:
- Device, browser and operating system compatibility testing
- Functionality (bug) testing
- Security testing
- Performance testing
- User interface testing
Of all of these, user interface testing is one of the most important because it is not a simple pass/fail like some of the others.
For the Center for Jobs, a major user interface design challenge came from showing the number of jobs and average yearly wage data for 20+ industry sectors on each geographic profile. The visual design of each industry was clear: draw two bars, one representing the current number of jobs and, below it, another representing the number of jobs five years ago. Do the same thing for wages. If the current bar is shorter (jobs were lost or wages went down), make it red.
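The paired-bar rule can be sketched in a few lines. This is an illustration only: the function name, the pixel width, and the neutral "gray" color are assumptions, and only the red-when-shorter rule comes from the design described here.

```python
def bar_pair(current, previous, max_value, max_width=300):
    """Scale two values to pixel widths and choose a color for the current bar."""
    scale = max_width / max_value  # max_value must be greater than zero
    return {
        "current_width": round(current * scale),
        "previous_width": round(previous * scale),
        # Red signals a decline; "gray" is an assumed neutral color.
        "color": "red" if current < previous else "gray",
    }


# Made-up numbers: 800 jobs now versus 1,000 five years ago.
declined = bar_pair(800, 1000, max_value=1000)
grew = bar_pair(1000, 800, max_value=1000)
```

Using one shared max_value for every industry on a profile keeps bar lengths comparable across the whole list.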
However, we had to stack each industry in a list because of page width, and long lists of data are not easy to use. Moreover, this is exactly what the Center set out to combat. We needed a compact and intuitively visual way to get at the same information. In our user testing, we questioned if there was a way to correlate the wages and the number of jobs for each industry. From our testing, we realized that if we plot the number of jobs horizontally and the average salary vertically, we could turn each industry into a rectangle. If we stacked them all next to each other, we could get a quick picture of industry: wide rectangles have more jobs, tall rectangles make more money. A visitor can tell at a glance where the jobs are in a district or county and get some idea of the relative quality of those jobs.
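Here is a rough sketch of the rectangle layout, assuming widths are proportional to each industry’s share of jobs and heights are scaled to the highest wage. The function name, pixel dimensions, and sample figures are hypothetical.

```python
def industry_rectangles(industries, total_width=600, max_height=200):
    """Lay industries out side by side: width tracks jobs, height tracks wage."""
    total_jobs = sum(item["jobs"] for item in industries)
    top_wage = max(item["wage"] for item in industries)
    rectangles = []
    x = 0.0
    for item in industries:
        width = total_width * item["jobs"] / total_jobs
        height = max_height * item["wage"] / top_wage
        rectangles.append(
            {"name": item["name"], "x": x, "width": width, "height": height}
        )
        x += width  # the next rectangle starts where this one ends
    return rectangles


# Made-up figures for two industries.
sample = [
    {"name": "Retail", "jobs": 3000, "wage": 30000},
    {"name": "Information", "jobs": 1000, "wage": 90000},
]
rects = industry_rectangles(sample)
```

In this sample the retail rectangle comes out wide and short and the information rectangle narrow and tall, which is the at-a-glance reading described above.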
Step 8: Launch to Targeted Audiences
It is probably obvious to say this, but what you build is no good unless it is in the right hands. It is not enough to just build it and expect users to come. Sure, you can build in “viral” tools that allow users to share with others; that is definitely a must. But for those viral tools to work, you need a base of initial visitors, and you must first make them aware of what you’ve created.
Here is where smart, targeted marketing and promotions come into play. Start by identifying the stakeholders who will value what you’ve built the most. It’s a definite bonus if they themselves have a large following. In the case of our Center for Jobs client, early attention from one of California’s most respected journalists was a huge boost.
Our traffic began to grow as we managed a series of promotional activities to attract new visitors. Our outreach efforts were working, which raised the question: were these visitors finding it useful? One of the best responses we’ve had didn’t come in the form of praise, but surprise. A California Assemblyman remarked that he had no idea of the racial/ethnic makeup of his district until looking at its profile on the site. He was genuinely shocked. Time will tell if it impacts the way he votes, but it certainly won’t be for lack of information.
Step 9: Stay Updated
The total universe of data is constantly expanding and evolving. If you don’t stay up to date, your interactive visualizations will quickly become irrelevant. We build methods for rapid update into the back end of our visualizations. Sadly, this is often overlooked in the interest of getting something built quickly. The additional investment here pays off hugely down the line. Remember that this is especially true if you are making a large initial investment to launch your interactive visualization.
To make our Center for Jobs visualization maintainable, we connected an import module that allowed the data team to whip data up in Excel, add a geography and date tag to each row, and upload it straight into our system. It’s not necessarily fast (it takes 1–4 minutes per file per server), but it’s easy enough for each team to manage. We can update the site with the new monthly numbers in about 1.5 hours (vs. the many hours that would otherwise be required by a totally manual process).
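An import step of this kind might look something like the sketch below. It assumes the spreadsheet is exported to CSV with "geography" and "date" columns plus numeric data columns; those column names, the function name, and the sample rows are all hypothetical, and the real module’s file format is not described here.

```python
import csv
import io
from datetime import date


def parse_upload(text):
    """Parse CSV text whose rows carry a geography tag and a date tag."""
    records = []
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            geography = row["geography"].strip()
            if not geography:
                raise ValueError("missing geography")
            stamp = date.fromisoformat(row["date"].strip())
            # Every remaining column is treated as a numeric data column.
            values = {
                key: float(value)
                for key, value in row.items()
                if key not in ("geography", "date")
            }
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            # Collect the problem and keep going so one bad row does not
            # block the rest of the file.
            errors.append((line_number, str(exc)))
            continue
        records.append({"geography": geography, "date": stamp, "values": values})
    return records, errors


# Made-up upload with one good row and two bad ones.
upload = (
    "geography,date,unemployment_rate\n"
    "Example County,2013-01-01,9.4\n"
    ",2013-01-01,9.1\n"
    "Example County,not-a-date,9.0\n"
)
records, errors = parse_upload(upload)
```

Reporting bad rows by line number, rather than failing the whole upload, fits a workflow where the data team prepares files by hand and needs to know exactly which row to fix.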
With the nine steps framework, you’re well on your way to interactive data visualization greatness. This framework is one we’ve proven over multiple projects to meet a wide variety of client situations we encounter. As mentioned, it is not a recipe that should be followed precisely. In fact, some of the greatest creations from our shop have come from the small (or even large) deviations we take from this framework.
With that said, following this process we developed and launched CenterForJobs.org from the ground up in a matter of a few short months. The information there is easy to find, accessible, intuitively visual, and always available. The site has data for the state, 58 counties, 10 regions, 40 State Senate Districts, and 80 State Assembly Districts.
In the time it takes to read this summary section, a user can visit Center for Jobs and tell you that 9.1% of Riverside County is unemployed; that management, information, and financial services show strong wage growth in Siskiyou County; that Health Care & Social Assistance is the largest growing industry in the state over the last five years; and that there are ~21,000 unemployed people in Assemblyman Chris Holden’s district.
This nine steps framework produces real results: the Center for Jobs is living proof.