Visualizing Relationships in Survey Data
Using a simple dataset to highlight relationships that might otherwise go unnoticed.
As we prepared to attend a conference of social sector leaders from foundations, nonprofits, and consulting firms across the globe, we wanted to see whether we could help visualize the relationships among the people who would be in attendance. We hoped this approach would evoke a sense of empathy and inclusion at an event where people come to network and connect.
The conference organizers were kind enough to provide us with a list of attendees, along with a few data points for each. This qualitative data was nominal and discrete: name, job title, organization, and geographic information. Even the historical information was limited to a single yes-or-no distinction: either this would be the guest's first year attending, or it wouldn't.
Cleaning and Preparing
The conference registration process included a few optional questions, which left null values scattered throughout the dataset. We knew that displaying these values would be more distracting than helpful, so we either avoided those fields altogether or removed the null values from our final comparison.
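As a rough illustration of the second option, here is a minimal sketch of dropping null and blank answers to an optional question before counting. The records and the `region` field are hypothetical stand-ins, not the actual registration data, and treating an empty string as "null" is an assumption about how blank answers were exported.

```python
from collections import Counter

# Hypothetical registration records; "region" stands in for an optional question.
records = [
    {"name": "A", "region": "West"},
    {"name": "B", "region": None},       # question skipped
    {"name": "C", "region": "   "},      # blank answer (assumed to count as null)
    {"name": "D", "region": "Northeast"},
    {"name": "E", "region": "West"},
    {"name": "F"},                       # field missing entirely
]


def non_null_values(rows, field):
    """Yield values of `field`, skipping missing keys, None, and blank strings."""
    for row in rows:
        value = row.get(field)
        if value is None:
            continue
        value = str(value).strip()
        if value:
            yield value


region_counts = Counter(non_null_values(records, "region"))
print(region_counts)  # Counter({'West': 2, 'Northeast': 1})
```

Filtering at this stage means the nulls never reach the counts, so they can't show up as a misleading "unknown" category later on.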
Open-ended survey questions invariably return responses that are similar but not identical. In many cases these can be grouped together to make better sense of the overarching pattern. One example from our dataset is the way attendees responded when asked for a job title. The responses included hundreds of unique descriptions, many of them quite similar, so to work with the data in an intuitive way, we chose to define categories and group the responses.
For example, many of the attendees are the heads of communications at their organizations, but the way they describe themselves varies: Communications Director, Director of Communications, Head of Communications and Brand Strategy, Director of Marketing and Communications, and so on. These all represent the same general job type, so we took some time to find these kinds of consistencies throughout.
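Our grouping was done by hand, but one way to sketch the idea in code is an ordered list of keyword rules, where the first matching rule assigns the category. The category names and patterns below are hypothetical examples, not the categories we actually used.

```python
import re
from collections import Counter

# Hypothetical rules, checked in order; the first match wins, so order matters
# (e.g. "Vice President of Communications" would land in Executive Leadership).
CATEGORY_RULES = [
    ("Executive Leadership", re.compile(r"\b(ceo|president|executive director)\b")),
    ("Communications", re.compile(r"\b(communications?|marketing|brand)\b")),
    ("Programs", re.compile(r"\bprograms?\b")),
]


def categorize_title(title):
    """Map a free-text job title to a broader job type, or 'Other'."""
    normalized = title.lower()
    for category, pattern in CATEGORY_RULES:
        if pattern.search(normalized):
            return category
    return "Other"


titles = [
    "Communications Director",
    "Director of Communications",
    "Head of Communications and Brand Strategy",
    "Director of Marketing and Communications",
    "Executive Director",
    "Program Officer",
    "Data Analyst",
]

job_types = Counter(categorize_title(t) for t in titles)
print(job_types)
```

Rules like these only get you a first pass; anything that falls into "Other" or lands in a surprising category still deserves a human look.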
We also encountered this challenge when trying to make sense of the types of organizations that would be in attendance. Because this conference focuses on social sector communications, a wide variety of foundations, nonprofits, consultancies, and agencies are represented. Even within foundations, there are distinctions to be made: operating foundations, community foundations, and independent foundations. With a bit of research, we felt comfortable assigning organizations to these broader foundation types to better understand the overall picture of who was attending.
We are mindful that grouping data like this takes careful thought and inquiry to ensure the groups faithfully reflect the intent of the original responses without introducing misleading context. Still, creating classifications and arranging and ordering data is the basis of information architecture, and it is essential for developing a compelling visual presentation.
Process and Exploration of Data
We kept our initial exploration basic, starting by counting the number of survey participants within the existing categories and the groups we had defined.
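A first-pass count like this can be sketched with a `Counter` per field, reporting each value's share of the total alongside the raw count. The attendee records and field names here are hypothetical.

```python
from collections import Counter

# Hypothetical attendee records after cleaning and grouping.
attendees = [
    {"job_type": "Communications", "org_type": "Independent foundation"},
    {"job_type": "Communications", "org_type": "Nonprofit"},
    {"job_type": "Executive Leadership", "org_type": "Independent foundation"},
    {"job_type": "Programs", "org_type": "Community foundation"},
]


def tally(rows, field):
    """Return (value, count, share of total) tuples, most common first."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return [(value, count, count / total) for value, count in counts.most_common()]


for field in ("job_type", "org_type"):
    for value, count, share in tally(attendees, field):
        print(f"{field}: {value} = {count} ({share:.0%})")
```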
We then began to explore the intersections and relationships between these categories by asking a whole lot of questions.
How many are first timers? Where are they from? What do they do? How many of this type are also of that type? And so on.
Asking questions about your data is a necessary step in uncovering patterns and narrative elements. We experimented by filtering the groups by various parent categories including location, job title, organization type, and others until insights about the attendees started to emerge. Once we uncovered consistent patterns, we were able to move on to considering data presentation methods.
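Questions like "how many of this type are also of that type?" boil down to filtering and cross-tabulating. The sketch below shows one way to do both with plain Python; the records, field names, and helper names are hypothetical illustrations rather than the tooling we actually used.

```python
from collections import Counter

# Hypothetical attendee records.
attendees = [
    {"country": "United States", "org_type": "Independent foundation", "first_time": True},
    {"country": "United States", "org_type": "Nonprofit", "first_time": False},
    {"country": "United States", "org_type": "Independent foundation", "first_time": False},
    {"country": "Canada", "org_type": "Nonprofit", "first_time": True},
]


def filter_rows(rows, **criteria):
    """Keep rows whose fields equal every given value, e.g. country="Canada"."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


def crosstab(rows, row_field, col_field):
    """Count how many rows fall into each (row_field, col_field) combination."""
    return Counter((row[row_field], row[col_field]) for row in rows)


# "How many are first timers?"
first_timers = filter_rows(attendees, first_time=True)

# "How many of this type are also of that type?"
org_by_status = crosstab(attendees, "org_type", "first_time")

# The same question, restricted to one parent category.
us_org_by_status = crosstab(
    filter_rows(attendees, country="United States"), "org_type", "first_time"
)

print(len(first_timers), org_by_status, us_org_by_status)
```

Because a `Counter` returns 0 for combinations it has never seen, empty intersections can be queried directly without special cases.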
Data Presentation Methods
We decided that the most interesting way to look at this data would be to show the relationships between people and the categories they use to identify themselves.
Some of our early exploration took shape in node-link form as a dendrogram. We thought a composite view of all attendees, with key patterns highlighted, might be an interesting way to see this information. What we liked about it was the complete transparency of all the data; however, while it produced an interesting shape, it was hard to make sense of at a glance and had no clear takeaway.
The Sankey Diagram
It became clear that displaying audiences as portions of the whole was the simplest and most straightforward way to interpret the data, and we thought a Sankey diagram would be an effective choice. Working with this new method, we produced a series of iterative diagrams to explore the various connections within the data, and the visual patterns revealed additional insights.
We narrowed our focus to the five most insightful visualizations. Even with the information about job types, organization types, and geographic location grouped, these diagrams still allowed attendees to see themselves in the data and get a sense of where they fit.
The benefit of a Sankey diagram is that it also lets you clearly see distributions and relationships across multiple stages, and we felt this presentation was both visually compelling and quick to comprehend. We knew we could further enhance the overall presentation by applying a design aesthetic, and that's what we did next.
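Under the hood, a Sankey diagram is a list of nodes plus weighted links between them. The sketch below shows one way to aggregate attendee records into links between consecutive stages (for example, country, then organization type, then new or returning). The stage names and records are hypothetical; the resulting `(source index, target index, value)` triples roughly match the shape that many charting tools expect for Sankey links, though you should check your tool's documentation for its exact input format.

```python
from collections import Counter

# Hypothetical cleaned records; each key is one vertical stage of the diagram.
rows = [
    {"country": "United States", "org_type": "Independent foundation", "status": "New"},
    {"country": "United States", "org_type": "Independent foundation", "status": "Returning"},
    {"country": "United States", "org_type": "Nonprofit", "status": "New"},
    {"country": "Canada", "org_type": "Nonprofit", "status": "Returning"},
]
stages = ["country", "org_type", "status"]


def sankey_links(rows, stages):
    """Aggregate each row's path through the stages into weighted links."""
    flows = Counter()
    for row in rows:
        # Link every stage to the one immediately after it.
        for left, right in zip(stages, stages[1:]):
            # Nodes are (stage, value) pairs so equal labels in different stages stay separate.
            flows[((left, row[left]), (right, row[right]))] += 1

    nodes, index, links = [], {}, []
    for (source, target), value in flows.items():
        for node in (source, target):
            if node not in index:  # assign indices in first-seen order
                index[node] = len(nodes)
                nodes.append(node)
        links.append((index[source], index[target], value))
    return nodes, links


nodes, links = sankey_links(rows, stages)
labels = [value for _, value in nodes]
print(labels)
print(links)
```

A useful sanity check is flow conservation: for every node in a middle stage, the total flowing in should equal the total flowing out, since each attendee passes through exactly one group per stage.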
To enhance the Sankey diagram we had used during exploration, we applied a color theme to the discrete vertical categories, using sequential hues along the color spectrum. This created a distinct yet harmonious distribution of information, and choosing contrasting colors let us emphasize the distinction between two opposing groups (new and returning guests).
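For readers who want to try a similar palette, here is a minimal sketch using Python's built-in `colorsys` module: evenly spaced hues across part of the spectrum for the categories, and a pair of hues on opposite sides of the color wheel for the two opposing groups. The specific start hue, span, saturation, and brightness values are arbitrary assumptions, not the colors used in our diagrams.

```python
import colorsys


def to_hex(r, g, b):
    """Convert RGB floats in [0, 1] to a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def sequential_hues(n, start=0.55, span=0.35, saturation=0.6, value=0.85):
    """Return n colors whose hues step evenly through a slice of the spectrum."""
    colors = []
    for i in range(n):
        hue = (start + span * i / max(n - 1, 1)) % 1.0
        colors.append(to_hex(*colorsys.hsv_to_rgb(hue, saturation, value)))
    return colors


def contrasting_pair(hue=0.08, saturation=0.8, value=0.9):
    """Return two colors half a turn apart on the color wheel (complementary hues)."""
    first = to_hex(*colorsys.hsv_to_rgb(hue, saturation, value))
    second = to_hex(*colorsys.hsv_to_rgb((hue + 0.5) % 1.0, saturation, value))
    return first, second


category_colors = sequential_hues(5)
new_color, returning_color = contrasting_pair()
print(category_colors, new_color, returning_color)
```

Keeping saturation and brightness fixed while only the hue changes is what makes the category colors feel related; the complementary pair breaks that pattern on purpose so the new-versus-returning split stands out.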
Since the diagram was meant to be a way of visualizing relationships, we added labeling, grouping titles, and supporting statistics to provide context. Here are two examples of the set we developed.
In the first diagram, you can see that the majority of attendees are coming from the United States, represent independent foundations, and are pretty evenly split between returning and new attendees. In the second, you see that the attendees from the US are also fairly evenly split between new and returning, and representatives from every organization type are coming from all over the country.
Sankey Diagrams in Other Scenarios
Presenting data like this can prove useful in any case where volume is distributed across multiple categories. The flow of the diagram provides context as volumes diverge and converge while passing through each grouping stage. This allows viewers to follow a narrative based on their particular interests and makes it easy to draw comparisons.
It’s also just visually interesting, and the importance of that shouldn’t be underestimated.