Project 3: Data visualization, making data meaningful

Evaluating Education in Pittsburgh

Published in

CMU Design School Master’s portfolio

Nov 13, 2018

We’ve all seen data visualizations — some of them effective; many of them simply interesting graphics. Despite common communication problems that arise in their delivery of information, data visualizations have become a prevalent form of conveying ‘facts’. In this project, we’re going to work to illuminate connections among data in ways that help viewers recognize, engage in, and think critically about their important relationships. Thus, we will strive to craft visual, aural, and temporal representations of data that render the information useful, usable, and desirable.
Designers are frequently asked to sift through large amounts of seemingly disparate information to identify relationships (inward communication). They are also often tasked with representing their discoveries in a clear, concise, and engaging manner that reveals patterns and invites participation (outward communication). Therefore, in this project we will once again immerse ourselves in the learning and application of communication strategies that help us make sense of dense information and aid the communication of our discoveries to others.

Week 1 — Digging through the data (11.8.18)

Focus

I’ve chosen to look for the correlation between a neighborhood’s distance from schools and the education and career impact it has on a student’s future.

Research question(s)

What impact does one’s proximity to well-rated schools and familial income have on a child’s future income and social mobility?

Data paradigms to utilize

The following are a lot of lists, but they will help organize our ideas.

Wurman–LATCH:

Location
Alphabet
Time
Category
Hierarchy

Yau–LCP-LOT:

Linear: (123, evenly spaced)
Categorical: (Shadyside, Squirrel Hill, etc.)
Percent: (amount of the whole)
Logarithmic: (0, 10, 100, 1000)
Ordinal: (good/bad…descriptive scale)
Time

Data framing:

Cartesian
Polar
Geographical
Coordinate

Visualization ideas I’m considering

Use geographic sorting
Interactive and show movement of students
Comparison between students’ eventual income and their parents’ income
Comparison between students by the school they attended

Questions to consider

How much data is just shown vs uncovered?
How could motion add to the data story?
How can data be layered in order to make sense of the whole instead of just the discrete pieces?
Are there data analysis tools that I can utilize to tease out interesting ideas?

Visualization research

Below are just a few static data visualizations I found:

Week 2 — Collecting and cleaning the data (11.15.18)

Finding data

While looking for data, I found that it needed to be retrieved from several sources:

Pittsburgh/Allegheny County Data sets
U.S. Census Bureau
American Community Survey (ACS)
National school registry and ranking (in collaboration with Niche)
Social Mobility Scores — Economic Forum & Rand Institute

Obstacles

Data and year alignment
Data zip code/tract matching
Creating meaningful predictions (still a struggle) — specifically for student future earnings
Learning the correct formulas/calculations
Developing compiled data sets
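
The year alignment and zip code/tract matching obstacles above can be sketched in Python. This is only a minimal illustration, not the workflow actually used in this project: the crosswalk table, the field names, and every number below are made-up placeholders, and a household-weighted average of medians is just a rough approximation of a zip-level median.

```python
from collections import defaultdict

# Hypothetical crosswalk: census tract -> list of (zip code, share of the tract in that zip).
TRACT_TO_ZIP = {
    "tract_A": [("zip_1", 1.0)],
    "tract_B": [("zip_1", 0.25), ("zip_2", 0.75)],
}

# Hypothetical tract-level rows; note the mixed survey years.
income_by_tract = [
    {"tract": "tract_A", "year": 2016, "households": 100, "median_income": 40000},
    {"tract": "tract_B", "year": 2016, "households": 200, "median_income": 60000},
    {"tract": "tract_A", "year": 2010, "households": 90, "median_income": 35000},
]


def latest_year_only(records):
    """Keep only rows from the most recent year so every source lines up on one year."""
    latest = max(r["year"] for r in records)
    return [r for r in records if r["year"] == latest]


def tract_to_zip_income(records, crosswalk):
    """Re-aggregate tract rows to zip codes with a household-weighted average."""
    weighted_sum = defaultdict(float)
    households = defaultdict(float)
    for r in records:
        for zip_code, share in crosswalk[r["tract"]]:
            hh = r["households"] * share  # households of this tract that fall in this zip
            weighted_sum[zip_code] += hh * r["median_income"]
            households[zip_code] += hh
    return {z: weighted_sum[z] / households[z] for z in households}


zip_income = tract_to_zip_income(latest_year_only(income_by_tract), TRACT_TO_ZIP)
```

Filtering to a single year first keeps a 2010 row from being blended with 2016 rows, which is the kind of mismatch that makes compiled data sets hard to trust.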

Making sense of how to organize the data

So, I began organizing the data using Stacie’s method of starting with the known and needed data sets (green post-its) and working out using categories and representations, from the literature, to think about how to organize and collate the data.

Organizing my data sets and thinking about visualizations

Data entry point

After this activity, I considered how to create the right entry point for engaging with this data. There seem to be three options, depending on what I want to focus on: geographical, polar, or a ranked Cartesian plot. I’ll need to play with possible visualizations to see what works and resonates with others.

Continued Data cleaning

So close. I am now thinking that perhaps I should have chosen a simpler topic. However, my goals for the project are:

Create a data projection that moves beyond just a causal or correlational data study
Create a predictive algorithm
Make the data interactive or tangible.

Week 3 — Data manipulation & early concepts (11.23.18)

Stuck

This week, I am still a little stuck on how to get my data properly cleaned in order to work on the visualization. I underestimated the amount of work needed to collect, clean, and develop statistically sound data predictions.

I’m debating if I spend another day or two trying to get the data where I need it (close yet far…knowing that I need to start visualizing) or just jump into a more surface level analysis and focus on the visualizations.

Data physicalization

As I have been working on the topic and with the data, I am beginning to think that data physicalization may be a better way to represent this data and allow others to physically interact with and view it. Below are a few explorations beyond this nice collection and this website.

My explorations

What is the difference between data and information?

(left): Polar comparison of student potential by neighborhood; (right): Exploration into tangible data through shading and potential monetary earnings. I explored both using money as a graph and assigning a “face value” to each neighborhood according to key data variables.

(above): Looking at a digital idea for organizing the data of neighborhoods and student potential. Colors would be used (left) to show parental income, educational quality, and employment rate as a combined statistical score. After clicking on a neighborhood or the start button, the neighborhoods would move to a Cartesian plane plotting student projected income and educational attainment.

Feedback

A physical model is quite compelling and is worth exploring
A physical model forces a tangible experience that increases memory and understanding.
Scrap the idea of a comparison to money because that comparison is too strong
Cartesian plot is very interesting but needs a map or indexical way to locate information the user is looking for
The Cartesian idea may lend itself well to comparing parental levels and student forecasted levels by overlaying them and showing their connection via color or line
For any of the models I need to make the connection between the parental social mobility and student projected mobility CLEAR
Avoid using a variable more than once; for me, that is specifically position

New physical model of data idea

In this idea, I am looking at the difference between foundations and future projections. Each neighborhood would have two levels. The bottom level would be the collection of the parental social mobility scores (income, education level, quality of schools). The top level would be the student projected mobility scores (education, graduation rate, projected income, and next post-high-school destination: college, trade school, work, etc.).

5-minute interim presentation goals

What question(s) are you exploring
Data sets being used to explore those questions? and why? (Affordances)
What anchor/coordinate systems do you propose using? and why?
What types of information are you using (LATCH)?
What scales (LCP-LOT) and range of data do you propose using? **and process to walk through
What structure am I using, linear and indexical? and why? (simultaneous viewing vs linear progression/layer) Also, what are the interactions and what is available at any given time?
What visual, aural, temporal, or tactile elements do you propose using? and why?
7 slides and one simple representation. Make the presentation to be prepared for critical feedback. In medium post, show explorations.

After I presented my idea of making a physical data visualization, most of the critique centered on how to clearly show the data and layer the information.

Moving forward, I decided to pursue a physical representation and began exploring how to layer the important information.

I’ve looked into the layering of ideas, and a few questions I need to test are:

How do individuals read models of physical data
How much is too much and too little data
What are the most important factors to surface to a reader
Who is my audience? City residents and community social decision makers

Week 4 — Data mining & testing readability concepts (11.28.18)

Well, this week yielded little progress due to the holiday, except that all my data is now collated in one spot and I can link the 10 discrete data sets I am using. This is a huge breakthrough for doing the analysis and predictions.

I also confirmed the need to focus on simplifying the data on the physical model, and that a pyramidal shape for the stacks indicates importance or relevance, which is not what I am going for. So, the data will be in a single column that can be deconstructed for more info.

Week 5 — Prototypes (12.3.18)

Physical form

To create this physical model, I decided to go with laser-cut pieces to show the social mobility through visual height and tactile engagement.

Early form idea for stacking information

In order to find the best way to display the information, I began by exploring various forms and ideas with paper and foam core prototypes.

This led me to determine the form and size that I would need to make a compelling model.

Building the model for cutting in Illustrator…so many lines and hours

Due to the cost and time needed to create the entire Allegheny County social mobility model, I decided to fully prototype 6 neighborhoods and then hint at the rest by including the various districts on the final model.

Material

An easy first choice of materials was wood, because of the flexibility and rigidity it provides to create such a model.

Next, I landed on using colored acrylic to create a clearer indexical structure for my data, while also providing the layers of information I need to be seen quickly from near or far.

Color

After several explorations, I decided to stick with simple primary or secondary colors in accordance with how economists and social scientists weight the importance of various factors within the components of the social mobility score. Thus: red, yellow, and green. Red marks attributes/components that are weighted least heavily, yellow those of medium importance, and green those of greatest importance.
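
To make the color tiers concrete, here is a minimal Python sketch of a weighted score. The tier weights (1, 2, 3), the assignment of components to colors, and the assumption that every component is normalized to a 0 to 1 range are all hypothetical; they are not the weights or the formula used for the model.

```python
# Hypothetical weights per color tier (red = least, green = greatest importance).
TIER_WEIGHTS = {"red": 1.0, "yellow": 2.0, "green": 3.0}

# Hypothetical assignment of components to tiers, for illustration only.
COMPONENT_TIERS = {
    "school_quality": "red",
    "education_level": "yellow",
    "income": "green",
}


def mobility_score(components):
    """Weighted average of component values, each assumed to be normalized to 0..1."""
    weighted = 0.0
    total_weight = 0.0
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to 0..1, got {value}")
        weight = TIER_WEIGHTS[COMPONENT_TIERS[name]]
        weighted += weight * value
        total_weight += weight
    return weighted / total_weight


example = mobility_score(
    {"school_quality": 0.5, "education_level": 0.5, "income": 1.0}
)
```

Dividing by the total weight keeps the result in the same 0 to 1 range as the inputs, so a green component moves the score three times as much as a red one under these placeholder weights.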

Assembly

As I explored how to assemble the form, I began looking at how the data could be stacked meaningfully with encoded data at each layer.

This meant that I had to consider the size of various pieces that could be grasped or visually distinguished through color, size, and height. I also wanted the experience to be tactile. So, planning for readability and tangibility was important.

Through this exploration, I decided that my model would be a tabletop model at a city hall or community center. Thus, I wanted 3 levels of scale: first, the board and geographic model to catch attention; next, the aggregated “hill” data; and lastly, the more detailed meta data on each slice of the puzzle.

I utilized a square peg to connect the data, prevent rotation, and maintain stackability.

Week 6 — Creating the model (12.10.18)

Creation

Creation quickly became much more complicated than initially planned, with several hard restarts to improve the model’s appearance and adapt to the data slicing/chunking that took place.

This physical creation was one of the first where I ran into significant manufacturing issues such as:

Large file size
Laser rogue-ness
Laser mirror clouding
Time

However, once all the major issues were resolved, the rest of the model came together quickly.

(left) Top view; (center) side view; (right) at level zero

Pulling one piece out of the map and comparing student vs parent

The data stacking up is the student data and the data hanging below is the parental/adult data and represents the social mobility of each group.

When pieces are pulled out of the board, comparisons can be made between neighborhoods and also between parents and students.

Once the neighborhood pieces are removed, individuals can “peel” back the layers to look at and understand more of the meta data by parent and student. The student projections are set for seven years out from a typical high school graduation age of eighteen.

Individual slices encode the data for students and parents. Student slices have the single-person icon and the meta data in all lowercase, while the parental/adult slices have two figures and all the information in all caps. This helps clearly pair two discrete pieces of data and information.

This adds a level of remembrance and tactile memory in hopes to prolong the understanding of and engagement with the meaning behind what is being presented.

Final presentation

December 13, 2018: I presented the final data visualization concept to our professor Stacie and the class. I took the approach of allowing individuals to discover the data and interact with it. I felt that the story the data should tell could be self-generated after being given the space to discover and explore. The data reflects the parents’ current state and forecasts out seven years for someone who is eighteen years old.

I envisioned that an installation like this would exist in a public building to prompt public and employee engagement. The interactivity of this tool allows new conversations and questions to occur.

Me explaining the data visualization and comparing two different neighborhoods

Due to cost (colored acrylic is quite expensive) and time constraints, I modeled out the three highest neighborhoods’ and three lowest neighborhoods’ social mobility scores.

Two different neighborhoods illustrating the parent vs student information. Both sets of data were made for all six neighborhoods prototyped.

Because the social mobility scores of students and parents are so close–most within 0.5–1.5% difference, I leveraged the use of the most important factors by engraving them in acrylic.

The data was broken up into similar data for both parents and students/individuals based on weighted scores and linear divisions.

I coded the data on each piece by using uniform icons and typography treatment to reinforce the data and create a cohesive mental model around how to read the data.

I also stuck to a geographic map because many things, like employment, school districts, etc., cross neighborhood boundaries but influence the people who live within them.

The height of each stack is representative of the relative social mobility score.
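
One way to think about translating a relative score into a stack of laser-cut slices is sketched below in Python. The maximum of 10 layers, the 3 mm sheet thickness, and the rule of always keeping at least one slice are assumptions for illustration, not the dimensions of the actual model.

```python
def stack_layers(score, max_score, max_layers=10):
    """Number of equal-thickness slices, proportional to score relative to max_score."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    layers = round(max_layers * score / max_score)
    return max(1, layers)  # keep at least one slice so every piece can be picked up


def stack_height_mm(score, max_score, max_layers=10, sheet_mm=3.0):
    """Physical stack height, assuming every slice is cut from the same sheet thickness."""
    return stack_layers(score, max_score, max_layers) * sheet_mm


tallest = stack_height_mm(1.0, 1.0)
shorter = stack_layers(0.8, 1.0)
```

Because slices are whole pieces, heights are quantized: two neighborhoods whose scores differ by less than one layer's worth will end up with the same stack height in this sketch.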

I observed that as individuals interacted with the data pieces they gained a deeper connection and this prompted more questions and interesting ideas to explore in future iterations.

Overall, the concept came across successfully and provided delight through the puzzle-like nature of the visualization and the two types of data shown above and below the main board.

Key feedback

The data makes sense and is instantly understandable just by viewing the neighborhoods.
What is a story that could be tied to the data to make the data actionable?
How might this map have even more data mapped onto it?
Though I chose the order of the colors based on feedback in order to highlight green, what other ways of arranging the colors are worth considering, and how would they change readability?

Key Challenges

Cost of materials
Time to laser cut
Unpredictability of etching outcomes
Having enough data to calculate a realistic social mobility score
The narrowness of the data outputs (I need to take even more data courses)

Reflection and further questions

The heights of the social mobility scores are often the inverse of the topographical elevation of the various neighborhoods.
I need to learn how to manipulate and model more data
Perhaps changing the thickness of acrylic could help demonstrate the impact of weight of different components more clearly; especially if I were to expand this to incorporate more data points.
How can I highlight the importance and differentiating factors in the 1% change?
Why do parental and student social mobility remain similar?

Final social mobility map

Updated pictures coming soon…