Visualizing (Data) Patterns

Denise Nguyen
15 min read · Nov 3, 2017

Project 3 and Reflections
MDES Communication Design Studio, Stacie Rohrbach
Carnegie Mellon, Fall 2017

THURSDAY November 2 | Project Introduction

Today, Stacie introduced our new project, which centers around creating interactive data visualizations for Pittsburgh through aural, temporal and visual means. The goal is to be able to take someone through dense data and highlight important information.

We reviewed several examples of data visualizations from William Playfair (who created the first pie chart in 1801) to Charles Minard (who explored layering of different content) to contemporary examples.

One interesting thing I learned about cognitive load is that after 7–8 categories/layers/buckets, you can’t really perceive differences.

Looking at all the different topics we could work with (e.g. air/water quality, crime, education, food, housing, transportation, health/wellness), I am gravitating more towards crime and education. However, I will do a bit more exploration in the coming week and home in on one specific topic.

A reminder to myself is to make the year of data collection more transparent since some of the data we saw today were collected in 2010.

An interesting and effective data visualization that I found is: peoplemov.in

Another one I found that utilizes social interaction and physical space is: https://youtu.be/gIMXeOjZxsA

TUESDAY November 7 | Reviewing Existing Data

I am interested in Crime and Education because they touch upon data around race, which is a topic that is meaningful to me.

I looked at the data sets provided and took some notes:

  1. School District Boundaries: I’m unclear about this data since there are only two columns. There are 45 schools listed.
  2. All Inclusive: There are 95 neighborhoods around Pittsburgh. I need to understand what “margin of error” means in this context.
  3. African American: I’m wondering why the datasets choose to use the term African American rather than Black.

Other Notes:

I read this site to get a better idea about why the data is used and what questions a data visualization might help draw attention to.

  • School enrollment data are used to assess the socioeconomic condition of school-age children
  • Government agencies also require these data for funding allocations and program planning and implementation.
  • People were classified as enrolled in school if they were attending a public or private school or college at any time during the 3 months prior to the time of interview.
  • The question included instructions to “include only nursery or preschool, kindergarten, elementary school, home school, and schooling which leads to a high school diploma, or a college degree.”
  • Respondents who did not answer the enrollment question were assigned the enrollment status and type of school of a person with the same age, sex, race, and Hispanic or Latino origin whose residence was in the same or nearby area.
  • School enrollment is only recorded if the schooling advances a person toward an elementary school certificate, a high school diploma, or a college, university, or professional school (such as law or medicine) degree.
  • Tutoring or correspondence schools are included if credit can be obtained from a public or private school or college.
  • People enrolled in “vocational, technical, or business school” such as post secondary vocational, trade, hospital school, and on job training were not reported as enrolled in school.
  • Field interviewers were instructed to classify individuals who were home schooled as enrolled in private school. The guide sent out with the mail questionnaire includes instructions for how to classify home schoolers.
  • Enrolled in Public and Private School — Includes people who attended school in the reference period and indicated they were enrolled by marking one of the questionnaire categories for “public school, public college,” or “private school, private college, home school.”
  • The instruction guide defines a public school as “any school or college controlled and supported primarily by a local, county, state, or federal government.” Private schools are defined as schools supported and controlled primarily by religious organizations or other private groups. Home schools are defined as “parental-guided education outside of public or private school for grades 1–12.” Respondents who marked both the “public” and “private” boxes are edited to the first entry, “public.”

Stacie discussed Yau (who was focused more on ways of organizing data) and Wurman (who was focused more on ways of organizing info). We also learned more about different:

Coordinate Systems

  • Cartesian: Two sets of data
  • Polar: Parts to Whole, not a finite set
  • Geographic: Location

Scale

  • Linear/Hierarchy (0,1,2,3): There’s an order
  • Categorical/ Category (e.g. types of chocolate)
  • Percent
  • Logarithmic (1, 10, 100, 1000)
  • Ordinal/Hierarchy (e.g. good, bad, terrible)
  • Time
  • Alphabetical

Wurman uses the LATCH method to organize information, but Stacie would add parts to whole. She said to keep in mind the scale and tradeoff between:

Literal — — — — — — — → Abstract

Less Accurate — — — — → More Accurate

In class, we formed groups around our topics; mine was education. I came up with a question around the topic, which was:

How might poverty rates of certain neighborhoods affect education enrollment rates and crime rates (by race)?

For my navigation flow, I want to show the poverty rates by neighborhood, then the education enrollment and then the crime rates.

TUESDAY November 14 | Getting everyone on the same spot

  1. What interests you with this project? What is the content you’re interested in looking at and why? I’m interested in looking at education because it relates a lot to race and crime, which I am interested in. One of the things I find the most unfair about life is that education is such a good catalyst for a good life, yet only some people are afforded a good education because of contextual circumstances.
  2. What types of data enable you to explore your interests? The data I need are on neighborhoods, education enrollment, income, race, and possibly, the income of the family and the enrollment rates of the children.
  3. What is your project question? What do you want to learn? My project question is: How might poverty rates of certain neighborhoods affect education enrollment rates and crime rates (by race)?
  4. Describe Steps of Narration. First, I will look up information about the total population of the neighborhood. Then, I will look at the population living under the poverty line. Then, I will look at the population currently enrolled in school.

Then I added data about crime rates to see if there is any correlation:

Then, I added on the population of blacks (which the data labels as ‘African American’) in each neighborhood.

I just realized something: people who are not enrolled are not necessarily unenrolled because they dropped out. It could be because they are older. I can’t make that assumption, so I’ll go back and clarify that.

WEDNESDAY November 15 | Homework

I decided to look through the data again to push the data further. I scanned the food, health, and crime data again.

I found data on farmer’s markets and fast food places. In my own experience, I have found correlations between farmer’s markets and wealthier, more gentrified neighborhoods, and between fast food restaurants and lower-income neighborhoods. I wanted to create a data viz that might support or refute that assumption.

My initial question is: How might poverty rates of certain neighborhoods affect education enrollment rates and crime rates (by race)?

But I want to layer it with: How might the most predominant racial makeup of a neighborhood correlate with the poverty rates of that neighborhood, the location of fast food places and farmer’s markets, and obesity rates? Does that affect education enrollment rates and crime rates of the neighborhood?

As mentioned earlier: I realized people who are not enrolled are not necessarily unenrolled because they dropped out. It could be because they are older. I can’t make that assumption, so I’ll go back and clarify that.

TL;DR

My new question is:

How might the most predominant racial makeup of a neighborhood correlate with the poverty rates, the location of fast food and farmer’s markets, obesity rates, % of education enrollment rates*, and crime rates of the neighborhood**?

* Lower education enrollment rates don’t mean that people dropped out; it could be because of an older neighborhood.

** Crime rates can be skewed because of racial profiling.

Latch

I’m going to order my information by category overlays (predominant racial makeup, then poverty, then fast food locations, then farmer’s market locations, then obesity rates, then education enrollment percentage of each neighborhood, and then crime locations).

Scale

My scale will be linear.

Range

  • Predominant Race: Categorical
  • Poverty Rates: 0–100%
  • # Fast Food Locations in each Neighborhood: 0–838
  • # Farmer’s Market Locations in each Neighborhood: 0–54
  • Obesity Rate per Neighborhood: 0–100%
  • Education Enrollment Rate per Neighborhood: 0–100%
  • # of Crime: 0–13532

Bucket

  • Predominant Race: Blacks, Whites, Asians, Latinos, Etc.
  • Poverty Rates: 0–10%, 11–20%, 21–30, 31–40, 41–50, 51–60, 61–70, 71–80, 81–90, 91–100%
  • # Fast Food Locations in each Neighborhood: 0–100, 101–200…801–900
  • # Farmer’s Market Locations in each Neighborhood: 0–10,11–20…51–60
  • Obesity Rate per Neighborhood: 0–10%, 11–20%, 21–30, 31–40, 41–50, 51–60, 61–70, 71–80, 81–90, 91–100%
  • Education Enrollment Rate per Neighborhood: 0–10%, 11–20%, 21–30, 31–40, 41–50, 51–60, 61–70, 71–80, 81–90, 91–100%
  • # of Crime: 0–1000, 1001–2000,….13001–14000
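The bucket pattern above (0–10, 11–20, and so on) can be sketched in a few lines of Python. This is only an illustration, not how the final piece was built: the function name `bucket_label` and its parameters are hypothetical, and it assumes whole-number values (a percentage would need rounding first).

```python
def bucket_label(value, width, maximum):
    """Return the label of the bucket a whole-number value falls into.

    Buckets follow the pattern in the list above:
    0-width, (width + 1)-(2 * width), and so on up to `maximum`.
    """
    if value < 0 or value > maximum:
        raise ValueError(f"{value} is outside 0-{maximum}")
    if value <= width:
        # The first bucket is the only one that starts at 0.
        return f"0-{width}"
    # With width 10: 11..20 -> index 1, 21..30 -> index 2, ...
    index = (value - 1) // width
    low = index * width + 1
    high = (index + 1) * width
    return f"{low}-{high}"


# Upper ends of the ranges listed in the Range section.
poverty_bucket = bucket_label(11, width=10, maximum=100)
fast_food_bucket = bucket_label(838, width=100, maximum=900)
crime_bucket = bucket_label(13532, width=1000, maximum=14000)
```

Ten or more buckets per layer is above the 7–8 categories noted earlier as the limit for perceiving differences, so a wider `width` may be worth trying.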

Coordinate

I am going to graph my data geographically by Pittsburgh neighborhood with an overlay of the crime incidents, fast food locations, and farmer’s market location.

Misc Notes and Questions

I just found data for Education Attained. I need to look into overlaps between that data and the Education Enrollment in class tomorrow.

I need to map the crimes with the X,Y coordinates on a scatter plot and overlap that with the neighborhood boundary map.
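One way to overlap X,Y points with neighborhood boundaries is a point-in-polygon test. The sketch below uses the ray-casting technique in plain Python; the square "neighborhoods" and incident coordinates are made up for illustration, and the function names are hypothetical. It assumes boundaries are simple polygons that do not overlap.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`.

    `polygon` is a list of (x, y) vertices in order; the last vertex
    connects back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_neighborhood(points, boundaries):
    """Count how many points fall inside each named boundary."""
    counts = {name: 0 for name in boundaries}
    for x, y in points:
        for name, polygon in boundaries.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assumes neighborhoods do not overlap
    return counts


# Hypothetical boundaries and incident locations, for illustration only.
boundaries = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
incidents = [(2, 3), (5, 5), (15, 5), (25, 5)]
counts = count_by_neighborhood(incidents, boundaries)
```

The point at (25, 5) falls outside both shapes, so it is not counted. With real boundary files, a GIS library would likely be a better fit than hand-rolled geometry.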

I need to see if my ranges need to be in similar increments (e.g. a scale of 1–10 versus 1–1000: can they exist in the same data viz?).
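One common way to let very different ranges share a visual scale is min-max normalization, which rescales each value to a 0–1 position within its own range. The sketch below is an assumption about how this could be done, not something from class; it uses the farmer’s market (0–54) and crime (0–13532) ranges listed above, and the sample values are hypothetical.

```python
def normalize(value, low, high):
    """Rescale `value` from the range [low, high] to a 0-1 position."""
    if high == low:
        raise ValueError("range must have a nonzero width")
    return (value - low) / (high - low)


# Ranges taken from the Range list; the sample values are made up.
RANGES = {
    "farmers_markets": (0, 54),
    "crimes": (0, 13532),
}

markets_position = normalize(27, *RANGES["farmers_markets"])
crimes_position = normalize(6766, *RANGES["crimes"])
```

Both sample values land at the midpoint of their range, so they could be drawn with the same color step or bar length even though the raw numbers differ by orders of magnitude. The tradeoff is that the viewer no longer sees the raw counts unless they are labeled.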

I need to ask Stacie if I have too many layers and things I’m trying to show relationships between. At the end of the day, I think I’m trying to show if the racial makeup of a neighborhood affects poverty rates (which could affect education level and # of crimes) and the location of fast food places/farmer’s markets (which could affect obesity levels).

THURSDAY November 16 | Starting to Visualize

We started attempting to visualize our data.

We took a stab at different ways of visualizing our data.

TUESDAY November 21 | Speed Dating

I used tracing paper to show my flow a bit.

Today we did a speed dating critique and it was really helpful.

I think the biggest problem that I have is scoping it down a bit. Here is some feedback I received from some of my classmates.

  • The overlap of information seems a bit cluttered.
  • Don’t show everything at one glance.
  • The race map seems very colorful now but it will actually be more muted since the predominant races are Blacks and Whites.
  • Narrow down to the most important thing you want to show and see what data you have that best fits that.
  • One interesting route is the correlation between the major race of a neighborhood and the income, education, and crime level.
  • Perhaps do away with the map, since the location isn’t really helpful if my main point is the most prominent race and not so much the racial makeup of a neighborhood.
  • Since I already know that the most prominent races are Blacks and Whites, I could bucket everyone else as other. That way, I will still remain inclusive but it will declutter it a bit.
  • Perhaps I could combine race, income, and education into one socioeconomic measure to have one thing that captures them all visually.
  • An interesting direction is to focus on whether the crime locations affect where fast food places or farmer’s markets are placed.

TUESDAY November 28 | Work Day and Check in with Stacie

Even though my classmates wanted me to trim it down to two or three topics, I still want to hit upon more through the lens of race. Because of this, I think starting off with race as an entry point is still good.

Instead of showing all the races, I narrowed it down to three main buckets: Black, White, and Others.

This is the visualization I came up with:

Here are some critiques I got from Stacie:

  • I am utilizing color too much; perhaps I can play with other visual styles like line shape, weight, color, texture, contrast and position.
  • Perhaps, I could just use gray scale for all the races and then the color wouldn’t be so cluttered.
  • Think more about “what is present and why?”
  • I could leave race at the highest level and then narrow down to crime, fast food, and farmer’s market locations.
  • Perhaps starting with the map, I can just use shapes for the neighborhoods. However, I need to go back to the map/actual boundary of the neighborhood because my data about crime, fast food, and farmer’s market are location specific and need a map.

THURSDAY November 30 | Work Day

Here are some questions Stacie told us to think about:

  • What does the viewer need to see at any given time?
  • Should you compare one neighborhood to another or to all?
  • Can I turn some information off? Should some things be more highlighted?
  • What tools do they need? What options do you need to provide them? How can the tools be more integrated with the interaction?
  • When looking at hierarchy, should some tools recede? Maybe expand and collapse certain tools?
  • Think about tone and mood (e.g. crime would have red, black, or high contrast colors).
  • Can the typeface and visuals communicate what you are talking about even if the viewer doesn’t read it?
  • Play with organic versus abstract shapes.

Today, I played around with different ways of visualizing the entry point again. Per Stacie’s suggestion, I did away with the map and experimented with more abstract shapes.

Then I wanted the center to be used for the % of enrollment.

Then I had an idea to go back into the map view once the user digs deeper and clicks on one of the pie charts.

TUESDAY December 5 | Work Day

I was a bit stuck because I had a lot of data and didn’t know how to progress. Stacie said it seemed like I had two directions for my data visualizations, so I should just focus on income, education, and crime instead of fast food and farmer’s market.

She also questioned why I made the color blue represent Blacks instead of black, since that increases the cognitive load. I was afraid of perpetuating stereotypes, but Stacie said that I’m not because I will have the color blue to represent ‘Others,’ which shows that I had some sensitivity around the topic.

THURSDAY December 7 | Walk Thru

Today, we had a walk-thru of everyone’s click thru.

Here are the comments:

  • How do people select two neighborhoods to compare?
  • Is there a way to see multiple layers at once?
  • Should your filters align with the order of your questions (e.g. crime last)?
  • How did you choose the neighborhoods?
  • There is enough but maybe add sub-area (all south of the Allegheny).
  • I like the overall aesthetic, but I like to compare info and layers to see relationships.
  • Moving from one scene to another seems not connected, so I can not compare (compare race with crime)
  • Nice visuals, but need to integrate the different elements somehow, rather than looking at each factor individually.
  • Can I select two neighborhoods from the home page?
  • It would be nice to select from list and layer all the data sets.
  • Handcuff symbol is overboard
  • What is the ordering schema for the neighborhood?
  • Comparison view is redundantly labeled.
  • Find ways to layer because all info is separate.
  • The lightest color for median income is hard to read.
  • Specify max and min on scale.
  • Race looks like paint cards
  • Maybe you could use scale of the handcuffs to represent the #s of arrests.
  • I can’t see where I can make comparisons between the neighborhoods.
  • I would like to compare more than two. Provide more connections between 4 main topics.
  • How can I jump back and forth to relate info?
  • You can try changing the order to match your overarching point. Race, crime, income, and education seem like separate entities.
  • Allow people to walk thru before comparing.
  • More values since colors are hard to compare
  • How can you propose a view that shows the correlations? (e.g. Terrace Village is very low income, why could that be? That’s the opposite of East Liberty.)
  • It would be nice to see education change in scale, it’s hard to see overall difference in colors.

Individual notes from Stacie:

  • “I think so, but I don’t understand how the neighborhoods were determined. Maybe a precursor screen or two that introduces the neighborhoods” <- could it be based on areas with statistically higher crime?
  • “It’s difficult to determine where arrests, income, and degree lie on scales without seeing the whole scale. I do see if Shadyside, East Liberty are the same/different.”

Notes from Stacie:

  • Introduction to the piece
  • Clear communication on what buttons mean. Hover state for more context
  • Pose rhetorical questions. How can you end?
  • Make sure the nav doesn’t overpower representations of content; integrate it into the piece.
  • Point out what you want people to see

TUESDAY December 7 — 13 | Final Week

After the feedback from my classmates, I sat down and mapped a better user flow. I realized that I had two portions to my data visualization:

  1. one where I am showing all the neighborhoods at once
  2. one where I am comparing neighborhoods

From this, I decided to add a compare option to the navigation to allow the user to enter the compare mode anytime they wanted.

I looked again at the data to make sure that I had everything accurate.

Here are my final screens:

Key Takeaways

My data showed a correlation between crime and a neighborhood’s racial makeup, median income, and level of education.
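A relationship like this can also be checked with a number rather than by eye. The sketch below computes a Pearson correlation coefficient in plain Python; it is an illustration only, the function name is hypothetical, and the income and arrest figures are invented rather than taken from the Pittsburgh data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Made-up values for four hypothetical neighborhoods.
median_income_thousands = [20, 35, 50, 80]
arrests = [400, 300, 200, 100]
r = pearson(median_income_thousands, arrests)
```

A value near -1 means arrests fall as income rises, a value near +1 means they rise together, and a value near 0 means no linear relationship. Correlation alone does not show that one factor causes the other.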

Take a look at the top four highest and lowest income neighborhoods:

Neighborhoods with more Blacks, lower education, and smaller median incomes had more arrests, while neighborhoods with more Whites, higher education, and a higher median income had fewer arrests. The one outlier is Terrace Village because it has the lowest income, but also very little crime. That might be an interesting data set to look at.

I wanted to work with data around race because it is trickier. Visualizing race is very hairy, but I think as designers, we should take on these challenges. We should also question where our data comes from, why it was created, and for what purpose. And what about missing data sets? I was afraid that my crime data could be skewed due to racial profiling.

After presenting, some suggestions were to keep the percentage constant for arrests as well (instead of using a count) and to have breakdowns of types of crime. There were also good discussions around income inequality being a major factor in crime rates.
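The suggestion to use a rate instead of a count could look something like the sketch below. The per-1,000-residents convention is an assumption (a plain percentage would work the same way), the function name is hypothetical, and the numbers are invented for illustration.

```python
def rate_per_1000(count, population):
    """Convert a raw count into a rate per 1,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 1000 / population


# Two hypothetical neighborhoods with very different raw counts.
small_neighborhood = rate_per_1000(count=50, population=2000)
large_neighborhood = rate_per_1000(count=300, population=12000)
```

Here the larger neighborhood has six times the arrests but the same rate, which is the kind of difference a raw count hides.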

During this project, I learned more about the designer’s role in data visualization and how it can help bring forward patterns in data. That’s a huge responsibility, and we need to constantly question the data. I also learned how to layer data. That’s something I will continue to work on because I enjoy data visualizations. Working with data is very soothing for me.
