Project 03: Visualizing Patterns
Data Visualization | Individual Project | 11/01 ~ 12/13
Is there any relationship between time, bike crash type, and location?
What does the trend of bike crashes look like throughout a year?
Where are bike crashes most/least frequent according to the crash type?
Where are bike crashes most/least frequent according to the time of day?
Week 11 — Data & Pathways (11.08.2018)
The two readings by Wurman and Yau were helpful for understanding the basics of data visualization. Since I had not done a data visualization project before, I did not know how designers convey messages by visualizing data or why it is important.
“What questions are the makers saying?”
Data visualization starts with questions, which means it carries a specific intention. Many different visualizations can be built from the same data sets, and they can evoke different feelings or thoughts depending on how the data is used and interpreted.
“How is the information organized?”
The ways of organizing information are finite: LATCH (Location, Alphabet, Time, Category, Hierarchy). Every way of organizing data falls into one of these five groups, and understanding the structure and organization of information lets me extract value and significance from it. To choose the most appropriate method for my intention, I should understand the effectiveness of each of them.
The data do not change, but the information about them does. Each way that I organize information creates new information and new understanding.
“What forms are used to represent the data?”
Various visualization components could be used for visualizing data.
Visual Cues — position, length, angle, direction, shapes, area, volume, color saturation, color hue
Coordinate Systems — cartesian, polar, geographic
Scales — linear, logarithmic, categorical, ordinal, percent, time
Context — examples about pens and inks of designers Matt Robinson and Tom Wrigglesworth, fingerprint traces of designer George Kokkinidls
It was helpful to look through good examples of data visualization work. Enjoying and analyzing them together, I could understand them more deeply. I summarized several important points that I found:
- William Playfair: The first person to use visual forms to represent the data in a better way.
- Concrete shapes VS Abstract shapes: It is more common to see works that use geometric shapes, which are abstract and simple. The reason might be that abstract shapes do not carry any specific meaning in themselves, so they are more useful for representing the data itself. However, using concrete shapes, such as maps or other things that are familiar to us, is also a great way to visualize data. Because we already know their features, the visualization becomes easier to understand. The most appropriate example was the works of the German designer.
- Colors: Using color is important. One interesting point that I found through the examples was about using knowledge or associations that people already have about colors. For example, we can represent males with blue and females with red, following gender stereotypes. (It might be controversial to say which color suits which gender.) However, it reminded me that I can use such associations. Depending on the choice of colors, the same visualization could convey positive or negative feelings, or strong or mild messages.
- Interactiveness: Whether the visualization has interactive features also matters. From a user’s perspective, when the visualization changes or shows different results according to his/her control, it can have a more powerful effect. It feels like exploring the data, and this behavior makes users more active. However, giving too much freedom to users can make the message that I want to tell through the work unclear. Thus, it is crucial to think about how much freedom I want to give to users and how much I want to keep for myself.
- Comparison: Making it easy to compare the data is important. Thinking about how to maximize the difference or contrast between the data sets could be helpful.
- Story-telling: As I explored above, data visualization is not just about reporting numbers or facts. It is more about how to selectively show them to users or audiences in order to support what I want to say. Thus, thinking about the whole structure and how to combine the visualizations into one integrated story is important. Also, story-telling helps set up the path by which users read the content.
#03 Exploring Data Sets and Breaking Down Questions
After that, we started to share the questions we made while exploring data sets. It was helpful to think about how to break down the questions into pieces. What I had thought about until then was these two questions:
- How can types of occupation or levels of physical activity at work be related to obesity rates? There are various occupations, and I thought that they could be divided into three categories (sitting, standing, and moving) in terms of physical activeness. There might be correlations between them. At first, I also found it interesting to think about the relationship between people’s health and their access to fast-food stores. However, Stacie brought up this idea as an example, so I tried to find other factors.
- How can waste management be related to fire incidents? There were a variety of causes of fire incidents, and I focused on ‘Outside rubbish, trash or waste fire’. It is one of the main causes of fire incidents, and I wanted to dig further.
Things we have to do until next class
Week 12 — Visualizing (11.13.2018)
- Determine Data Sets: I decided to focus on the crash accident data of Allegheny, since it has a wide range of data including years from 2004 to 2017, months, the day of the week, time, latitude, and longitude. In addition, I could also get data about the type of crash and the type of intersection. Since the general crash accident data is too large to deal with, I tried to narrow it down to a specific vehicle type, such as bicycles or motorcycles. Among them, I became interested in bicycle accidents.
- Write Questions That Your Project Will Raise/Investigate: While I was looking at the bicycle accident data of 2016 and 2017 in detail, I found several interesting relationships. First of all, it was evident that most bicycle accidents happened in one specific municipality (2301). I wondered why most bicycle crashes happened in that area. Also, I tried to relate the seasons to the time of day (dawn, morning, afternoon, and evening). It was quite interesting to see that bicycle crashes increased during summer and fall and decreased during winter and spring. Furthermore, most of the crashes that happened in the morning and at night were in summer and fall. Maybe it is because people usually ride a bike to work in the summer and fall. Thus, my questions were “What is the relationship between bicycle crashes and municipality 2301?” and “There seems to be a relationship between the seasons (or maybe weather) and the time of day (hours). Why are morning and night bike crashes high in the summer and fall?”
- Diagram Sequence/Narrative That You Envision Your Present Representing: This is my first data visualization project, so I don’t know much about the tools. However, I wanted to try an interactive data visualization instead of a static form. I asked Adrian, a second-year MDes student who specializes in data visualization, about tools, and he introduced me to several: ArcGIS, Kepler.gl, Deck.gl, and Google Data Studio. I took some time to explore those tools and others.
- Do You See A Coordinate System Beginning To Emerge?: Hmm… I think so. To visualize the location of crashes, the geographic coordinate system will be appropriate. Since I can get the latitude and longitude of each crash, it should be possible to draw them on a map. Also, to visualize the relationship between the seasons and the time of day, I could use Cartesian or polar coordinates, or both. The polar coordinate system looked more useful since it is easier to map the seasons and the time onto a round shape.
- What Organization Methods Do You Imagine Leveraging In The Data? (Wurman: LATCH): Location will be used to visualize where each crash happened. Time will also be necessary since I am trying to find a relationship between two different scopes of time: the seasons and the time of day. Category and hierarchy are already used to group the seasons and the time of day. For example, I divided the 12 months into 4 seasons: Winter (12, 1, 2), Spring (3, 4, 5), Summer (6, 7, 8), Fall (9, 10, 11). Also, I divided the 24 hours into 4 groups: Morning (06:00~12:00), Afternoon (12:00~18:00), Evening (18:00~24:00), Dawn (00:00~06:00).
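The two groupings described in the last item can be sketched in a few lines of Python. This is only an illustration of the bucketing idea, not the tool I used: the function names `season_of` and `time_bucket_of` are hypothetical, and I am assuming that each time bucket includes its start hour and excludes its end hour (so 12:00 counts as Afternoon).

```python
def season_of(month: int) -> str:
    """Map a month number (1-12) to one of the four season buckets."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"  # months 9, 10, 11


def time_bucket_of(hour: int) -> str:
    """Map an hour (0-23) to a time-of-day bucket.

    Assumption: buckets are half-open, e.g. Morning covers hours 6..11.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if hour < 6:
        return "Dawn"
    if hour < 12:
        return "Morning"
    if hour < 18:
        return "Afternoon"
    return "Evening"
```

For example, a crash at 08:00 in July would land in the buckets `season_of(7)` and `time_bucket_of(8)`, that is, Summer and Morning.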
I tried to find free data visualization tools that I could use for this project. Matt recommended Mapbox.com to me, and I tried uploading the latitude and longitude data of bike crashes to the platform. I still need to check in more depth whether I can design the details of the visualization with it, but at first glance it seemed simple and good to use for this project. Since the crash data will be visualized on the map together with the season and time parameters, I could encode those categories with different colors, blurriness, or opacity.
#02 Class Reflection
In the class, we tried to make our idea concrete by articulating questions on the stickies. Also, we analyzed and integrated Yau’s Scales and Wurman’s LATCH into one. This process helped me think about how to divide and use the existing data according to my perspective. It made my idea clear and helped me figure out the details. For example, I came to think about how many years I am going to compare. Every year? Every 5 years? Or just one year? Maybe I could decide after looking through the relationships in the data sets more deeply.
After that, we took some time to think about what form could be appropriate to visualize our data. As I explored in the previous ‘Make the Abstract Concrete’ project, I should decide the shape, color, motion(if interactive), sound, and other details.
Week 12 — Entry Points & Layering (11.15.2018)
#01 Reading Summary — Experience Design (Nathan Shedroff)
When designing experiences, it is important to take care of an engagement and a conclusion. The engagement is the experience itself. The conclusion should be designed since participants will be dissatisfied or even confused without it. All experiences must reward attention at their end.
The work of experience design competes not only within the same medium but also with others. Each medium has its own characteristics, and it is crucial to be unique to its medium and compete with traditional media in usefulness and satisfaction.
- Cognitive Model: Everyone forms cognitive models for nearly everything they encounter. Repeating, recounting, or duplicating experience is a good way to form cognitive models that we want to provide to the participants. It is important to create other ways of moving through the experience that allows others to form a mental map in a way that better suits them.
- Metaphor: One way to build a cognitive model. It is worth noting in the context of design. Some metaphors are used worldwide, but some are not.
- Visualization: Understanding the features of each list and diagram and using it in an appropriate way is important. Layering is one of the options to visualize overlapped multiple data sets, and the scales and coordinates must be consistent and relative. Good visualizations pay special attention to scale(relative or absolute), orientation, view, projection, detail, generalization, and layers. Using three-dimensional (3D) forms to ‘fancify’ is an inappropriate use of a technique. The more dimensions, the more potential for clutter, but with careful consideration, some diagrams can be remarkably clear while displaying 3D data sets.
- Interactivity: Making a place for audiences to take part in the action. Most interactive experiences in our lives have nothing to do with technology, such as conversation. Interactivity is not necessarily better, but it usually does correspond with higher involvement by an audience.
- Meaning: Making connections to participants’ own lives and values. Physical objects from the experience can serve to remind them of their experiences. These experiences have the most success when they have the most meaning for them. It is difficult to touch all people on a personal level since everyone’s context is so varied and intimate. Thus, focusing on specific audiences will be helpful.
- Feedback: Most people expect experiences to acknowledge their actions in some way. Giving just the right amount of feedback is important. In general, people expect to be treated as they are treated by others, and expect to interact with systems in the same way they interact with people.
- Adaptivity: Experiences that seem to adapt to participants’ interests and behaviors always feel more sophisticated and personal. Customization VS Personalization. It is important to understand which attributes will make an experience better and maintain balance. Since we are accustomed to adaptive behavior from people, it is natural to expect systems to respond in kind.
- Control: Some degree of control makes participants feel more comfortable and respected.
- Participation: Making the experience more meaningful. Involvement.
#02 Data Analysis
I analyzed the bicycle crash data sets from 2013 to 2017, the 5 most recent years, and found several interesting trends.
- It is very clear that most bike accidents happened in one specific municipality (2301).
- It is evident that the number and the ratio of bike crashes were the highest in summer in all 5 years. The figures for spring and fall were similar, while those for winter were the lowest.
- It is very noticeable that bike accidents mostly happened on weekdays rather than weekends. Also, the ratio of bike crashes on weekends was the highest in the fall, especially in 2013.
- The ratio of bike accidents that happened in the evening and at night was the highest in the fall in all 5 years. Also, the number of bike crashes in the morning increased over the 5 years. The reason might be an increase in the number of people who commute by bike.
- In the most recent 3 years, the total number of bike crashes decreased.
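Trends like these come from counting crashes per bucket and comparing the shares. A minimal sketch of that counting step is below. It assumes each crash is a small record with the hypothetical field names `month` and `weekday` (0 = Monday through 6 = Sunday), and the sample rows are made up for illustration; they are not values from the actual data set.

```python
from collections import Counter

# Month number -> season bucket, following the grouping described in this post.
SEASONS = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def season_share(crashes):
    """Return {season: fraction of all crashes} for a list of crash records."""
    if not crashes:
        return {}
    counts = Counter(SEASONS[c["month"]] for c in crashes)
    return {season: n / len(crashes) for season, n in counts.items()}


def weekday_share(crashes):
    """Return the fraction of crashes that fall on Monday through Friday."""
    if not crashes:
        return 0.0
    weekdays = sum(1 for c in crashes if c["weekday"] < 5)
    return weekdays / len(crashes)


# Made-up sample records, only to show the shape of the input.
sample = [
    {"month": 7, "weekday": 1},
    {"month": 7, "weekday": 5},
    {"month": 1, "weekday": 2},
    {"month": 10, "weekday": 3},
]
print(season_share(sample))
print(weekday_share(sample))
```

Working with shares rather than raw counts makes years with different totals easier to compare.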
#03 Class Reflection
- Pathways: Design the paths that I want my viewers to go through. I can make the visualization interactive or have it tell a story. Borrowing the terms from Shedroff, there should be a concrete engagement and conclusion to provide a better experience.
- Buckets: Looking for buckets, or groups of pieces, in my data. I already tried to think about this: seasons, weekdays/weekends, and morning/afternoon/evening/dawn can be buckets. They could make my data easy to compare and understand.
- Selective Attention: What could it be, and what could it not be? For example, if I use colors for everything, it won’t work. From this perspective, I should select one coordinate system (Cartesian, polar, geographic) instead of using several at the same time.
- Form Variables: location, placement, shape, color, texture, proximity, size → how can I use them as layers to visualize contents in a better way? Colors and opacity: Each season naturally reminds us of a specific color, which I used. For the time of day, I could change the opacity of each season color instead of using a range from black to white.
I decided to explore three coordinate systems. However, for the final, I should choose one of them and show one visualization. It can be interactive and show different data depending on the user.
Week 13 — Interaction (11.20.2018)
#01 Tool Exploration
#02 Concept Presentation(5 min) Preparation
- What question(s) are you exploring? Are there any relationships between bike crashes and seasons? Are there any relationships between bike crashes and specific times? Where do bike crashes frequently happen across years, seasons, and times?
- What data are you using for your study? and why? Bike crash specific location(municipality code, latitude, longitude), bike crash year(2004~2017), month(1~12), day(Monday~Sunday), specific time(00:00~23:59), bike crash type(), bike crash street type().
- What coordinate system do you propose using? and why? Geographic. I chose it since it is the only coordinate system that can show locations. However, I am considering other graphs (especially a bar graph) to supplement the geographic view, since they are better for comparing data.
- What types of information are you using (LATCH)? Location, time, category, and hierarchy.
- What scales + range (+ groups, buckets) are you using? and why? I made my own buckets (grouping 12 months into 4 seasons and 24 hours into 4 times of day) to make the data simple and easy to compare.
- What structure are you using? and why? (Linear / Indexical) (Simultaneous viewing / Layer) Layering will be the most important structure of my data visualization. Also, I want to make it interactive so that audiences can explore the data, which makes them concentrate on it more.
- What visual/aural/temporal/tactile/forms do you propose using? and why? Not yet…
#03 Peer Feedback
I shared my concept with Josh and got a lot of feedback about how to move my project forward.
The most important thing was about my data’s identity. I should design my visualization to be understandable even at first glance. What is this data about? What is it going to show? What is the most important thing about it? Even though I am going to use a geographic map to show the locations and other information about bike accidents, nothing in it says ‘bike’. I could use icons or shapes to let audiences know the data is about bike accidents as soon as they see my visualization.
The possibility of using various shapes: Until now, I have only explored circular shapes, but I have many other choices, such as square, triangle, round, or diamond shapes. If I also use different colors, I can create multiple combinations.
Also, we talked about how I can detail the interactive features. (Immediate feedback, showing seasons in different ways (snows, leaves), changing map or road colors…)
Week 14 — Concept Presentation (11.27.2018)
#01 Exploration — Color
To visualize the seasons and the time of day, I explored several color palettes. It was not easy to choose because the representative colors for each season differ from person to person. It was comparatively easy for spring and fall: light green, and orange or brown. However, for summer, I could not decide which is more suitable, red (for heat) or blue (for the beach). The winter color would change depending on the summer color. I want to get feedback about this at the concept presentation.
Week 14 — Concept Presentation (11.29.2018)
#01 Reframing the questions:
- Where is the most/least frequent bike crash area according to the time of a day(morning, afternoon, evening, dawn) in the recent 5 years?
- Where is the most/least frequent bike crash area according to the bike crash type(Rear-end, Head-on, Angle, Sideswipe same direction, Sideswipe opposite direction, Fixed object) in the recent 5 years?
- Is there any relationship between the time of a day and the bike crash type (+ and year)?
Based on the feedback, the first thing I realized was that the year and season are not attractive parameters. In particular, the relationship between bike accidents and seasons is difficult to interpret without data about the number of people riding bikes at that time. The number of bike accidents may be low in winter simply because it is cold and not many people ride bikes. Instead of dividing time into seasons, the time of day could be the main parameter for scoping time. Also, I decided to add bike crash type as one of the main parameters in my data visualization.
#02 Reframing the structure:
- Layer 01: Let audiences explore how the number of bike crashes changed from month to month according to crash type + visualize an animation of each collision type.
- Layer 02: Let audiences explore how the exact locations of bike crashes changed according to month.
- Layer 03: Let audiences explore how the number of bike crashes changed according to the time of day.
- Layer 04: Let audiences explore each bike crash in more depth → give information about the exact date, the exact time of day, weather, and the type of intersection (road type).
Week 15 — Workday (12.04.2018)
#01 Feedback from Stacie
I got a lot of feedback from Stacie. She generally liked what I had designed and told me how to improve it. The first thing I should think about was narrative storytelling. It is important because it could attract audiences and give them the basic information about my data visualization. Also, she recommended that I think about the context (where, when) in which audiences meet this data. To start thinking about this issue, I should design the introduction as well.
- What is the purpose of this project?
At this point, it became necessary for me to think about the basic question of what the goal of the project is. This is the text from Stacie’s syllabus:
Despite common communication problems that arise in their delivery of information, data visualizations have become a prevalent form of conveying ‘facts’. In this project, we’re going to work to illuminate connections among data in ways that help viewers recognize, engage in, and think critically about their important relationships.
It gave me some questions.
- What facts do I want to tell?
I want to show how the number of bike crashes changed throughout the years. I also want to add some criteria to narrow down the scope in terms of the bike crash type and the time of day.
- How do my viewers recognize, engage in, and think critically?
I tried to design multiple layers so that viewers can explore and engage in the visualization. They might try to find the relationship between crash types, time, months, the number of crashes, and locations. They could also see detailed information such as how severe the crash was or what the weather was like at that moment.
- When/Where/How do audiences explore this data?
Designing the context of my visualization is also required. First of all, I thought about building an introduction for my visualization. Showing the bike form with the wheel (polar graph) rotating might be one of the options. Or, as Stacie suggested, I could allow audiences to select the bike collision type data sets that they want to explore. But I should think about this more deeply, which will help me design the introduction and the conclusion of my work.
Week 15 — Workday (12.04, 12.06.2018)
Keep improving Prototype
Getting feedback from peers, I kept developing my work. Most of all, I could apply better micro-interactions such as hovering, clicking, or holding for exploration. One of the most important things others pointed out was about the color. Since the colors change subtly from red to yellow, viewers read them as the fatality or severity of a bike crash. A continuous color palette did not seem appropriate for visualizing completely different types of crash. Since collision types and colors are difficult to match intuitively, I decided to use distinct colors so that viewers could easily recognize the categories.
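One simple way to get clearly separated category colors is to spread the hues evenly around the color wheel instead of taking neighboring steps of one red-to-yellow ramp. The sketch below is an assumption about how that could be done with Python's standard `colorsys` module; it is not how I actually picked my final colors, and the function name `distinct_colors` and the default lightness and saturation values are hypothetical.

```python
import colorsys


def distinct_colors(n, lightness=0.5, saturation=0.65):
    """Return n hex colors whose hues are evenly spaced around the color wheel."""
    if n < 1:
        raise ValueError("n must be at least 1")
    colors = []
    for i in range(n):
        hue = i / n  # 0.0 is red; hues wrap around at 1.0
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


# One color per collision type (5 types in the final visualization).
palette = distinct_colors(5)
print(palette)
```

Evenly spaced hues read as unordered categories, while a single-hue ramp reads as "more" or "less" of something, which is exactly the severity impression my peers pointed out.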
Week 16— Final Presentation(12.13.2018)
Is there any relationship between time, bike collision type, and location?
What does the trend of bike crashes look like month by month?
Where are bike crashes most/least frequent according to the type of crash?
Where are bike crashes most/least frequent according to the time of day?
LAYER 01: Introduction
The simple animation shown to viewers in the first stage tells them what this visualization is about through the form of a broken bicycle. The introduction lets them anticipate what they are going to explore.
LAYER 02: Collision Types and Monthly Trends
First of all, viewers can look through the overall monthly trend in bike accidents. They can learn in which month bike accidents were the most frequent and which bike collision type took the biggest proportion.
Viewers can explore 5 different bike collision types by clicking and holding each part of the polar graph. An animation of the collision is shown with the data, which helps viewers understand how the bike crashes happened.
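The geometry behind a month-by-month polar graph can be sketched as follows. This is a simplified illustration, not my prototype: I am assuming 12 equal wedges, January starting at the 12 o'clock position, months running clockwise, and a radius that grows linearly with the crash count. All function names are hypothetical.

```python
import math

WEDGE_DEG = 360 / 12  # one equal wedge per month


def wedge_bounds(month):
    """Return (start, end) angles in degrees, measured clockwise from 12 o'clock."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    start = (month - 1) * WEDGE_DEG
    return (start, start + WEDGE_DEG)


def polar_to_xy(angle_deg, radius):
    """Convert a clockwise-from-top angle and a radius to Cartesian (x, y)."""
    theta = math.radians(90.0 - angle_deg)  # switch to the usual math convention
    return (radius * math.cos(theta), radius * math.sin(theta))


def wedge_center_xy(month, count, max_count, max_radius=100.0):
    """Point at the middle of a month's wedge, with radius proportional to count."""
    start, end = wedge_bounds(month)
    radius = max_radius * count / max_count
    return polar_to_xy((start + end) / 2, radius)
```

With these assumptions, January occupies 0 to 30 degrees and December 330 to 360 degrees, so the year closes into a full wheel.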
LAYER 03: Polar to Geographical
Next, viewers can go to the deeper layer of the geographical map by clicking the background.
LAYER 04: Monthly Trends in Location
In this layer, viewers can explore the specific locations of bike crashes. They can compare where bike accidents were most frequent in Pittsburgh each month, and how these areas changed month by month.
The wheel-shaped interface gives affordance of rotation to viewers so that they can intuitively control the month parameter by hovering and clicking the arrow buttons on either side of it.
LAYER 05: Removable Month Parameter
Viewers can remove the month parameter if they want to focus on the other two parameters, collision type and the time of day, independently. The time of day parameter divides 24 hours into 4 parts: morning (6am~11am), afternoon (12pm~5pm), evening (6pm~11pm), and dawn (12am~5am).
The visual metaphor I designed was disassembling the wheel. When viewers hover on the central point of the wheel, it rotates and gets slightly bigger so that viewers could expect it is clickable.
LAYER 06: The Time of Day Parameters
Viewers can narrow down their scope by combining the month and time of day parameters.
The simple interface in the top left corner, which represents the horizon and the sun, is the controller of the time parameter. When viewers hover over it, the selected data is shown in the context of each time of day through changes in the colors and opacity of the background.
LAYER 07: Collision Type Parameter
Viewers can explore the relationship between collision type and location without other parameters. The patterns of each collision type are shown on the geographical map when they hover.
LAYER 08: Overlapping Multiple Parameters
Viewers can explore more specific trends by overlapping parameters. For example, they can choose one of the collision types and see how the number and location of that type change throughout the time of day.
LAYER 09: More Details
The specific information about each bike crash can be given to viewers when they click and hold on each data. The date of the collision, specific time, weather, and even intersection type of roads are shown.
03. WHY DECISIONS
Why did I choose the time of a day and month parameters (instead of year and season that I explored before)?
What I wanted to explore through this project was the bike crash trend over time and location. At first, I tried to find relationships between bike collisions and years or locations. However, I could not find a big difference between years, since the number of bike collisions each year was not big enough to show a trend on the map.
Also, in the beginning, I spent some time exploring seasonal trends. But I thought that dividing a year into 4 buckets was not appropriate for showing detailed changes. This is why I decided to combine all data from 2004 to 2017 and categorize it by month instead of by year.
The time of day parameter was an attractive bucket that I designed. I thought that people generally have different biking patterns depending on the time. For example, many people would ride a bike to school or work, especially in the spring and fall, which means there might be more chances of a bike crash in the morning and evening of those seasons.
Furthermore, in June, July, and August, when the day is longer than the night, bike riders are out later at night than in other months. Since usual daily patterns could differ from month to month, I thought that the time of day could be an important parameter to add.
Why did I use polar and geographical coordinate systems together? And why didn’t I use Cartesian graphs?
At first, I was more attracted to Cartesian graphs since they are a great tool for comparison. However, I decided to use the polar graph because its shape could represent the wheel of a bicycle. By using this similarity in shape, I could strengthen the relationship between the visualization and the data itself. Also, I believe that this visual metaphor allows viewers to grasp what this visualization is about at first sight.
Abandoning the geographical coordinate system would not be appropriate since it is the only system that can represent the location data.
Why did I use those five colors to differentiate the types of collision?
It was difficult to choose the colors for collision types because there are no intuitive relationships between collision types and colors. Thus, it was purely up to my choice. Since the data is related to crashes and accidents, which usually remind us of a range of reds, I first picked 5 colors from red to yellow.
But I realized that this was inappropriate because it suggests that the data is a range of damage or intensity rather than 5 different types of collision. This was why I decided to find 5 distinct colors that are clearly visible on white, gray, and black backgrounds. I tried to assign the orange color to the most severe type of collision, hitting a pedestrian.
04. INTERESTING DISCOVERIES
The most/least frequent bike accident areas in all time
While exploring the bike crash data and visualizing it, I found that there are certain areas where bike crashes happened frequently. I tried to look into it more deeply and find reasons. Usually, bike collisions happened frequently on the main or big streets that connect neighborhoods or cities. As we can see on the map, bicycle crash locations lie along the main rivers, the Allegheny River and the Monongahela River, where main roads are located. Also, it was interesting to observe that there were frequent bike accidents near the University of Pittsburgh and Carnegie Mellon University, in the area between the two rivers. It suggests that many people commute to work or school by bike. In contrast, bike accidents rarely happened in the area marked with a green circle. As far as I could analyze, this is because of the geography: there are hills or low mountains, so people could not ride a bike there.
The degree of dispersion of bike crash throughout time
In general, I found that bike accidents happened in more condensed areas in the morning than in the evening. I think this is because biking in the morning is usually limited to areas near schools or workplaces. However, in the evening, people ride bikes in more varied areas, so bike accidents were more dispersed throughout the city. It becomes clearer when we see the data of the afternoon. Of course, the number of bike accidents also increases morning → evening → afternoon.
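I judged "condensed" versus "dispersed" by eye on the map. If I wanted to put a number on it, one possible measure (not something I used in this project) is the standard distance: the root-mean-square distance of the crash points from their mean center. The sketch below assumes points are (latitude, longitude) pairs in a small area such as one city, so a flat approximation of about 111.32 km per degree of latitude is acceptable. The function name and the sample coordinates are made up for illustration.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def standard_distance_km(points):
    """Root-mean-square distance (km) of (lat, lon) points from their mean center.

    Uses a flat-earth approximation, which is only reasonable for small areas.
    """
    if not points:
        raise ValueError("points must not be empty")
    n = len(points)
    mean_lat = sum(lat for lat, _ in points) / n
    mean_lon = sum(lon for _, lon in points) / n
    # A degree of longitude shrinks with latitude.
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(mean_lat))
    squared = sum(
        ((lat - mean_lat) * KM_PER_DEG_LAT) ** 2
        + ((lon - mean_lon) * km_per_deg_lon) ** 2
        for lat, lon in points
    )
    return math.sqrt(squared / n)


# Made-up coordinates, only to show a tight cluster versus a spread-out one.
tight = [(40.4440, -79.9530), (40.4445, -79.9535), (40.4442, -79.9528)]
spread = [(40.40, -80.05), (40.48, -79.90), (40.44, -79.98)]
print(standard_distance_km(tight), standard_distance_km(spread))
```

Computing this value separately for the morning, afternoon, and evening crashes would give a rough way to check the visual impression described above.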
Sideswipe bike collision trends
The proportion of sideswipe collisions was the highest in the area between the two rivers. It might be because of the narrow streets. This needs further investigation.
Hit a pedestrian collision trends
The data shows that the proportion of bike accidents involving a pedestrian was the highest in the afternoon and evening, compared to the morning and dawn. Also, I found an area downtown where bikes frequently hit pedestrians.
Monthly Trends of Bike Collision Density & Area
Exploring how the density and location of bike accidents changed over the 12 months was interesting. The most condensed area of each month followed a pattern. Usually, in winter, most bike crashes happened near the residential area. However, in the summer, the area became bigger and began to reach downtown. Finally, from June to August, when bike accidents were the most frequent, the collisions spread out throughout the whole city. Then the area became smaller after September.
05. Feedback & Further Development / Self Reflection
I enjoyed visualizing the data and making interactive prototypes in this project. In particular, I thought that the metaphor of the wheel shape made a strong connection with the information in the data. But I felt that it might have been better if I had tried to find more unique connections between the data sets. Also, building stories throughout the project could have made it better. If I had time, I would love to explore further the relationships between the collision type and the intersection type. In addition, I could try comparing neighborhoods in Pittsburgh using this information.