Data Visualization: Things Lost (But not Found Yet) on NYC Subway
Choosing a Dataset
To begin the assignment, we first needed to choose a dataset. We wanted one that was easy to work with (in other words, a “clean” dataset) while also being unique. Eventually, we found “Things lost (and not yet found) on the New York subway” in Data Is Plural. We found it interesting because we had all lost things before, especially on public transportation, and we were fascinated by the large number and diverse categories of items lost on the NYC subway and not yet found.
The dataset covers various categories of lost items, such as clothing, electronics, and medical equipment, and every row represents a date. The column names follow a hierarchical structure, such as “Toys.Board.Gam” and “Identification.Benefit.Card”. In total, there are 222 columns and 252 rows. It tracks the total number of items lost in each category, and how that total changes from day to day, for over a year, from 8/17/2014 to 12/4/2015. We found some gaps (missing dates) in the data.
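The gaps mentioned above can be located with a short check. The following is a minimal sketch in Python, not the code we used in the project; the function name `missing_dates` and the sample dates are made up for illustration.

```python
from datetime import date, timedelta


def missing_dates(observed):
    """Return the calendar days between the first and last observed dates that have no row."""
    present = set(observed)
    if not present:
        return []
    first, last = min(present), max(present)
    gaps = []
    day = first
    while day <= last:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Made-up sample: three rows with a two-day gap before the last one.
sample_rows = [date(2014, 8, 17), date(2014, 8, 18), date(2014, 8, 21)]
sample_gaps = missing_dates(sample_rows)
print(sample_gaps)
```

Knowing where the gaps are matters for any day-by-day animation, since a missing date would otherwise look like a day on which nothing changed.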
Two Portraits & Brainstorming
In this design sprint, we were asked to take two different approaches to the dataset: analytical and persuasive. I worked only on the persuasive visualization, but as a group member, I will also briefly describe the analytical visualizations.
For the analytical part, our goal was clear and effective communication through three diagrams. After an initial exploration of the dataset, we discussed representing the changes in items with a line graph, because the dataset contains dates. Additionally, since the full name of each category is quite long, we discussed creating general categories (for example, merging the columns “Book.Paperback” and “Book.Diary” into “Book”) and then displaying the specific categories within each general category.
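The merging idea can be sketched in a few lines of Python. This is only an illustration under the assumption that the general category is the part of the column name before the first dot; the function name and the counts are made up, and the analytical diagrams themselves were built in Tableau rather than with this code.

```python
from collections import defaultdict


def merge_by_general_category(row):
    """Sum the counts of all columns that share the same top-level name."""
    totals = defaultdict(int)
    for column, count in row.items():
        # "Book.Paperback" -> "Book"; a name without a dot is kept as-is.
        general = column.split(".", 1)[0]
        totals[general] += count
    return dict(totals)


# Made-up counts for a single date.
sample_row = {
    "Book.Paperback": 3,
    "Book.Diary": 2,
    "Identification.Benefit.Card": 5,
}
merged = merge_by_general_category(sample_row)
print(merged)
```

The specific categories are still available in the original row, so a drill-down view can show them inside each general category.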
For the persuasive part, we wanted to develop a sophisticated persuasive piece for the dataset. The topic of our dataset, things lost but not yet found on the NYC subway, is very close to people’s daily lives, but the findings are not “significant” or “shocking” enough to attract an audience on their own. Additionally, since our dataset had relatively simple columns, we believed that a sophisticated approach would better distinguish the persuasive visualization from the analytical ones. Our goal was to communicate three ideas: 1. keep track of your belongings; 2. which things are easy to lose; and 3. things are getting lost every day.
We first had the idea of incorporating images of the subway and subway background music to emphasize our theme. We also wanted to engage the audience by giving them the point of view of riding the subway through the graphs and animations. After seeing how many columns there were, we decided to merge the columns belonging to the same category into a general category so the data would be easier to visualize. We also considered whether to show the absolute or the relative value of things lost each day (absolute: the total number of items lost; relative: the increase in items lost). The absolute number gives the audience an overview of the accumulated items lost, whereas the relative value is better for understanding the growth of each category. For the visualization itself, we considered line graphs, plots, bars, icons and text, and other approaches listed in the diagram. We could use growing lines or bars, as well as text (with font size scaled to the value), to represent the increase in items lost. A more comprehensive view of the brainstorming stage for the persuasive part is shown below.
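The difference between the absolute and the relative value can be made concrete with a small Python sketch. It assumes that each column holds a running total per date; the function name and the numbers are invented for illustration.

```python
def daily_increase(totals):
    """Turn a series of running totals into the change from each row to the next."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


# Made-up running totals for one category over four consecutive rows.
absolute = [100, 103, 103, 110]
relative = daily_increase(absolute)
print(relative)
```

The absolute series answers "how many items are still unclaimed", while the relative series answers "how many more were lost since the previous row".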
After brainstorming, we moved to our initial design stage. As before, I was not involved in the analytical visualizations, so I will only introduce them briefly and focus on my part, the persuasive visualizations.
For the analytical part, we decided to create: 1. a line graph where the x-axis is the date, the y-axis is the number of things lost, and each line is a category; 2. a word cloud (since there are many categories); and 3. a bubble chart that allows users to zoom into each category. The line graph gives an overview of the entire dataset and shows how the number of items lost in each category grows over time; the word cloud visualizes the sub-categories, with font size representing the total number of items lost; and the bubble chart lets users select the general category they want to view, giving them a better understanding of the items lost within it. Each graph therefore represents a different aspect of the data. The design sheet for the analytical part is below:
For the persuasive part, we first came up with two designs from the brainstorming stage: 1. a growing bar chart with a train passing through (the BAR design, for short); and 2. a GIF in which a subway train travels in a circle and stops after each lap to display some text data (the GIF design, for short). Since we eventually adopted the first design and discussed it in more detail, I will introduce the GIF design first. Below is the design sheet for the GIF design.
The GIF that we found was:
When the train finished a lap, we would play matching background music indicating that the subway was reaching a stop; at the same time, several text messages would appear around the train indicating the number of items lost, with the font size of each text representing the relative number of items lost. A date would accompany the texts. The GIF provided an immersive environment that would make the audience feel like they were actually taking the subway and “losing items” themselves. However, the text was not very informative: it only showed how many items were lost each time. Additionally, the waiting time for each lap was quite long, and we were not sure what other information could be displayed. Therefore, we abandoned this idea and chose the alternative BAR design.
Here is the initial design sheet for the BAR design:
For the BAR design, each general category would be a bar in the upper part of the page, with a date flashing at the top. At the bottom, a train would move across the page with things dropping from it, conveying a sense of losing items on the subway. My initial thought was for the bars to show the absolute (total) number of items lost (the dataset does not start at 0 for each category, because things were lost before 2014). However, after discussion, we realized that it might be more informative to show the relative number, so that the changes would be visible despite the large discrepancies in numbers across categories (for example, 10,000 versus 300). When I asked for feedback from people outside the group, they suggested including both the original (starting) number and the growth, using different colors. Also, as it was difficult to tell the difference between large and small numbers from the bars alone, we decided to add the number after each bar so that it would be easier for the audience to read. As with the previous design, we also decided to add background music.
Compared to the GIF design, this design was less immersive because it did not provide a sense of “riding on a subway”. Instead, we decided to make the date flash more prominently so that the audience would get the sense that “days are passing, and items are being lost every single day”. At this stage, we also suspected that the animation might be distracting, so I needed to pay close attention to it during implementation.
The feedback we received after showing the initial design was to separate the base number (things lost before the first day) from the accumulated number by using different colors.
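The split between the base number and the accumulated growth can be sketched as follows. This is a hedged Python illustration rather than the website's actual code; it assumes the first row of a category is its base, and the function name and numbers are made up.

```python
def bar_segments(totals):
    """For each row, return (base, growth): the first row's total and the amount added since."""
    if not totals:
        return []
    base = totals[0]
    return [(base, total - base) for total in totals]


# Made-up running totals for one category.
segments = bar_segments([300, 304, 311])
print(segments)
```

Each pair maps to one frame of the bar: the first number is drawn in one color and the second in another, so a small category's growth stays visible next to a much larger one.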
After deciding on, discussing, and receiving feedback on each of our visualizations, we moved to our final design stage. As the final design had to be presented in the form of a website, we included the web design in this stage. The final design also covered the technologies and further details of our design. Again, I did not work on the analytical visualization and only include general information about it as a group member. The final design diagram is shown below:
For the analytical designs, we decided to use Tableau to build the three diagrams. Tableau visualizations can also be embedded in a website.
As I was the only one with web development experience, I developed the website. I used my personal website as a template: there is a navigation bar at the top, and the name of the current page is highlighted with a text-shadow. There are three pages in total: Home, Analytical, and Persuasive. The homepage presents some basic information about our project so that people can understand it better. Since the analytical and persuasive visualizations are very different, we decided to build two separate tabs for them. For the persuasive visualization, we wanted to be as immersive as possible, so we decided to have it occupy the whole screen, which meant we needed a start button on the persuasive page.
Implementation & Final Design
The final design of our visualizations and website is:
where the first page is the homepage, the second page shows the analytical visualization, the third page shows the start page for the persuasive visualization, and the fourth page is the persuasive visualization.
I was primarily in charge of coding the website and the persuasive visualization. Through the process of implementing it, I changed the design several times based on the feedback I received to reach this final stage.
For the analytical visualization, we received feedback that:
Instead of applying the same filter on both the word cloud and bubble chart visualization, manipulating the interaction operation to make them show different perspectives for the dataset is better
Thus, our groupmate changed the design so that the filter applies to the bubble chart only.
For the persuasive visualization, I received feedback at several stages:
Stage 1: The very initial implementation
In this implementation, I received feedback from classmates that, although the color theme was consistent with the web design, it was not consistent with the train. Additionally, the color did not convey a sense of “losing things”. Moreover, the animation started right after the user clicked the “persuasive” button, which might cause some confusion, so they suggested I add a start page. Therefore, I changed my design to the following:
Stage 2: Red implementation
I believed that red might convey a sense of losing things, as red and black create a strong contrast. However, my classmates still said that the color was “too dark” and “too depressed”. Additionally, they pointed out that the bars did not start at the same point because of the different lengths of the category names. Therefore, I changed my design to the next, and final, stage:
Stage 3: The final implementation
In the final implementation:
- The color is consistent throughout the page
- The white text is now surrounded by an orange text-shadow, which makes it easy to read
- The contrast between the starting part and the increasing part of each bar is obvious
- The categories are sorted so that the category with the largest number of items is at the top
- The start page has a train passing by, giving the audience a clue about what to expect; the start button also has an animated border to engage users more
Additionally, I implemented a feature so that, when the order of the categories changes, the affected categories move to their new positions and arrows indicate the change.
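The reordering logic can be sketched as a comparison of rankings between two consecutive frames. This Python sketch is an assumption about how such a check could work, not the code running on the website; the function names, category names, and counts are made up, and ties are broken alphabetically here as an arbitrary choice.

```python
def rank_positions(totals):
    """Map each category to its position (0 = top) when sorted by total, largest first."""
    ordered = sorted(totals, key=lambda name: (-totals[name], name))
    return {name: position for position, name in enumerate(ordered)}


def rank_moves(previous, current):
    """Return 'up', 'down', or None for each category, comparing two frames."""
    before = rank_positions(previous)
    after = rank_positions(current)
    moves = {}
    for name, position in after.items():
        old = before.get(name, position)  # a new category counts as unchanged
        if position < old:
            moves[name] = "up"
        elif position > old:
            moves[name] = "down"
        else:
            moves[name] = None
    return moves


# Made-up totals for two consecutive dates.
yesterday = {"Book": 10, "Toys": 9, "Clothing": 5}
today = {"Book": 10, "Toys": 12, "Clothing": 5}
moves = rank_moves(yesterday, today)
print(moves)
```

A category marked "up" or "down" would get an arrow and a position change in that frame; categories marked `None` stay where they are.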
Comparison Between Analytical and Persuasive Visualizations
There are both advantages and disadvantages to communicating in analytical/persuasive ways. To start with, I will briefly introduce the two visualizations and then compare their relative pros/cons.
In total, there are three graphs in the analytical visualization. The first is a line graph, where the x-axis is the date, the y-axis is the value (number of items lost), and each line represents a general category. The graph is interactive: users can see the value of each point on the lines. It communicates the changes in items lost over the time period and provides an overall insight into the dataset. The second is a word cloud built from the specific categories. Categories with more items lost appear in a larger font, and the word cloud is intuitive for the audience to understand. Compared to the line graph, the word cloud dives deeper into the specific categories. The third is a bubble chart, where users can use the filter on the right-hand side to choose the general categories whose specific categories the chart should show. This gives users a better understanding of the relative proportion of each specific category within and across general categories, which neither of the previous graphs achieves. All the graphs are interactive and show values and names on hover. The second and third graphs use the values from the last day of the dataset.
The persuasive visualization is a bit different. When the user clicks the start button and is redirected to this page, the animation starts immediately, accompanied by background music of a running subway. The animation has three major parts: 1. the date flashes; 2. the bar for each general category grows, along with the number next to the end of the bar; and 3. the subway runs from left to right, dropping icons of items along the way. The date and the bars change at the same rate, with each step representing one day. Each bar has two parts: the lighter orange part represents the number of items lost before the start of the animation, and the darker orange part represents the new items lost since the animation started. The figure above shows the final stage of the animation, which stops at the last date in the dataset.
I identified the following key differences between the analytical and persuasive visualizations:
- The analytical visualization is more comprehensive. The word cloud and bubble chart use specific categories; in the bubble chart, users can even compare the relative differences between specific categories. The persuasive animation, however, only uses the general categories.
- The analytical visualization includes specific data at each point. The persuasive animation uses the same data as the line graph in the analytical visualization. Since the analytical graphs are interactive, users can hover over each point on a line to get the specific value for each day. In the animation, however, the numbers just flash by, and it is impossible for the user to read them for any day except the final one, when the animation stops.
- The persuasive visualization places more emphasis on how easy it is to lose things, through the animation and flashing. The animation, with the flashing date, directly shows the audience that every day people are losing things that are never found. Additionally, the subway picture and the background music draw the audience into the data, making them feel that they are the ones riding the subway. The analytical graphs struggle to stress this point: the line graph shows the increase, but people may find it hard to relate to the data, and the word cloud and the bubble chart do not show change over time.
- The persuasive visualization shows the relative differences between general categories better. The bars are in descending order, with names and icons next to each bar, so it is easy for the audience to understand the relative differences and the ranking of the categories that are easiest to lose. The names are easy to read, and the icons make them more understandable. The line graph communicates the differences less well, as many lines cluster together, making them difficult to tell apart.
Thus, both persuasive and analytical visualizations have benefits and drawbacks. It is important to understand the audience in order to choose the best form of visualization for them. For the general public, where the main goals are to tell them which things are easy to lose and to remind them to keep track of their own items, the persuasive animation is suitable because the public does not need many details. However, for a group that is interested in the data and wants to explore it in detail, the analytical graphs are more suitable.
Conclusion & Reflection
To conclude, our group visualized the dataset “Things lost (and not yet found) on the New York subway” in two different ways, clear (analytical) and persuasive communication, and produced a website and a demo video. There are three graphs for clear communication, each representing a different aspect of the data, which together help the audience understand the data holistically. There is one sophisticated persuasive piece, which uses animated bar charts and theme-related animations to deliver the message about which items are easy to lose and the need to take care of one’s own things.
I appreciate this project as my first opportunity to work on persuasive visualization. In my previous classes and projects, I had only done analytical (clear) visualizations. This project pushed me to think about how to persuade my audience.
Thanks to my teammates, Amanda Chu, Siyan Pu, and Kunal Suri, and to the classmates who provided us with feedback.