I’ve been confused and frustrated by Pittsburgh’s road system. It isn’t obviously different from other cities I’ve lived in, yet it elicits an uncomfortable reaction, and I’ve noticed a number of structural and geographic oddities that might be the basis for that unusual feeling.
I learned about a technique called the “Pittsburgh Left”, where a left-turning car at an unprotected-left intersection turns quickly before oncoming traffic responds to the green light. This unsafe practice has somehow become synonymous with the city.
I’ve been frustrated by the preponderance of no-right-on-red intersections, seemingly with no basis for their prohibition.
In wondering whether the Pittsburgh Left has caused more accidents, I began to wonder if certain intersection designs predispose cars to certain types of accidents. Ideally, a desire to avoid the most fatal or damaging accidents would incentivize traffic and city layout designers/engineers to create safer intersections.
I’ll be looking for correlations between traffic accident data and the types of intersections that may cause accidents with greater frequency. I’ll be incorporating intersection traffic flux, intersection types, number of accidents, and types of accidents.
Graphed are 65,500 collision locations. Each dot is at 1% opacity. It’s interesting to see that certain intersections are entirely opaque. Those must be very problematic intersections — I’m going to see what sorts of intersections those are.
I plan to display all datapoints of vehicle crashes on a map of Pittsburgh, then lower the points’ opacity to see where the greatest preponderance of them are. Certain intersections are far darker than others, indicating that they are especially prone to inducing vehicular crashes. The question then becomes: “what makes these intersections so dangerous?”
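The opacity trick can be sketched in a few lines of matplotlib. This is a minimal sketch, not my actual pipeline: the random points below are placeholders for the real crash coordinates, and the latitude/longitude values are made up.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Placeholder coordinates standing in for the ~65,500 real crash
# locations (the dataset's actual lat/lon columns are assumptions).
lons = rng.normal(-79.99, 0.05, size=65_500)
lats = rng.normal(40.44, 0.03, size=65_500)

fig, ax = plt.subplots(figsize=(8, 6))
# At alpha=0.01, a spot only reads as fully dark where roughly 100+
# crashes overlap, so problem intersections stand out on their own.
ax.scatter(lons, lats, s=4, color="black", alpha=0.01, linewidths=0)
ax.set_aspect("equal")
ax.set_title("Crash locations, 1% opacity per point")
plt.close(fig)
```

The key choice is letting the renderer do the density estimation: no binning or heatmap parameters, just accumulated transparency.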
My dataset contains an extensive number of attributes, like vehicle type, road/weather conditions, number and severity of injuries, driver age, and, crucially, intersection type. The data uses a specific number corresponding to each intersection type (though I can’t find the legend giving the actual correspondence; I’ll have to look on their website rather than in the raw .csv. In the worst case I can do it manually, crosschecking the data coordinates against satellite photos).
I’ll show a graph of each type of intersection, showing the total number of crashes in each type. Though I don’t know the outcome, I am presuming the spread is going to be uneven and I’ll be able to focus on aspects of the most problematic intersections.
If I have time, it might be good to look at whether there are similar intersection types with fewer crashes that the problematic intersections might easily be adapted into.
I’ll most certainly be looking locationally; location will be a central visualization technique.
Alphabetical is likely unnecessary.
Time might be interesting, especially seeing if the time of day affects certain accidents or combines with aspects of the intersections.
There are many different categories of intersections, from stop signs to lights, one way, T, unprotected lefts, etc.
I can’t think of many aspects of hierarchy that I could focus on. I’ll keep that aspect in mind as I work with the data.
In speaking with a few of my classmates, I’ve excised the map view. I initially thought I would display the raw locations of crashes before lowering the points’ opacities (already shown above at two zoom levels), but I came to the realization that the bulk of my interest in intersections (and their relevant aspects) is not geographical; using a map would likely put more focus on the relative positions of the intersections than on the aspects shared between them.
Further, the number of data points I was dealing with in the dataset led many programs to choke on the memory requirement, vastly slowing my rate of visual iteration.
Before I came to the realization about the map view, I produced a few more polished visualizations of the data:
And a zoomed in version only showing the densest (and thus most crash-prone) intersections:
Including attempts at attaching data to intersection points:
And even intersection type icons:
My dataset included information about the intersection type, from four-way stops to roundabouts to highway on-ramps, and also what sort of control indicator is present (stop sign, traffic signal, yield, railroad crossing, etc.). If I were to surface data on each type of intersection, it would be helpful to have a visual indicator, allowing for more immediate recognition of the intersection.
I darkened the central intersection zone and lightened the non-intersection roads to distinguish between the area of focus and the ancillary elements.
Most of my data were magnitudes: number of crashes, injuries, fatalities, etc. I sorted my spreadsheet by type of intersection and summed each magnitude type, arriving at my base data.
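The sort-and-sum step I did in the spreadsheet maps directly onto a groupby. A sketch with toy rows; the column names here are assumptions, not the dataset’s actual headers:

```python
import pandas as pd

# Toy stand-in for the crash dataset; one row per crash.
df = pd.DataFrame({
    "intersection_type": ["four-way", "four-way", "T", "T", "roundabout"],
    "injuries":          [2, 0, 1, 3, 0],
    "fatalities":        [0, 1, 0, 0, 0],
})

# Counting rows per type gives total crashes; summing the magnitude
# columns gives injuries and fatalities per intersection type.
summary = (
    df.groupby("intersection_type")
      .agg(crashes=("injuries", "size"),
           injuries=("injuries", "sum"),
           fatalities=("fatalities", "sum"))
      .sort_values("crashes", ascending=False)
)
print(summary)
```

Sorting by crash count here is the automatic ordering I later wished I had built into the interface from the start.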
Early on in my attempts at graphing these values, I moved away from simple bars denoting the magnitudes, as they didn’t strike me as quantized enough. A bar could easily be a percentage, whereas a more chunked representation better communicates how these crashes are each individual moments, each injury and death an individual pain and loss. Further, when using bars to represent magnitudes, the vast difference between upwards of 80,000 at the high end and 226 at the low end meant that bar charts would be incredibly stretched:
Instead I experimented with a dot-based approach. What’s helpful about this is that by widening the base I can not only shorten my super-large magnitudes considerably but also use the width to bucket my data into a more decimal visualization.
And played with color to differentiate a normal accident from one causing a fatality, making ratios visible:
And attempting a percentage view of the same:
This dotted visualization became horizontal in the interface I showed for all-class crit: [My basis for a sort button here is a holdover from the order in which the dataset listed the intersection types, which meant that the descending order of magnitudes didn’t correspond with the order provided in the dataset. Rather than providing a sort button, I should have sorted the intersections automatically from the beginning.]
After the classroom crit where I showed my first attempt at a clickable interface, I received good feedback about what context I should include, and some comments on what I was surfacing on the hover state, and where I should label.
The limitation of the dot view is that after a while it starts looking like a single larger magnitude, with the now-horizontal axis communicating little about the magnitude other than length. I attempted another approach: bucketing the data into bars and squares by lining up and gridding out the dots.
What’s helpful about this visualization is that the squares and lines are much more easily counted automatically, breaking up the monotony, and mapping quite directly to the decimal places in the number values themselves.
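One way to read the bucketing, sketched as code, is a pure decimal-place decomposition: 10 dots become 1 bar, 10 bars become 1 square, and so on up the places. This is an illustration of the principle, assuming each element encodes exactly one decimal place:

```python
def decimal_places(n: int) -> list[tuple[int, int]]:
    """Split a magnitude into (place_value, digit) pairs, largest
    place first, mirroring how dots/bars/squares chunk a count by
    decimal place, e.g. 226 -> [(100, 2), (10, 2), (1, 6)]."""
    # Find the largest decimal place that n reaches.
    place = 1
    while n >= 10 * place:
        place *= 10
    # Peel off one digit per place, top down.
    pairs = []
    while place >= 1:
        pairs.append((place, (n // place) % 10))
        place //= 10
    return pairs

print(decimal_places(49_129))  # [(10000, 4), (1000, 9), (100, 1), (10, 2), (1, 9)]
```

Each `(place, digit)` pair then maps to a run of identical glyphs, which is what makes the encoding countable at a glance.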
Since I was separating the dots, I knew that if I kept each block and line the same color as the original dot, I would be in a sense filling in the area in between the dots with what looks like more dots, thereby giving the viewer a warped sense of the actual magnitude encoded into the elements. I opted for an approach where I imagined smoothing out the limited amount of “pigment” across larger and larger areas, allowing the squares and bars to correspond more directly to the magnitudes actually embedded inside them. Plus it allows the differing visual encodings to be more stark next to each other.
And it works cleanly with comparing multiple categories of data:
Since I was working with different categories of data (crashes, injuries, and fatalities), I knew it would be helpful to communicate them in different colors. Red as fatalities makes sense, and since an injury is arguably a step on the way to a fatality, I chose an orangish yellow for injuries, being “on the way” to red. I was careful to choose a yellow that had enough contrast against a white background while being visually distinct from the red.
I then ran into more difficult matters when dealing with overlapping the data. Since injury-causing crashes are a subset of crashes overall, should I superimpose the yellow onto the gray? Should I replace those elements entirely? Should I inset them? I ran into some weird corner cases with my initial implementation:
For example, the first row: 600 fatalities, 49,129 injuries, 82,814 crashes total. If I have 6 fatality dots in the first column, what should I do with the rest of the column? Perhaps I fill them in with the next set of data, in this case injuries. But now that I’ve used up my first column, I can’t go into the squares, since they’d be offset by one column and not align with the other rows. So I fill in the next nine columns with yellow bars, and being at ten I can then use squares until I reach the point where I have to return to bars to communicate the 9,000 decimal place of injuries. And after the 100s place for injuries I can return to squares for the rest of the crash data.
For the second row, with 94 fatalities, 19,730 injuries, and 28,157 crashes total, it became even weirder. Using up the first column by filling in the unused dots from the fatalities with injury dots required me to fill the next nine columns with injury bars. And since there are fewer than 20,000 injuries total, I then have to go into bars anyway after the first 10,000 to fill in the 9,000 place for injuries. Then I have to go into bars again for the crashes section, leaving me with a whole series of bars and putting me basically back where I was with the purely dot grid. I had to find a better way of overlapping the data. My second chunk of data on row two is a second attempt, where I go right into squares at the first opportunity. This isn’t terribly unclear, but it’s more difficult to read since the places are now swapped around a bit.
I took this to the extreme in this experiment, wherein I stack all the squares together, then the bars, then the dots. This is fairly clean visually, but separating the categories like this isn’t very visually pleasant and doesn’t help as much as I thought.
Thus I was led to my first attempts at overlapping data. The biggest difficulty here is making sure that the different darknesses of elements are clear enough when superimposing them.
As Andrew Twigg pointed out in the final presentation/crit, the differing values for dots/bars/squares could have been tuned better. The apparent lack of tuning in isolation was in part a response to tuning them for better visibility against each other. I don’t know what the best solution is here; I’ll likely approach him so we can go through it together and come to a better solution, since I’m sure there is one.
This was my final build of the main dataset per intersection.
I built each combination/permutation out (crashes, injuries, fatalities, crashes+injuries, crashes+fatalities, injuries+fatalities, and all three together) and exported each as a layer to be superimposed in my Kite document.
My legend also took some time to develop, as I tried to find the best and simplest arrangement. Initially trying to communicate the equality of 10 dots to 1 bar and 10 bars to 1 square, I tweaked it until I had the fewest elements and greatest parsimony.
I liked the idea of the data toggles being almost a sentence expansion, thus “Pittsburgh Vehicular Crashes” or “Pittsburgh Vehicular Fatalities” etc., which led me to their stacked layout. I ensured the color mapping held.
Upon hover the values of each intersection/row are displayed next to each aspect toggle.
The legend-labeled box has elements that expand up into the fully-labeled legend.
Clicking on the percentage view button navigates to the percentage view:
Here I move away from the quantized elements, instead using bars to communicate how this view depicts each intersection uniformly and on the same terms. The sort button works as expected.
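The arithmetic behind the percentage view is plausibly just per-intersection rates. A sketch using the two rows of figures quoted earlier; the row labels are placeholders, not real intersection types:

```python
import pandas as pd

# Totals quoted earlier in the writeup; the index labels are made up.
totals = pd.DataFrame(
    {"crashes":    [82_814, 28_157],
     "injuries":   [49_129, 19_730],
     "fatalities": [600, 94]},
    index=["type A", "type B"],
)

# Express injuries and fatalities as a share of that row's crashes,
# putting every intersection type on the same 0-100% scale.
rates = pd.DataFrame({
    "injury_pct":   totals["injuries"] / totals["crashes"] * 100,
    "fatality_pct": totals["fatalities"] / totals["crashes"] * 100,
})
print(rates.round(2))
```

Normalizing per row is what lets a low-traffic intersection be compared against a high-traffic one on equal terms.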
If I were to take this interface further, I would label the intersections and label the points on the bars with the actual percentages, which, since they are all out of 100%, are more easily compared than the essentially arbitrary high-five-figure numbers in the magnitudes view.
I would further experiment with a vertical layout for my graphs, which might allow me to better use the left quarter of the screen, instead putting navigation and display options along a top or bottom bar.