Published in Thesis Modules

Case Study 02: Anomaly Detection

United States Fisheries Data Exploration

Researchers at the Brown University Public Policy Institute and the Carnegie Mellon School of Design are interested in understanding the fish production system in the United States, from harvest to table and home service. The system is complex. Preliminary investigation indicates that understanding the black box of fish processing practices, the middlemen of the system, will reveal fraudulent and unsustainable practices that need to be improved if fishing is to become sustainable overall.

I was asked to make a simple interactive application to visualize the top ten wild-caught and farmed species in the United States. The concept was to create simple bar charts showing the total number of tons caught and what proportion of that domestic catch was exported in the same year. However, in the process of building this simple visual, I discovered several anomalies which helped clarify the research direction of the team. My assertion is that the necessities of visual form giving helped reveal anomalies in this data set and raise questions about it.

the four visual form concepts

The concept began with four answers to two questions: Should the tonnages be ordered by total domestic catch or by export amount? Should the visualizations be baseline-organized or midline-organized? Sketching is an effective way to imagine the potential answers to questions like these. It was clear that for the researchers' use case, midline-organized, export-ordered visualizations most clearly told the story they wanted to understand, and I didn't have to write any code to answer this question. Visual reasoning in this manner is a practical way for a visualization designer to move research questions forward, particularly when they relate to visual form giving.

The work of anomaly detection began at this point, with the first computer drawings of the real data. I wrote a parsing and sorting algorithm in Python and then visualized the ordered data in Processing. Below is the first graph that I produced. Two things immediately stood out. First, the scale difference between the largest catch and the smallest made it nearly impossible to see both at the same time. Second, we appear to export more than we catch of several fish species. On the face of it this seems impossible, or at least requires further research and explanation.

white bars are export tonnage, black bars are domestic catch

We export more than we catch per year of some fish species.
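The parsing and sorting step mentioned above could look roughly like the following. This is a minimal sketch rather than the code used in the project: the CSV layout, the column names (`species`, `catch_tons`, `export_tons`), the helper names, and the tonnages are all hypothetical placeholders chosen for illustration.

```python
import csv
import io

# Hypothetical CSV layout with made-up tonnages (NOT the real fisheries data).
SAMPLE_CSV = """species,catch_tons,export_tons
pollock,1000,400
salmon,300,900
cod,250,100
"""

def load_rows(text):
    """Parse CSV text into dicts with numeric tonnage fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "species": row["species"],
            "catch_tons": float(row["catch_tons"]),
            "export_tons": float(row["export_tons"]),
        }
        for row in reader
    ]

def top_by(rows, key, n=10):
    """Return the n rows with the largest value for `key`, largest first."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]

rows = load_rows(SAMPLE_CSV)
by_catch = top_by(rows, "catch_tons")    # ordering by domestic catch
by_export = top_by(rows, "export_tons")  # ordering by export amount
```

Producing both orderings from the same rows makes it easy to compare the two candidate sort orders from the sketching stage, and it is exactly where a species whose export exceeds its catch changes position between the two lists.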

In this graphic, the massive white bar represents the amount of salmon that we export every year. However, we only catch a tiny fraction of that amount. This points to a further problem: the bars are ordered by tons of domestic catch, of which export should be a fraction. When export tons are greater than domestic catch, how should the bars be ordered? What is the data really saying? Does the United States export 100% of the salmon caught every year? Does the United States export a number of tons far greater than it catches, but nonetheless keep some of the catch for domestic consumption? The necessity of making ordering and rendering choices for visual form giving revealed that further research is needed to understand exactly what is occurring in our salmon harvesting practices. As a visualization designer, I can contribute to the research team's progress by highlighting new questions and directions to pursue, which might previously have been hidden in the data.

I placed the full bar for salmon on the export side, indicating that all caught tons are being exported, and applied a saturation change to indicate that the United States exports more than this, but that amount requires further investigation.

Decrease in opacity communicates anomalous data
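The rendering choice described above can be expressed as a small calculation per species. The sketch below is an assumption about how such a split might be computed, not the project's actual code; `split_bar` and its return values are hypothetical names. It caps the drawn export segment at the domestic catch and reports the remainder separately so it can be drawn in a visually distinct style.

```python
def split_bar(catch_tons, export_tons):
    """Split one species into drawable segments.

    Returns (exported_within_catch, retained, excess_export, is_anomalous).
    Assumption: when export exceeds catch, the whole catch is drawn as
    exported. The data alone cannot confirm that this is what happened.
    """
    exported_within_catch = min(export_tons, catch_tons)
    retained = catch_tons - exported_within_catch
    excess_export = max(0.0, export_tons - catch_tons)
    is_anomalous = excess_export > 0
    return exported_within_catch, retained, excess_export, is_anomalous

# Made-up numbers: export is three times the catch, so the bar is flagged.
print(split_bar(300.0, 900.0))
```

The `is_anomalous` flag is what would drive the change in visual treatment, and `retained` being forced to zero in that case is precisely the assumption that may make the graphic misleading.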

In this graphic, the anomaly is clarified in a way that maintains readability. However, this visual arrangement may also be misleading. The form indicates that 100% of domestic catch is exported, but this could be false. Perhaps a smaller percentage of domestic catch is exported, and frozen or fraudulent tons of fish are being exported as salmon caught this year. This is a strong indicator that the question is worth investigating for the researchers I am working with. The phenomenon occurs more frequently in aquacultured species, which suggests a place for researchers to start their investigation of fraudulent production practices.

The second anomaly is the huge difference in scale between the maximum and minimum catch tonnage. This presents a problem for viewers because they cannot access the data visually. One typical solution to a data range problem such as this would be to use a logarithmic scale. However, these scales are less intuitive for public consumption. Since the target audience for this publication includes both researchers and non-expert members of the public, this is not a viable solution from a design standpoint.

I built two interactions into the application which solve this problem dynamically. First, the user can filter out the two largest catches from each graph. Second, they can change the scale factor in order to see the data rendered at an appropriate scale. This allows both expert and non-expert users to see the data at exactly the scale required to answer their inquiry.

2008: full data set, zoomed out
2008: truncated data set, zoomed out
2008: truncated data set, zoomed appropriately for wild catch
2008: truncated data set, zoomed appropriately for aquaculture catch
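The truncate and zoom interactions shown in these four states can be sketched as two small functions. This is an illustrative assumption about the logic, not the application's actual code: the function names, the `catch_tons` field, and the tonnages are hypothetical, and the scale is modeled as a simple linear pixels-per-ton factor.

```python
def drop_largest(rows, key="catch_tons", n=2):
    """Return rows without the n largest values of `key`, keeping row order."""
    largest = sorted(rows, key=lambda r: r[key], reverse=True)[:n]
    skip = {id(r) for r in largest}
    return [r for r in rows if id(r) not in skip]

def bar_length_px(tons, scale):
    """Linear mapping from tons to pixels; `scale` is pixels per ton."""
    return tons * scale

# Made-up tonnages spanning several orders of magnitude.
rows = [
    {"species": "a", "catch_tons": 100000.0},
    {"species": "b", "catch_tons": 40000.0},
    {"species": "c", "catch_tons": 500.0},
    {"species": "d", "catch_tons": 50.0},
]

truncated = drop_largest(rows)

# Zoomed out, the smallest bar is a fraction of a pixel and cannot be read.
zoomed_out = [bar_length_px(r["catch_tons"], 0.005) for r in rows]

# After truncation, a larger scale factor makes the small catches legible.
zoomed_in = [bar_length_px(r["catch_tons"], 0.5) for r in truncated]
```

Keeping the scale linear preserves the intuitive reading that a bar twice as long means twice as many tons, which is the property a logarithmic axis would give up.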

The dynamic and exploratory nature of being able to zoom and truncate at will allows users to answer their questions visually. Because the zoom interaction requires several steps, the user's mind retains a cognitive connection between each state, which helps to maintain orientation within the data set while examining it at different levels of scale.

In this project, I had to move fluidly between sketching, data analysis, data visualization, and code. Although none of the tasks were particularly difficult, these roles would usually be filled by multiple people with different expertise. However, my expertise involves a fusion of analytic thought and creative expression. My ability to work across these disparate tasks allows me to question and investigate the data in an unbroken way, potentially leading to novel conclusions and research directions which might otherwise have remained unexplored.

Adrian Galvin

design • science • visualization • illustration • jiu jitsu