Designing a Data Visualization for Local Seasonal Produce Harvests
Creating a tool to reveal what fruits and vegetables are locally plentiful throughout the year.
At Type/Code we primarily create digital products and experiences for our clients, so when we started Data Supply Co. as a side project, we wanted an outlet to explore topics of interest through data, and find delightful intersections between visual form, quantitative information, and utility — and have an excuse to create physical prints of our design work. The initial Hops Chart was designed to help inform beer characteristics and brewing recipes, and the Whiskey Chart helped explain the ingredients, distilling process, and classifications of whiskey.
At our studio, folks are passionate about food. From participating in our local CSA programs, to backyard gardening, to discovering new recipes or restaurants, we all have a deep connection with fresh produce. Exploring the seasonality of local produce availability seemed like an intriguing evolution of our data visualization series. So we created the Harvest Chart print, mapping the weekly harvest percentages of crops across the Northeast US. What started as a print design project accidentally morphed into a digital project, as we needed to develop custom software to explore different permutations and visualization approaches for all of the data we gathered.
We are now excited to be launching the Digital Harvest Chart as a web application, so others can explore the data, beyond what would be possible in printed form.
The design objective for the project was to answer a simple question:
What produce is growing near me right now?
Some quick Googling will return simple Gantt-style charts of what produce is “in season” for a given month in a given state. But even when published by state-sponsored agricultural organizations, these charts were typically limited to a binary status of a crop being available or not. We were hoping to more accurately explore the true ebb and flow of each crop’s week-over-week yield within any given region, rather than being bound by agriculturally-arbitrary state lines, and better reflect the organic nature of the topic being visualized.
In thinking about my own use cases, the existing charts were not effectively communicating the ‘ramp up,’ ‘peak,’ and ‘wind-down’ lifecycle for various crop harvest seasons, or allowing one to aggregate data across multiple states (my local farmers’ markets are typically sourcing from New York, New Jersey, and Connecticut, for example).
Like all data visualization projects, we needed to start with a data set. The USDA has an amazing resource, the National Agricultural Statistics Service (NASS), which maintains an easily searchable database of crop data. While the total annual yield of a given crop in a given state has been readily available for a while now, over the past several years, progressively more US states have begun to report weekly crop harvest progress as a percentage of total seasonal yield. So if New York State harvested 1,376,800,000 lbs of apples in 2018 (that is the actual number), we now know what percentage of that total was harvested each week of the year.
For the initial prototype, I (selfishly) started with New York, and collected all of the weekly harvest progress percentage data NASS had available. That only included 12 crops, but it was a good starting point. For an initial test, I translated the data from the cumulative weekly percentage of total yield into the week-over-week change to reflect the actual percentage of the total that was harvested each week.
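As a rough illustration of that translation step, here is a minimal Python sketch. The function name and the sample numbers are made up for this example and are not real NASS values; the clamp at zero is also an assumption, added in case a weekly report revises progress downward.

```python
def weekly_from_cumulative(cumulative):
    """Turn cumulative percent-harvested values into per-week percentages."""
    weekly = []
    previous = 0.0
    for value in cumulative:
        # Each week's share is the change since the highest value seen so far.
        # Clamp at zero so a downward revision never yields a negative share.
        weekly.append(max(value - previous, 0.0))
        previous = max(previous, value)
    return weekly


# Hypothetical cumulative progress for one crop over six weeks.
cumulative_progress = [0, 5, 20, 50, 85, 100]
weekly_share = weekly_from_cumulative(cumulative_progress)
print(weekly_share)
```

The weekly shares add back up to the final cumulative value, so a crop that reached 100% by the end of its season has weekly values totaling 100.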
Visualization design explorations
Since we’re trying to visualize what percentage of each crop’s total harvest was available each week of the year, we started with exploring a handful of standard charting approaches with Google Sheets: a line chart, a stacked bar chart, and a stacked area chart.
Some quick tests revealed both challenges and opportunities. With several crops having overlapping harvest timeframes, a line chart is a jumbled unreadable mess. A stacked bar chart was a bit more readable. On any given week, one could see what crops are plentiful. For a few crops, the general growth and decline of the harvest season could be read, but each crop’s seasonal trend becomes harder to read as we move up from the x-axis.
The stacked area chart was getting much closer to being readable and useful. We could start to see the trend of each crop, and the aggregate seasonal trends across all crops. But much like the stacked bar chart, there was a “distortion” challenge as we moved further up from the x-axis. The crops on the top of the stack need to bend more significantly around all of the other crops below, in this case making both the beginning week and the “peak” weeks harder to identify.
This appeared to actually be an appropriate opportunity to use an oh-so-cool-but-rarely-the-right-choice Streamgraph visualization, created by Lee Byron in 2008. The streamgraph essentially centers a stacked area chart over the x-axis, making each area’s stream more readable independently, at the expense of being able to easily measure the aggregate value of all streams at any given point.
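To make the "centering" idea concrete, here is a small Python sketch of the simplest streamgraph baseline, often called a silhouette offset (D3 exposes it as stackOffsetSilhouette). This is an illustration only: the article does not say which offset the prototype used, and the function name and sample values are hypothetical.

```python
def silhouette_layers(series):
    """Stack layers symmetrically around y = 0.

    series: list of layers, each a list of values per week.
    Returns a list of (bottoms, tops) pairs, one per layer.
    """
    n_weeks = len(series[0])
    totals = [sum(layer[w] for layer in series) for w in range(n_weeks)]
    # Start the stack at minus half the weekly total so it is centered on zero.
    bottoms = [-total / 2 for total in totals]
    layers = []
    for layer in series:
        tops = [b + v for b, v in zip(bottoms, layer)]
        layers.append((bottoms, tops))
        bottoms = tops  # the next layer sits directly on top of this one
    return layers


# Two hypothetical crops over two weeks.
layers = silhouette_layers([[1, 2], [3, 2]])
print(layers)
```

Each layer keeps its own thickness, but because the baseline moves from week to week, the combined total can no longer be read off a fixed axis, which is the trade-off described above.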
There are a handful of reasons why a Streamgraph makes sense for visualizing weekly produce harvest percentages. We’re looking at multiple concurrent time series, but we mostly want to identify the growth and decline of each individual crop’s harvest season, while still generally understanding the aggregate growth and decline of all crops within a given region, or spot specific weeks with lots of plentiful crop yields.
We don't care about the exact percentage value of a crop’s yield on a given week—if this week was 6.3% of the season’s total yield—rather, we want to know “are there more or fewer apples than the previous week.” We also don't care about comparing crops to each other. Since each crop is showing percentages of different totals, we just want to know when will each be plentiful relative to its own total yield. While a common critique of streamgraphs is that comparative and aggregate accuracy is being sacrificed for aesthetic fluidity (maintaining a through-line for each time series), the organic nature of this subject matter made sense for that compromise.
Unlike our previous data visualization projects, we couldn’t jump directly from sketches and spreadsheets into Adobe Illustrator — we needed to plot a streamgraph out computationally to accurately and efficiently finesse the visualization logic. Representative colors were assigned to each crop type, and then the data was plugged into a quick D3 Streamgraph prototype, and we could start to see the seasonal ebb and flow of each crop.
The initial experiment was immediately informative and compelling. We could see early summer cherries, a midsummer peak of corn, peaches, pears, and beans, and then potatoes, grapes, and apples peaking in September and October as we moved into fall.
Unfortunately, the USDA only has weekly harvest progress data available for about 30 states, and only about 25 crop types across those states, with several states reporting weekly progress on only a couple of crops (I’m looking at you, Idaho, I know you produce more than just onions and potatoes). So we augmented the available USDA data with lower-fidelity binary duration-of-availability data (from the NRDC and various state agricultural organizations), and then mapped the higher-fidelity week-over-week distribution onto those date ranges. For example, if we had the true weekly harvest percentages for apples in New York, and just the start and ending weeks for apple season in Virginia, our hypothesis was that the week-over-week distributions would be relatively similar, even if the season was earlier, later, shorter, or longer due to its geographic location.
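One way that mapping could work is to stretch or squeeze the reference curve to fit the other state's start and end weeks, then rescale it to 100%. The Python sketch below uses linear interpolation, which is an assumption for illustration rather than a description of the actual app logic; the function name and all values are hypothetical, and it expects a reference curve with at least one non-zero week.

```python
def remap_distribution(reference, start_week, end_week, weeks_in_year=52):
    """Fit a reference weekly curve into start_week..end_week (1-indexed, inclusive)."""
    # Keep only the reference state's active season.
    active = [i for i, v in enumerate(reference) if v > 0]
    shape = reference[active[0]:active[-1] + 1]

    n_target = end_week - start_week + 1
    resampled = []
    for k in range(n_target):
        # Position of target week k along the reference shape.
        pos = k * (len(shape) - 1) / (n_target - 1) if n_target > 1 else 0
        lo = int(pos)
        hi = min(lo + 1, len(shape) - 1)
        frac = pos - lo
        resampled.append(shape[lo] * (1 - frac) + shape[hi] * frac)

    # Rescale so the remapped season again totals 100%.
    total = sum(resampled)
    year = [0.0] * weeks_in_year
    for k, value in enumerate(resampled):
        year[start_week - 1 + k] = 100 * value / total
    return year


# Hypothetical reference curve: a five-week season in weeks 36 to 40.
new_york = [0.0] * 52
new_york[35:40] = [10, 20, 40, 20, 10]

# Hypothetical target season: weeks 30 to 38, known only as a date range.
virginia = remap_distribution(new_york, start_week=30, end_week=38)
```

The remapped curve keeps the ramp up, peak, and wind-down shape of the reference, just shifted earlier and spread over nine weeks instead of five.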
Creating a web application
The next challenge was figuring out how one might actually explore the weekly harvest data of about 110 different common crop types, across all 50 states in the US (275k starting data points), and how we could aggregate data across multiple neighboring states to represent realistic crop availability across larger geographic regions. While the initial static prototype of New York’s data only took a couple of hours to build, we needed a more efficient way to slice and dice all of the available data.
We designed a quick user interface to allow multiple states to be selected concurrently, allow crops to be enabled or disabled as needed, and change the sorting logic. An updated D3 visualization and controls interface were powered by a Django app handling the data logic and a simple content management system so we could easily update our data sets and adjust the colors assigned to each crop type.
With multiple states selected, we averaged each week’s percentage across all of the states, and then normalized the total volume of the crop’s entire season back to 100%. The entire area of each crop’s stream is always consistent (since we’re not comparing crops to each other), so shorter season crops will have thicker peaks than longer season crops. As multiple states’ data is aggregated together, the visualization typically moves towards a standard bell-curve distribution (mirrored over the x-axis), unless the crop has two very distinct seasons (like lettuce).
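The averaging and normalization described above can be sketched in a few lines of Python. The function name and the state values are hypothetical, and the third state is deliberately incomplete (it sums to 80 rather than 100) to show why the final rescaling step matters.

```python
def aggregate_states(state_curves):
    """Average weekly percentages across states, then rescale to total 100%."""
    n_weeks = len(state_curves[0])
    averaged = [
        sum(curve[w] for curve in state_curves) / len(state_curves)
        for w in range(n_weeks)
    ]
    total = sum(averaged)
    # Leave an all-zero curve untouched rather than dividing by zero.
    return [100 * v / total for v in averaged] if total else averaged


# Hypothetical four-week curves for one crop in three states.
ny = [0, 50, 50, 0]
nj = [25, 75, 0, 0]
ct = [0, 40, 40, 0]  # incomplete data: sums to 80, not 100

combined = aggregate_states([ny, nj, ct])
```

If every input curve already sums to 100, the average does too and the rescaling is a no-op; it only changes the result when some inputs are partial.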
The sort order of crops in the visualization proved to be another interesting exploration. Early in the design process, it seemed self-evident that each crop’s stream should reflect the crop’s color in real life, for quick visual recognition before needing to read a label. Sorting crops alphabetically looked messy, but was pretty readable due to the contrast of random adjacent crop colors. Sorting by “peak” harvest week is arguably logical, with earlier season crops starting at the top left and progressing down to later season crops on the bottom right. This method resulted in a strange ‘twisted taffy’ look, though: with similar crop seasons in close proximity, the “distortion” challenge was exacerbated, and each crop’s individual peaks became harder to visually identify.
Sorting by color algorithmically is oddly challenging, so we ended up giving ourselves the ability to set the color sort order manually within the CMS. While sorting by color increased the challenge of making sure adjacent colors in the palette could be visually distinguished, it was very visually pleasing… because, rainbows. Sometimes you need to just make some good old human design decisions.
With our web app humming away, we were able to continue gathering data, fine-tune our visualization logic, and continuously check the output against other crop harvest resources to ensure it made sense relative to the simple state-by-state harvest calendars we could find.
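For reference, the peak-week ordering can be expressed as a simple sort key. This is a Python sketch with hypothetical crop curves, not the app's actual sorting code.

```python
def peak_week(curve):
    """Return the 1-indexed week with the largest share of the harvest."""
    return max(range(len(curve)), key=lambda w: curve[w]) + 1


# Hypothetical five-week curves.
curves = {
    "apples": [0, 0, 10, 30, 60],
    "cherries": [70, 30, 0, 0, 0],
    "corn": [0, 20, 60, 20, 0],
}

by_peak = sorted(curves, key=lambda name: peak_week(curves[name]))
print(by_peak)
```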
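A quick Python sketch hints at why an algorithmic color sort is awkward. It sorts by hue alone using the standard library's colorsys module; the crop colors are invented for this example and are not the ones used in the chart.

```python
import colorsys


def hue_of(hex_color):
    """Return the HSV hue (0.0 to 1.0) of a '#rrggbb' color string."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return hue


# Hypothetical crop colors.
crop_colors = {
    "tomato": "#d62828",
    "corn": "#f6c945",
    "kale": "#2d6a4f",
    "blueberry": "#3a4e8c",
    "potato": "#b08968",
}

by_hue = sorted(crop_colors, key=lambda crop: hue_of(crop_colors[crop]))
print(by_hue)
```

Hue ignores saturation and brightness, so the muted brown of the potato lands between a vivid red and a vivid yellow. Results like that are one reason a hand-tuned order can look better than a computed one.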
Creating a print
Since Data Supply Co. has thus far existed as an outlet for meticulously printed data visualizations, that was still a goal for the Harvest Chart. We had just ended up designing and building some custom software this time around as part of the process. Since most of the useful weekly progress USDA data that was available happened to be in northeastern states, we decided to start with a “Northeast” edition of the print. Our app let us visualize the exact state and crop combinations that we wanted, and then we pulled an SVG of the visualization into Adobe Illustrator to finesse the design. In late 2018, we launched the first Harvest Chart, Northeast print.
Now we’re excited to be launching the Digital Harvest Chart, so anyone can explore all of the produce harvest data that we’ve collected, across all states and crops, and easily discover “what produce is growing near me right now.”
In the near future, we’ll be releasing prints for additional geographic regions in the US. Stay tuned for updates at Data Supply Co.
This project wouldn’t have been possible without support from an amazing team. A huge shoutout to Chris Hakos on development, Pei-Yi Ni on design, Meng Zhang on icon illustration and data research, and Michelle Gauthier on data research.
Data Supply Co. and Harvest Chart were created by Type/Code, a digital product studio in Brooklyn, NY.