Working with USDA’s coffee database: Part 1

numbers.coffee
2 min readJan 27, 2019

--

The USDA maintains an incredible, free database of global commodity statistics here. It includes data about coffee production, exports, and imports for 83 countries, going all the way back to 1961. It’s been a jumping-off point for much of the analytical work I’ve done in the coffee sector over the past decade.

I recently wrote some code to automate data extraction from the USDA’s website and put it into a Pandas DataFrame.

The resultingusda DataFrame looks like this:

Screenshot of our DataFrame as shown in a Juypter Notebook

Now we can write some simple modules to query what we need from the dataset. In the first example, we want to get all attributes for a single country.

We query ‘Guatemala’ and get back the following table, which shows all the data on Guatemala from 1961 up until the present:

All of the USDA’s data on Guatemala

The module above would also allow us to specify a specific attribute we are interested in (instead of all of them). If we were to call country_query('Guatemala', attribute='Arabica Production') then we would get a table like this:

Just the data on Arabica Production for Guatemala

Finally, let’s create a similar type of query that extracts a single attribute for a list of countries. In this example, we’ll query ‘Arabica Production’ for three Central American countries.

As expected, we get the following table back:

Arabica Production for three Central American countries

In the next post, we’ll use these queries to create visualizations of the USDA data.

View the complete notebook on github.

--

--