Analyzing U.S. exports with Plotly

Understanding data with visual tools

Valentina Alto
Sep 28, 2019 · 6 min read

In my previous article, I introduced some useful graphical tools available in Plotly, an open-source library that can be used in both Python and R.

Here, I’m going to play a bit more with Plotly’s functionalities, using as input some data about U.S. agricultural exports in 2011. So let’s import and explore our data:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv')
df.head()

So we have a list of states with, for each one, the amount of exports of different raw foods (plus the amount of cotton).

The very first thing we might be interested in is inquiring about the amount of total exports (in dollars) for each state. We can do so with a very intuitive visualization tool, which involves a U.S. map and uses colors as an indicator of the amount of exports:

import plotly.graph_objects as go

fig = go.Figure(data=go.Choropleth(
locations=df['code'], # Spatial coordinates
z = df['total exports'].astype(float), # Data to be color-coded
locationmode = 'USA-states',
colorscale = 'Blues',
colorbar_title = "Millions USD of exports",
))
fig.update_layout(
title_text = '2011 US Agriculture Exports',
geo_scope='usa',
)
fig.show()

From the map, we can easily say that California is, by far, the state with the highest amount of exports (indeed, the darker the color, the greater the amount of total exports).
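
To back the visual impression with numbers, one could rank the states by their share of the summed total exports. The following is only a minimal sketch: it runs on a tiny made-up table (the state names and values are placeholders, not the real data), and the helper name `export_shares` is my own, not something provided by pandas or Plotly. With the real `df` loaded above, you would pass `df` instead of `toy`.

```python
import pandas as pd

# Toy stand-in for the real dataset: placeholder states and made-up values.
toy = pd.DataFrame({
    "state": ["State A", "State B", "State C"],
    "total exports": [600.0, 300.0, 100.0],
})


def export_shares(frame):
    """Return each state's share of the summed 'total exports', largest first."""
    totals = frame.set_index("state")["total exports"]
    shares = totals / totals.sum()
    return shares.sort_values(ascending=False)


shares = export_shares(toy)
print(shares)
```

The first row of the result is the leading state, and comparing it with the second row gives a rough idea of how large the gap is.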

Now let’s analyze some interesting features of those exports.

First, for both fruits and veggies, we have two types: fresh and processed. We might be interested in inquiring about the ratio between those two (that means, which portion of total fruits/veggies exported is processed and which is fresh). We can do so by visualizing a bar chart which, for each state, displays the amount of those features.

So, for veggies, we have the following:

fig = go.Figure()
fig.add_trace(go.Bar(
x=df['state'],
y=df['veggies fresh'],
name='fresh',
marker_color='blue'
))
fig.add_trace(go.Bar(
x=df['state'],
y=df['veggies proc'],
name='processed',
marker_color='green'
))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()

And we can do the same for fruits:

fig = go.Figure()
fig.add_trace(go.Bar(
x=df['state'],
y=df['fruits fresh'],
name='fresh',
marker_color='blue'
))
fig.add_trace(go.Bar(
x=df['state'],
y=df['fruits proc'],
name='processed',
marker_color='green'
))
# Here we modify the tickangle of the xaxis, resulting in rotated labels.
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()
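
The grouped bars show absolute amounts, so the fresh/processed ratio has to be read by eye. If you want the ratio itself, one option is to compute the processed share per state. This is a sketch on made-up numbers: the column names match the dataset, but the states and values are placeholders, and `processed_share` is a hypothetical helper of mine.

```python
import pandas as pd

# Toy stand-in with the same column names as the dataset; values are made up.
toy = pd.DataFrame({
    "state": ["State A", "State B"],
    "veggies fresh": [10.0, 30.0],
    "veggies proc": [30.0, 30.0],
})


def processed_share(frame, fresh_col, proc_col):
    """Fraction of (fresh + processed) that is processed, indexed by state."""
    total = frame[fresh_col] + frame[proc_col]
    share = frame[proc_col] / total  # NaN where a state exports neither
    return pd.Series(share.values, index=frame["state"], name="processed share")


veggie_share = processed_share(toy, "veggies fresh", "veggies proc")
print(veggie_share)
```

Calling the same helper with `'fruits fresh'` and `'fruits proc'` would give the corresponding figure for fruits.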

We can derive two important pieces of information from the graphs above:

  • California is not only the state with the highest total exports, but also the one with the highest fruit and veggie exports.

  • Furthermore, processed and fresh fruits/veggies are positively correlated. More precisely, they exhibit a Pearson correlation coefficient equal to 1:

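import seaborn as sns

# Correlation between fresh and processed veggies
df_corr = df[['veggies fresh', 'veggies proc']]
sns.heatmap(df_corr.corr(), annot=True)
# Correlation between fresh and processed fruits (run in a separate cell so the two heatmaps do not overlap)
df_corr = df[['fruits fresh', 'fruits proc']]
sns.heatmap(df_corr.corr(), annot=True)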
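
A coefficient of exactly 1 is what you get when one column is an increasing linear function of the other, for example a fixed positive multiple. Whether that is what happens in this dataset would need to be checked on the real columns; the sketch below only illustrates the property on made-up numbers.

```python
import pandas as pd

# Made-up numbers: 'proc' is exactly 2.5 times 'fresh' in every row.
fresh = pd.Series([4.0, 10.0, 1.0, 7.0])
proc = 2.5 * fresh

# Series.corr computes the Pearson coefficient by default.
r_proportional = fresh.corr(proc)

# Break the fixed ratio in one row and the coefficient drops below 1.
proc_noisy = proc.copy()
proc_noisy.iloc[0] = 30.0
r_noisy = fresh.corr(proc_noisy)

print(r_proportional, r_noisy)
```

On the real data, looking at `df['fruits proc'] / df['fruits fresh']` state by state would show whether the ratio is (nearly) constant.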

We can also visualize those correlations together with the amount of total exports of that state. Let’s do it for fruits (the same reasoning holds for veggies):

import plotly.express as px
fig = px.scatter(df, x="fruits fresh", y="fruits proc",
size="total exports", color="state",
hover_name="state", log_x=True, size_max=60)
fig.show()

Here we can visualize even more clearly the gap between California and the other states, both in terms of fruits (you can see it in the distance between the labeled bubble and the other ones) and in terms of total exports (you can see it from the size of the bubble).

Now let’s run a similar analysis for three items (beef, pork and poultry), since I want to find out whether they are correlated in some way. It is a reasonable question: one might intuitively think that, as they are all types of meat, their exports should be positively correlated. So let’s see whether this intuition is true:

df_corr = df[['beef','pork','poultry']] 
sns.heatmap(df_corr.corr(), annot= True)

Unlike processed vs. fresh fruits/veggies, here there is no relevant correlation between the three types of meat, which might be counterintuitive. Let’s visualize it in a better way, using a 3D version of the same bubble visualization:

fig = px.scatter_3d(df, x='beef', y='pork', z='poultry', color='state', size='total exports')
fig.show()

As you can see, there is no clear pattern of data, nothing showing that higher exports of one type of meat lead to higher exports of the others.

Nice, now let’s extend our area of interest to all the items exported. We are interested in the composition of the export portfolio of the 5 states with the highest total exports.

So, once we have picked our states of interest:

df.nlargest(5, ['total exports'])

We can build a pie chart for each state. For California, for instance, we will have something like this:

labels = ['beef','pork','poultry','dairy', 'total fruits', 'total veggies', 'corn', 'wheat', 'cotton']
values = df[['beef','pork','poultry','dairy', 'total fruits', 'total veggies', 'corn', 'wheat', 'cotton']].loc[df['state']=='California'].values[0]
fig = go.Figure(data=[go.Pie(labels=labels, values=values, hole=.3)])
fig.show()

And we can do the same for the remaining four states.


It emerges that, for all 5 states, the main item exported is fruit (both processed and fresh). In particular, Illinois exhibits a share of fruit exports of almost 3/4.
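
Reading the dominant slice off a pie chart is easy to get wrong, so it can be worth checking it in a table as well. A minimal sketch, again on made-up numbers with placeholder states and a shortened item list: it computes each item's share of the listed items per state and picks the largest one.

```python
import pandas as pd

items = ["beef", "total fruits", "corn"]  # shortened item list for the toy example

# Toy stand-in; state names and values are placeholders.
toy = pd.DataFrame({
    "state": ["State A", "State B"],
    "beef": [10.0, 20.0],
    "total fruits": [80.0, 5.0],
    "corn": [10.0, 75.0],
}).set_index("state")

# Row-wise shares: each item divided by the sum of the listed items.
shares = toy[items].div(toy[items].sum(axis=1), axis=0)

summary = pd.DataFrame({
    "main item": shares.idxmax(axis=1),  # column name of the largest share
    "share": shares.max(axis=1),
})
print(summary)
```

With the real data, the same code would be applied to `df.nlargest(5, 'total exports')` and the full list of items used for the pie charts.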

The very last thing I want to look at is cotton exports, since cotton is the only non-edible item in the dataset. Let’s use a geographical representation again:

fig = go.Figure(data=go.Choropleth(
locations=df['code'],
z = df['cotton'].astype(float), # Data to be color-coded
locationmode = 'USA-states',
colorscale = 'Blues',
colorbar_title = "Millions USD of exports",
))
fig.update_layout(
title_text = '2011 US Cotton Exports',
geo_scope='usa', # limit map scope to USA
)
fig.show()

This result is very interesting. First, we see that California is no longer the leading exporter: now the winner is Texas, with more than 2,000 million USD of cotton exports. Furthermore, it is evident that northern states do not export cotton at all: their amount is equal to 0.
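
Both observations can be read directly from the table as well as from the map. The sketch below shows one way to do it, on a made-up table with placeholder states and values; with the real data, `toy` would simply be replaced by `df`.

```python
import pandas as pd

# Toy stand-in; state names and values are placeholders.
toy = pd.DataFrame({
    "state": ["State A", "State B", "State C"],
    "cotton": [0.0, 120.0, 0.0],
})

# Row of the state with the largest cotton exports.
top_row = toy.loc[toy["cotton"].idxmax()]

# Names of the states that export no cotton at all.
no_cotton = toy.loc[toy["cotton"] == 0, "state"].tolist()

print(top_row["state"], top_row["cotton"])
print(no_cotton)
```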

Using graphical tools to spot interesting features of your data is a very useful ‘pre-step’ of your data analysis. Indeed, before deciding which features are worth investigating, you can gather relevant insights just from a first glimpse of your data (guided, of course, by the theory you want to test).

Originally published at http://datasciencechalktalk.com on September 28, 2019.

Written by Valentina Alto

Cloud Specialist at @Microsoft | MSc in Data Science | Machine Learning, Statistics and Running enthusiast

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
