Creating a sliding animated bar plot with Python and R

Arda Korkmaz
Jul 22, 2019


Let me be more specific: we gather and prepare the dataset in Python, then create the animated bar plot in R.

Final work of animated bar plot

If you are a Python lover who, like me, has to admit reluctantly that R has some cool libraries such as gganimate, and you would like to benefit from them, this post is for you.

Let us create a sliding animated bar plot (like the one above) using one of the most popular datasets on Kaggle. And here it is:

I’ll start with good old pandas’ read_csv function to read the downloaded csv file into a data frame. The first 5 rows of the data frame look like this:

import pandas as pd

# Read the Spotify dataset
df = pd.read_csv('spotify_dataset.csv')
df.head()

There are a couple of things to fix in the data before I prepare the final data frame for the animation in R.

Region values are given as acronyms, so I create a dictionary and replace them with the full region names. The Date column is stored as an object, so I convert it to datetime, which makes it easy to derive extra columns such as year, month, day and day_of_week. Lastly, I create a new column called Title that combines the artist and the track name, for the Y-axis of the bar plot.

# Create a dictionary mapping for the real region values (https://www.spotify.com/us/select-your-country/)
region_dic = {'ar':'Argentina', 'at':'Austria', 'au':'Australia', 'be':'Belgium', 'bo':'Bolivia', 'br':'Brazil', 'ca':'Canada', 'ch':'Switzerland', 'cl':'Chile', 'co':'Colombia', 'cr':'CostaRica', 'cz':'CzechRepublic', 'de':'Germany', 'dk':'Denmark', 'do':'DominicanRepublic', 'ec':'Ecuador', 'ee':'Estonia', 'es':'Spain', 'fi':'Finland', 'fr':'France', 'gb':'UnitedKingdom', 'global':'Global', 'gr':'Greece', 'gt':'Guatemala', 'hk':'HongKong', 'hn':'Honduras', 'hu':'Hungary', 'id':'Indonesia', 'ie':'Ireland', 'is':'Iceland', 'it':'Italy', 'jp':'Japan', 'lt':'Lithuania', 'lu':'Luxembourg', 'lv':'Latvia', 'mx':'Mexico', 'my':'Malaysia', 'nl':'Netherlands', 'no':'Norway', 'nz':'NewZealand', 'pa':'Panama', 'pe':'Peru', 'ph':'Philippines', 'pl':'Poland', 'pt':'Portugal', 'py':'Paraguay', 'se':'Sweden', 'sg':'Singapore', 'sk':'Slovakia', 'sv':'ElSalvador', 'tr':'Turkey', 'tw':'Taiwan', 'us':'USA', 'uy':'Uruguay'}
# Replace with the true Region names
df = df.replace({"Region":region_dic})
# Replace the Date type for ease of use and creating extra columns
df.Date = pd.to_datetime(df["Date"])
# Create year, month, day and day of the week columns
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
df['Day_of_week'] = df['Date'].dt.dayofweek
# Combine Track name and Artist for ease of use
df['Title'] = df['Artist'] +' - '+ df['Track Name']
df.head()

Finally, I create a separate data frame for each month for a single region (in this case the USA) and then concatenate them into one frame with the following structure: index (1 to 10 for each month), Title (for the Y-axis), Streams (for the X-axis) and YearMonth for grouping. Note that each month’s Streams value is a year-to-date sum, covering every month from January 2017 up to that month, so the bars keep growing along the timeline. Then I write the final frame to a csv file so that I can work on the animation in RStudio.

df_usa_1701 = df[(df.Region == "USA") & (df.Year == 2017) & (df.Month.isin([1]))].groupby(["Title"],as_index=False).agg({"Streams": "sum"}).sort_values(["Streams"], ascending=[False]).head(10).reset_index(drop=True)
df_usa_1701.index = df_usa_1701.index + 1
df_usa_1701["YearMonth"] = 201701
df_usa_1701 = df_usa_1701.reset_index()
df_usa_1702 = df[(df.Region == "USA") & (df.Year == 2017) & (df.Month.isin([1,2]))].groupby(["Title"],as_index=False).agg({"Streams": "sum"}).sort_values(["Streams"], ascending=[False]).head(10).reset_index(drop=True)
df_usa_1702.index = df_usa_1702.index + 1
df_usa_1702["YearMonth"] = 201702
df_usa_1702 = df_usa_1702.reset_index()
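# ...
# df_usa_1703 to df_usa_1712 are built the same way, adding one more month to the filter each time
# ...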
df_usa_1801 = df[(df.Region == "USA") & (df.Year.isin([2017,2018])) & (df.Month.isin([1,2,3,4,5,6,7,8,9,10,11,12]))].groupby(["Title"],as_index=False).agg({"Streams": "sum"}).sort_values(["Streams"], ascending=[False]).head(10).reset_index(drop=True)
df_usa_1801.index = df_usa_1801.index + 1
df_usa_1801["YearMonth"] = 201801
df_usa_1801 = df_usa_1801.reset_index()
frames = [df_usa_1701, df_usa_1702, df_usa_1703, df_usa_1704, df_usa_1705, df_usa_1706, df_usa_1707, df_usa_1708, df_usa_1709, df_usa_1710, df_usa_1711, df_usa_1712, df_usa_1801]
df_usa_merged = pd.concat(frames)
df_usa_merged.to_csv('df_usa_merged.csv')
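
The repeated blocks above can also be written as a loop. The following is only a sketch of that idea, not the code I originally ran: the function name cumulative_top10, its parameters and the toy data are made up for illustration, and it assumes the Region, Year, Month, Title and Streams columns created earlier. One difference to be aware of: it cuts off at the given month, whereas the 201801 block above keeps every month of both years, which gives the same result only if the data ends in January 2018.

```python
import pandas as pd

def cumulative_top10(df, region, periods, top_n=10):
    """Build one top-N table per (year, month) period from year-to-date stream totals.

    Hypothetical helper for illustration; column names follow the post.
    """
    # Integer key such as 201701, so periods can be compared chronologically
    year_month = df["Year"] * 100 + df["Month"]
    frames = []
    for year, month in periods:
        cutoff = year * 100 + month
        # Keep every row of the region up to and including the cutoff month
        subset = df[(df["Region"] == region) & (year_month <= cutoff)]
        top = (
            subset.groupby("Title", as_index=False)["Streams"].sum()
            .sort_values("Streams", ascending=False)
            .head(top_n)
            .reset_index(drop=True)
        )
        top.index = top.index + 1          # ranks start at 1
        top["YearMonth"] = cutoff
        frames.append(top.reset_index())   # the rank becomes the "index" column
    return pd.concat(frames)

# Toy data (invented) to show the shape of the result
toy = pd.DataFrame({
    "Region": ["USA", "USA", "USA", "USA", "Canada"],
    "Year": [2017, 2017, 2017, 2017, 2017],
    "Month": [1, 1, 2, 2, 1],
    "Title": ["A - x", "B - y", "A - x", "B - y", "A - x"],
    "Streams": [100, 50, 10, 200, 999],
})
merged = cumulative_top10(toy, "USA", [(2017, 1), (2017, 2)])
print(merged)
```

With the real data frame, the periods list would hold the thirteen (year, month) pairs from (2017, 1) to (2018, 1), and the result could be written out with to_csv exactly as above.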

For the next part, you’ll need the ggplot2, gganimate and readr packages (install them if you don’t already have them) to build the animation and to read the csv file that we created earlier. In the ggplot call, set up the essentials such as the index, the group and the colors. All the remaining color, texture and size specifications are up to your preference.

Next, create an animation variable with the chosen transition and state lengths. In order to use the animate function, we need a gif renderer, and for that I use gifski_renderer, which relies on the gifski package. Finally, call the animate function with the desired number of frames, frames-per-second value, and width and height.
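
When picking the frame number and the frames-per-second value, it helps to do the arithmetic first: the GIF lasts the number of frames divided by the frames per second, and the frames are shared among the 13 YearMonth states. Here is a small sketch of that calculation in Python. The helper name animation_timing and the values 260 and 20 are hypothetical, chosen only for illustration, and the per-state figure is a rough average because gganimate splits frames between transitions and pauses.

```python
def animation_timing(nframes, fps, n_states):
    """Return (total seconds, average frames per state) for an animation.

    Hypothetical helper; nframes and fps mirror the arguments passed to animate in R.
    """
    duration = nframes / fps              # total length of the GIF in seconds
    frames_per_state = nframes / n_states # average share of frames per state
    return duration, frames_per_state

# Illustrative values only: 260 frames at 20 fps over the 13 monthly states
duration, per_state = animation_timing(nframes=260, fps=20, n_states=13)
print(duration, per_state)
```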

That’s it!

The full Python code can be found here:

Thank you for reading!
