Animate the timelapse of 2023
On 8 Jan 2024, we released a Python library called AnimatedWordCloud.
PyPI: https://pypi.org/project/AnimatedWordCloudTimelapse/
GitHub: https://github.com/konbraphat51/AnimatedWordCloud
As the name says, this library makes an animated word cloud from time-lapse data.
Here I will show how to make an animation of 2023 from Guardian article tag data.
If you just want the code, run this notebook on Colab: https://gist.github.com/konbraphat51/2f22568479061d5dc5b3f3ef170dd352
1. Fetching Guardian Data
We use the Guardian article dataset published by kiet21042003 on Kaggle Datasets.
Download it from the dataset page, or fetch it with the Kaggle API:
import os
os.environ['KAGGLE_USERNAME'] = "" #@param {type:"string"}
os.environ['KAGGLE_KEY'] = "" #@param {type:"string"}
!kaggle datasets download -d kiet21042003/news-articles-of-the-guardian-112015-23112023
import shutil
shutil.unpack_archive("/content/news-articles-of-the-guardian-112015-23112023.zip", '/content')
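The name of the unpacked CSV file is not stated here, so a quick way to find it is to list the CSV files in the target directory. The following is a minimal sketch under that assumption; find_csv_files is a hypothetical helper, not part of any library used in this article.

```python
from pathlib import Path


def find_csv_files(directory):
    """Return the CSV files directly inside `directory`, sorted by name."""
    return sorted(Path(directory).glob("*.csv"))


# Usage on Colab, where the archive was unpacked into /content:
#   for csv_path in find_csv_files("/content"):
#       print(csv_path)
```

Once you know the path, it can presumably be loaded into the data frame df with pandas.read_csv().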
2. Clean Data
We want to use the Time column of the data frame, but its trailing timezone part prevents pandas.to_datetime() from parsing it.
Thus we have to cut it off before the datetime conversion.
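import pandas as pd

# Assumes df is the unpacked CSV already loaded as a pandas DataFrame.
def rid_timezone(x):
    """Drop the last 5 characters (the timezone part) of a Time string."""
    try:
        return x[:-5]
    except TypeError:
        # non-string values (e.g. missing data) fall back to a dummy date
        return "Thu 1 Jan 2015 23.11"

df["DateTime"] = df['Time'].apply(rid_timezone)
df["DateTime"] = pd.to_datetime(df['DateTime'], format='%a %d %b %Y %H.%M')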
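If cutting a fixed number of characters feels fragile, one possible alternative is to strip the trailing zone abbreviation with a regular expression and then parse the rest. This is only a sketch: parse_guardian_time is a hypothetical helper, the sample string is made up rather than taken from the dataset, and it assumes the zone is a short uppercase code at the end of the string.

```python
import re
from datetime import datetime

# Assumption: the timestamp ends with a short uppercase zone code such as "GMT".
_TZ_SUFFIX = re.compile(r"\s+[A-Z]{2,5}\s*$")
TIME_FORMAT = "%a %d %b %Y %H.%M"  # same format string as in the article


def parse_guardian_time(raw):
    """Strip a trailing zone abbreviation, then parse the remaining text."""
    cleaned = _TZ_SUFFIX.sub("", raw)
    return datetime.strptime(cleaned, TIME_FORMAT)


sample = "Thu 23 Nov 2023 11.30 GMT"  # hypothetical value for illustration
print(parse_guardian_time(sample))  # 2023-11-23 11:30:00
```

Note that this drops the zone instead of converting it, just like the slicing approach, so times from different zones are not normalized.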
Now we extract the 2023 data:
from datetime import datetime
df_target = df[(df["DateTime"] >= datetime(2023, 1, 1)) & (df["DateTime"] < datetime(2024, 1, 1))]
3. Prepare Timelapse Data
The timelapse data needs to be a list of tuples like:
[(time_name, {word: weight})]
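To make the shape concrete, here is a small hand-written example together with a basic shape check. The words and weights are made up, and looks_like_timelapse is a hypothetical helper written for this illustration (it is not part of AnimatedWordCloud). It assumes time names are strings, as in this article.

```python
# Hand-written toy data in the [(time_name, {word: weight})] shape.
# The words and weights are made up for illustration.
toy_timelapse = [
    ("1", {"politics": 12, "football": 7}),
    ("2", {"politics": 9, "football": 11, "film": 3}),
]


def looks_like_timelapse(data):
    """Check the basic shape: a sequence of (str, dict of str -> number) pairs."""
    for item in data:
        if not (isinstance(item, tuple) and len(item) == 2):
            return False
        time_name, weights = item
        if not isinstance(time_name, str) or not isinstance(weights, dict):
            return False
        for word, weight in weights.items():
            if not isinstance(word, str) or not isinstance(weight, (int, float)):
                return False
    return True


print(looks_like_timelapse(toy_timelapse))  # True
```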
import ast

stopwords = {"news", "features", "The Observer", "reviews"}
timelapse = []
# range(1, 12) covers January to November; use range(1, 13) to include December
for month in range(1, 12):
    word_vector = {}  # word -> weight
    if month < 12:
        df_month = df_target[(df_target["DateTime"] >= datetime(2023, month, 1)) & (df_target["DateTime"] < datetime(2023, month + 1, 1))]
    else:
        df_month = df_target[(df_target["DateTime"] >= datetime(2023, 12, 1)) & (df_target["DateTime"] < datetime(2024, 1, 1))]
    for tags_str in df_month["Tags"]:
        # the raw data is all strings, so convert to a Python list
        tags = ast.literal_eval(tags_str)
        for tag in tags:
            if tag in stopwords:
                continue
            # count each tag
            word_vector[tag] = word_vector.get(tag, 0) + 1
    timelapse.append(
        # to tuple
        (
            str(month),  # time name
            word_vector,  # word dictionary
        )
    )
Now the timelapse data is stored in timelapse.
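The counting step can also be written with collections.Counter from the standard library. The sketch below runs on made-up rows in the same stringified-list form as the Tags column; count_tags is a hypothetical helper, not something the library provides.

```python
import ast
from collections import Counter

STOPWORDS = {"news", "features", "The Observer", "reviews"}


def count_tags(tag_strings, stopwords=STOPWORDS):
    """Count tags across rows where each row is a stringified Python list."""
    counts = Counter()
    for tags_str in tag_strings:
        tags = ast.literal_eval(tags_str)  # "['a', 'b']" -> ['a', 'b']
        counts.update(tag for tag in tags if tag not in stopwords)
    return dict(counts)  # plain dict: word -> weight


# Made-up rows for illustration, not taken from the dataset.
rows = ["['news', 'UK politics']", "['UK politics', 'Film']"]
print(count_tags(rows))  # {'UK politics': 2, 'Film': 1}
```

Calling count_tags(df_month["Tags"]) should give the same kind of dictionary as word_vector in the loop above.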
4. Animate
Install AnimatedWordCloudTimelapse from PyPI:
pip install AnimatedWordCloudTimelapse
Then call animate() from AnimatedWordCloud:
from AnimatedWordCloud import Config, animate

config = Config(
    output_path="/content",  # for colab
    min_font_size=15,
    image_width=1000,
    image_height=1000,
)
animate(timelapse, config)
This generates the GIF animation.
To display the animation in your notebook, you can write:
from IPython.display import display, Image

with open('/content/output.gif', 'rb') as f:
    display(Image(data=f.read(), format='png'))
Please give a star to our library!