Analyzing the Impact of Covid-19 per IAB Category

Ishan Shrivastava
Published in GumGum Tech Blog
8 min read · May 5, 2020

In this blog post we will look at the impact of Covid-19 as seen across GumGum’s publisher network from January 10th 2020 to April 8th 2020. I used GumGum’s AI capabilities to classify all the web pages into IAB categories and see how each category was impacted by Covid-19. Links to interactive versions of each graph in this post are also provided.

Data Collection

I queried GumGum’s database to collect the processed data from the traffic of English webpages seen by GumGum over the course of January, February, March and April. Because of the high data volume in GumGum’s AI databases, I used Databricks to run a PySpark job that collects, aggregates and deduplicates the data across time. Deduplication here means keeping only the first occurrence of a webpage and removing all later duplicates. The resulting dataset contains the text of each web page, its IAB classification, and the date when the page was first processed.
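The "keep only the first occurrence" rule can be sketched in a few lines. This is a minimal plain-Python illustration, not the PySpark job itself; the record fields `url`, `iab` and `processed_on` are hypothetical names chosen for the example.

```python
from datetime import date


def keep_first_occurrence(records):
    """Return one record per URL: the one with the earliest processed date."""
    first_seen = {}
    for record in records:
        url = record["url"]  # hypothetical field name
        # Keep this record if the URL is new or this copy was processed earlier.
        if url not in first_seen or record["processed_on"] < first_seen[url]["processed_on"]:
            first_seen[url] = record
    # Sort by date (then URL) so later time-series steps see records in order.
    return sorted(first_seen.values(), key=lambda r: (r["processed_on"], r["url"]))


# Toy input: the same page seen twice, plus one other page.
records = [
    {"url": "example.com/a", "iab": "Sports", "processed_on": date(2020, 3, 2)},
    {"url": "example.com/a", "iab": "Sports", "processed_on": date(2020, 2, 27)},
    {"url": "example.com/b", "iab": "Travel", "processed_on": date(2020, 3, 1)},
]
deduped = keep_first_occurrence(records)
```

Here `deduped` holds two records, and `example.com/a` keeps its earlier date of February 27th.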

Covid-19 Related Articles as a Time Series

As a part of this analysis I only considered the webpages that are related to Covid-19. The graph below shows the number of unique webpages per day that are related to Covid-19 from January 10th to April 8th.
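A daily count like this could be computed roughly as follows. This is only a sketch: the post does not say how pages were flagged as Covid-19 related, so the keyword check in `is_covid_related` is an assumption made for illustration, as are the field names `text` and `processed_on`.

```python
from collections import Counter
from datetime import date

# Assumed keyword list; the actual criterion for "Covid-19 related" is not stated.
COVID_TERMS = ("covid", "coronavirus")


def is_covid_related(text):
    """Hypothetical check: true if the page text mentions any assumed keyword."""
    lowered = text.lower()
    return any(term in lowered for term in COVID_TERMS)


def daily_covid_counts(pages):
    """Count Covid-19 related pages per day, returned in date order."""
    counts = Counter(
        page["processed_on"] for page in pages if is_covid_related(page["text"])
    )
    return dict(sorted(counts.items()))


# Toy input, already deduplicated so each page appears once.
pages = [
    {"text": "Coronavirus cases rise", "processed_on": date(2020, 3, 12)},
    {"text": "COVID-19 cancels games", "processed_on": date(2020, 3, 12)},
    {"text": "New pasta recipe", "processed_on": date(2020, 3, 12)},
    {"text": "Covid-19 travel ban", "processed_on": date(2020, 3, 13)},
]
counts = daily_covid_counts(pages)
```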

The graph above clearly shows how the number of Covid-19 related articles increased with time. Around late January, Covid-19 related articles started appearing at a more or less constant rate of approximately 3k webpages per day, and this continued until late February.

This changed in the last week of February, when we saw a peak of slightly more than 10k Covid-19 related articles. To find out what could have contributed to this peak, I looked at the events that transpired from February 24th to February 29th.

These events might have triggered the increase in webpages talking about Covid-19 and related subjects. The following week saw a similar frequency. This was when prominent states in the United States of America began declaring a State of Emergency: California on March 4th and New York on March 7th.

The frequency of Covid-19 related articles increased drastically from March 9th, as many more US states declared a State of Emergency. Right after the WHO declared Coronavirus a pandemic on March 11th, we saw 35k Covid-19 related articles on March 12th. By the end of that week, on March 13th, the US had declared a National Emergency. All of these events might have contributed to such a high volume of webpages talking about Covid-19.

Beyond this we saw three major peaks, roughly one per week, all of them crossing the 40k mark. The first peak occurred on March 17th, the day Coronavirus had spread to all 50 states. The second peak occurred right after March 24th, which was the deadliest day for the US up to that point, with 160 deaths due to Coronavirus.

We finally touched the 50k mark on April 1st, when 869 US coronavirus deaths were reported in a single day. This was again the deadliest day for the US so far. Right after this, we crossed the 50k mark on April 2nd, the day President Donald Trump invoked the Defense Production Act against 3M for face masks.

Impact of Covid-19 on Different IAB Categories

Figure 1 shows a sharp increase in the webpages talking about Covid-19. To understand what this means at the IAB category level, let’s look at the daily proportions of the different IAB categories for Covid-19 and Non Covid-19 related articles.

It is important not to read too much into the patterns observed for Covid-19 related articles early in January, mainly because of their low volume. As Figure 1 shows, Covid-19 related articles mainly started popping up from the last week of January.

Figure 2 (above) shows the percentage of Covid-19 related articles in each IAB category, giving an insight into how the daily traffic is split across these categories. Similarly, Figure 3 (below) shows the same for Non Covid-19 related articles.
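The percentages behind these two figures amount to a per-day, per-group share calculation. A minimal sketch, assuming each page record carries hypothetical `day`, `is_covid` and `iab` fields:

```python
from collections import Counter, defaultdict


def daily_category_shares(pages):
    """Map (day, is_covid) -> {category: percent of that group's pages that day}."""
    counts = defaultdict(Counter)
    for page in pages:
        counts[(page["day"], page["is_covid"])][page["iab"]] += 1

    shares = {}
    for key, per_category in counts.items():
        total = sum(per_category.values())
        # Each category's share is its page count over the group's daily total.
        shares[key] = {cat: 100.0 * n / total for cat, n in per_category.items()}
    return shares


# Toy input: four Covid-19 related pages and two unrelated pages on one day.
pages = [
    {"day": "2020-03-12", "is_covid": True, "iab": "Sports"},
    {"day": "2020-03-12", "is_covid": True, "iab": "Sports"},
    {"day": "2020-03-12", "is_covid": True, "iab": "Sports"},
    {"day": "2020-03-12", "is_covid": True, "iab": "Travel"},
    {"day": "2020-03-12", "is_covid": False, "iab": "Food & Drink"},
    {"day": "2020-03-12", "is_covid": False, "iab": "Sports"},
]
shares = daily_category_shares(pages)
```

Because the shares are normalized within each group separately, the Covid-19 and Non Covid-19 series can be compared even though their daily volumes differ widely.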

A few things that stand out from Figure 2 (above) and Figure 3 (below) are:

  1. Sports pages consistently make up the largest proportion of the Covid-19 related web pages.
  2. Food & Drink and Arts & Entertainment see a gradual increase starting in March for both Covid-19 and Non Covid-19 related articles.
  3. Web pages about Travel pop up early in the conversation about Covid-19, but from March their share declines. On the other hand, the proportion of Travel pages among the Non Covid-19 pages has been relatively constant. This suggests that the early increase in Travel pages could have been due to the travel bans prompted by Covid-19.

To look deeper into individual IAB categories, I looked at the weekly distribution of the proportion for different IAB categories. This helped in observing how the range of proportions changes weekly. To achieve this, I binned or bucketed the proportions of each IAB category into weeks and looked at the weekly box plots of these proportions. Let’s look at these plots for different IAB categories in the section below.
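The weekly binning can be sketched as below. This is an illustrative outline rather than the code behind the plots: it groups daily proportions by ISO week (an assumed week definition) and computes the five numbers a box plot draws, using the standard library's `statistics.quantiles`.

```python
from collections import defaultdict
from datetime import date
from statistics import quantiles


def weekly_box_stats(daily_shares):
    """Group daily proportions by ISO week and return box-plot summaries."""
    by_week = defaultdict(list)
    for day, share in daily_shares.items():
        iso = day.isocalendar()
        by_week[(iso[0], iso[1])].append(share)  # key: (ISO year, ISO week)

    stats = {}
    for week, values in sorted(by_week.items()):
        if len(values) < 2:
            continue  # quantiles() needs at least two data points
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        stats[week] = {
            "min": min(values),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(values),
        }
    return stats


# Made-up daily Sports proportions (in percent) for one week of March 2020.
daily_shares = {
    date(2020, 3, 9): 60.0,
    date(2020, 3, 10): 62.0,
    date(2020, 3, 11): 64.0,
    date(2020, 3, 12): 66.0,
    date(2020, 3, 13): 68.0,
}
stats = weekly_box_stats(daily_shares)
```

Summarizing each week by its quartiles rather than a single average keeps the spread visible, which is what makes week-to-week shifts in the range easy to spot.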

Sports IAB Category

From Figure 2 and Figure 3, it’s evident that the Sports IAB category is consistently the most covered category in both Covid-19 and Non Covid-19 related articles. The graph below shows the weekly distribution of the proportion of the Sports IAB category among the Covid-19 and Non Covid-19 related articles.

As seen in Figure 1, Covid-19 related articles started increasing from March 2020. Figure 4 (above) shows how Sports covered around 60–70% of that daily traffic. What’s interesting is that the proportion of the Sports IAB category in Covid-19 related articles kept increasing from mid February until mid March. Around the same period, many sporting events were either postponed or cancelled because of Covid-19, which likely explains this increase.

On the other hand, the proportion of the Sports category among the Non Covid-19 related articles remained relatively constant over the same period (mid February to mid March). It only starts to go down from mid March, which likely shows how the cancellation of sporting events slowed down interest in Sports related articles.

Travel IAB Category

Figure 2 shows how the Travel IAB category was among the first categories other than Sports to be impacted by Covid-19. Figure 3 shows that this category is relatively small in terms of its proportion among the Non Covid-19 related articles. The graph below shows the weekly distribution of the proportion of the Travel IAB category among the Covid-19 and Non Covid-19 related articles.

Figure 5 shows the increased proportion of the Travel category among the Covid-19 related articles early in the year. Talk of travel bans started in late January and the bans themselves were put in place in February, which is reflected in our data as well. Over time, starting in March, the proportion of the Travel category among the Covid-19 related articles decreased. On the other hand, the Travel category’s proportion among Non Covid-19 related articles has been really small, mostly around 0.8% to 1.5%.

Food & Drink IAB Category

Figure 2 shows that the proportion of the Food & Drink category among the Covid-19 related articles started increasing from March. Figure 6 shows how the daily proportion for this category moves above 3% among the Covid-19 related articles. This increase might have been because of the implementation of “Stay at Home” orders around the same time, which could have increased people’s interest in reading about food recipes, dining options and so on.

What’s even more interesting is that the proportion of the Food & Drink category among the Non Covid-19 related articles also started increasing (Figures 3 and 6). The box plots for Non Covid-19 in the figure below show a significant jump after mid March. This points to a behavioral effect of the “Stay at Home” orders issued because of Covid-19.

Style & Fashion IAB Category

It is interesting to see that our data shows a spike for this category in February, when Milan fashion week was cancelled (Figure 2). Figure 7 shows the increase after mid February in the proportion of the Style & Fashion category, specifically among the Covid-19 related articles.

Arts & Entertainment IAB Category

For this category, Figure 8 shows a gradual increase in its proportion among the Covid-19 related articles. The increase is more drastic when we look at the proportion among the Non Covid-19 related articles. This suggests that, due to “Stay at Home” orders, people are likely more interested in Arts & Entertainment.

Conclusion

In this blog post we saw how web traffic, particularly in the Ad Tech space, reacted to the events that transpired because of Covid-19 from January 10th 2020 to April 8th 2020. It was interesting to see how the travel bans and the cancellation of sporting and fashion events are captured in the web traffic through their impact on the Travel, Sports and Style & Fashion IAB categories respectively. Looking at the proportion of pages for the Food & Drink and Arts & Entertainment IAB categories over time gives us a sneak peek into the behavioral changes that could have been caused by the “Stay at Home” orders during the Covid-19 pandemic.

About Me: I graduated with a Master’s in Computer Science from ASU and am an NLP Scientist at GumGum. I am interested in applying Machine Learning/Deep Learning to provide some structure to the unstructured data that surrounds us.

We’re always looking for new talent! View jobs.

Follow us: Facebook | Twitter | LinkedIn | Instagram
