IARPA Super Forecaster Challenge: IFP 840 — Can we predict number of Influenza detections between July 9–15, 2018 in Argentina?

Mar 31, 2018

Flavius (Flavius Mihaies) and I have teamed up to participate in IARPA’s Super Forecasting Challenge, which started this March and will run for the next 6 months. The challenge aims to test individuals’ analytical abilities by asking questions on a wide spectrum of subjects: geography, economics, finance, and, of course, politics.

So far we have answered 21 questions, and we have another 21 to answer before next Wednesday. Today I decided to take on question 840:

id 0 IFP 840: How many positive influenza virus detections will FluNet record for Argentina between 9 July 2018 and 15 July 2018 (epidemiological week 28)?
Description: FluNet (http://www.who.int/influenza/gisrs_laboratory/flunet/en/) is the World Health Organization's global web-based tool for influenza virological surveillance. To access relevant data at http://apps.who.int/flumart/Default?ReportNo=12 make the following selections:
- Select by: 'Country, area or territory'; 
- Filter by: Country of interest, 
- Year from: '2018', Week from: '1', 
- Year to: '2018' Week to: '53', 
- Click 'Display report.' 
This question will be resolved by using the sum of values reported in the ‘Total number of influenza positive viruses’ (or .csv ‘ALL_INF’) column for the country of interest for all reporting weeks within the period of interest. Question will be resolved based on data available four weeks after the last day of the period of interest. If no data are available four weeks after close, question will be resolved as soon as data are released.
Starts: 2018-03-28T16:30:21.000Z, Ends: 2018-07-15T18:01:21.000Z
Options:
 (2595) Less than 60
 (2594) Between 60 and 170, inclusive
 (2593) More than 170 but less than 270
 (2592) Between 270 and 390, inclusive
 (2591) More than 390
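
The resolution rule and the answer options above can be written down as a small helper. This is only a sketch: the function names `week_total` and `option_for` are made up for illustration, and it assumes a DataFrame with the `Year`, `Week` and `ALL_INF` columns of the FluNet CSV export.

```python
import pandas as pd


def week_total(data: pd.DataFrame, year: int, week: int) -> int:
    """Sum ALL_INF over every row reported for the given year and week."""
    mask = (data["Year"] == year) & (data["Week"] == week)
    return int(data.loc[mask, "ALL_INF"].sum())


def option_for(total: int) -> str:
    """Map a detection count to the answer option it falls into."""
    if total < 60:
        return "Less than 60"
    if total <= 170:
        return "Between 60 and 170, inclusive"
    if total < 270:
        return "More than 170 but less than 270"
    if total <= 390:
        return "Between 270 and 390, inclusive"
    return "More than 390"
```

For example, `option_for(week_total(data, 2017, 28))` would tell us which option last year's week 28 would have resolved to.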

With this resource, it is easy to extract the year, the week and the number of positive detections:

import pandas as pd
from matplotlib import pyplot as plt

data = pd.read_csv('data/FluNetInteractiveReport.csv')
data[['Year', 'Week', 'ALL_INF']].head()

Output of data[['Year', 'Week', 'ALL_INF']].head()

Let’s plot the dataset week by week, with all years on one chart:

plt.figure(figsize=(16,12))
for year in data.Year.unique():
    yearly = data[data.Year == year]  # rows for this year only
    plt.plot(yearly["Week"], yearly["ALL_INF"], label=year)
plt.legend()
plt.title("Number of positive influenza virus detections. FluNet record for Argentina \n Source:http://apps.who.int/flumart/Default?ReportNo=12")
plt.xlabel("Week of Year")
plt.ylabel("Number of Cases per Week")
plt.show()

Week-by-week number of positive influenza virus detections in Argentina between 1997 and 2018 (first 11 weeks only for 2018)

Or, zooming in on the most recent 7 years:

plt.figure(figsize=(16,12))
for year in data.Year.unique()[15:]:  # skip the first 15 years, keeping the most recent 7
    yearly = data[data.Year == year]  # rows for this year only
    plt.plot(yearly["Week"], yearly["ALL_INF"], label=year)
plt.legend()
plt.title("Number of positive influenza virus detections. FluNet record for Argentina \n Source:http://apps.who.int/flumart/Default?ReportNo=12")
plt.xlabel("Week of Year")
plt.ylabel("Number of Cases per Week")
plt.show()

It seems that when the peak of detections comes earlier in the year, the peak is also higher. But let’s check whether the data shows any correlation.
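
One way to put numbers on that impression is to pull out, for every year, the week with the most detections and the count in that week. The sketch below is not part of the original analysis: `peak_per_year`, `PeakWeek` and `PeakValue` are names invented here, and it assumes the frame keeps its default index and has the `Year`, `Week` and `ALL_INF` columns.

```python
import pandas as pd


def peak_per_year(data: pd.DataFrame) -> pd.DataFrame:
    """Return, for each year, the week with the most detections and its count."""
    # Drop weeks with no reported value so idxmax never sees an all-missing year
    reported = data.dropna(subset=["ALL_INF"])
    # Row label of the largest ALL_INF value within each year
    peak_rows = reported.groupby("Year")["ALL_INF"].idxmax()
    peaks = reported.loc[peak_rows, ["Year", "Week", "ALL_INF"]]
    peaks = peaks.rename(columns={"Week": "PeakWeek", "ALL_INF": "PeakValue"})
    return peaks.reset_index(drop=True)
```

Calling `peak_per_year(data)[["PeakWeek", "PeakValue"]].corr()` would then give a first indication of whether earlier peaks go with higher peaks. The year 2018 should be left out of such a check, since only its first 11 weeks are available.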

Let’s go back through the dataset and check whether the cumulative number of positive influenza virus detections in the first 11 weeks of a year correlates with the number of detections in week 28 of the same year.

pd.set_option("display.max_columns", 22)
columns = ["Year", "11Weeks", "Week28"]
rows = []
for year in data.Year.unique():
    in_year = data.Year == year
    rows.append([year,
                 data.ALL_INF[in_year & (data.Week > 0) & (data.Week < 12)].sum(),  # weeks 1 to 11
                 data.ALL_INF[in_year & (data.Week == 28)].sum()])  # week 28 only
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
a = pd.DataFrame(rows, columns=columns)
a.T

The table is transposed, which is a more convenient view than a long vertical table.
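
The same table can also be built without an explicit loop. The following is a sketch of an alternative, not the code used for the table above; `early_vs_week28` is a hypothetical helper name, and the column names are the ones already used in this post.

```python
import pandas as pd


def early_vs_week28(data: pd.DataFrame) -> pd.DataFrame:
    """Per-year sum of ALL_INF for weeks 1 to 11 and for week 28."""
    early = data[data["Week"].between(1, 11)].groupby("Year")["ALL_INF"].sum()
    week28 = data[data["Week"] == 28].groupby("Year")["ALL_INF"].sum()
    table = pd.DataFrame({"11Weeks": early, "Week28": week28})
    # Years with no rows in a window (e.g. week 28 of a year still in progress)
    # get 0, which is what an empty .sum() returns in the loop version
    years = pd.Index(data["Year"].unique(), name="Year")
    return table.reindex(years).fillna(0).rename_axis("Year").reset_index()
```

`early_vs_week28(data).T` gives the same transposed view.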

Let’s see if there are any patterns for our prediction:

plt.figure(figsize = (12,8))
plt.scatter(a["11Weeks"],a["Week28"])
plt.title("Scatter Plot of Positive Influenza Detections During Week 28 vs. \n Sum of Positive Detections During Weeks 1 to 11")
plt.xlabel("Sum of Influenza Positive Detections During the First 11 Weeks of Each Year")
plt.ylabel("Influenza Positive Detections During the Week 28 of Each Year")
plt.show()
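
Alongside the scatter plot, a single correlation coefficient can summarise how strongly the two quantities move together. The helper below is a minimal sketch, assuming a table with the `Year`, `11Weeks` and `Week28` columns built earlier; the name `early_season_correlation` and the `exclude_years` parameter are made up for this example.

```python
import pandas as pd


def early_season_correlation(a: pd.DataFrame, exclude_years=()) -> float:
    """Pearson correlation between the weeks 1-11 total and the week 28 count."""
    # Leave out years whose week 28 has not been reported yet
    complete = a[~a["Year"].isin(list(exclude_years))]
    # Cast explicitly in case the columns were stored with object dtype
    x = complete["11Weeks"].astype(float)
    y = complete["Week28"].astype(float)
    return float(x.corr(y))  # Series.corr uses Pearson by default
```

With the table from above this would be called as `early_season_correlation(a, exclude_years=[2018])`, because week 28 of 2018 has not happened yet and shows up as 0 in the table.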

To be continued in the next post: here


Written by Baur Safi