Web Traffic Time Series Forecasting — Forecast future traffic for Wikipedia pages
My first blog on Medium. Hope you all enjoy reading it.
This problem was a part of a Kaggle competition in 2017 which I solved a few days back.
So in this blog, I am going to take you through my solution.
Table of Contents
- Business Problem
- Data Overview
- Mapping Real-World Problem to ML Problem
- Existing Solutions and Related Research Papers
- First Cut Approach
- Exploratory Data Analysis
- Feature Development
- Data Processing
- Model Development
- Result Comparison
- Predictions
- Deployment Video
- Future Works
- References
Business Problem
Explanation
Web traffic is the amount of data sent and received by visitors to a website. It is generally measured by the number of visitors on a page. Sites monitor incoming and outgoing traffic to see which pages of their site are popular and whether there are any apparent trends, such as one specific page being viewed mostly by people in a particular country. Nowadays, forecasting web traffic is a major problem, as unexpected traffic can cause setbacks to the workings of major websites.
Most people have encountered a crashed site or very slow loading times when a lot of people are using a website at once. This significantly affects the user experience, and visitors might leave a bad review for the site. So it is desirable that the site owner puts a traffic management plan in place to handle heavy traffic. This is where forecasting is needed.
Objectives
- Identifying the nature of the phenomenon by establishing a pattern.
- Forecasting future values.
- There are no low-latency requirements, but forecasting shouldn’t take days.
Data Overview
Data Source: https://www.kaggle.com/c/web-traffic-time-series-forecasting/data
You can obtain the data by simply downloading it from the above-mentioned link.
There is a total of 6 data files.
- train_* — training data, which contains a Page column and the page hit values for a certain date range.
- key_* — key files containing the shortcode for each page.
- sample_submission_* — files showing the required submission format.
The competition had 2 stages. The first stage has data ranging from July 1st, 2015 to December 31st, 2016; the second stage has training data up to September 1st, 2017. train_* has a Page column, and the remaining columns are the dates corresponding to each stage. The Page column holds the page names, and the date columns hold the page hit values for those dates.
Mapping Real-World Problem to ML Problem
This is a time series forecasting problem where I will be forecasting page hits on future dates from past dates. As I am forecasting numerical values, this is a regression-type predictive modeling problem.
Deep learning approaches like LSTMs and CNNs are used to model this problem. The performance metric used is SMAPE (Symmetric Mean Absolute Percentage Error).
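To make the metric concrete, here is a minimal NumPy sketch of SMAPE as it is usually defined for this competition (treating a term as 0 when both the actual and the forecast are 0 is my assumption about the convention):

import numpy as np

def smape(y_true, y_pred):
    # Symmetric Mean Absolute Percentage Error, on a 0-200 scale
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    diff = np.abs(y_true - y_pred)
    # term is 0 when both actual and forecast are 0 (assumed convention)
    ratio = np.where(denom == 0, 0.0, diff / np.where(denom == 0, 1.0, denom))
    return 100.0 * np.mean(ratio)

print(smape([3, 5, 0], [2.5, 5, 0]))  # ~6.06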
Existing Solutions and Related Research Papers
Kaggle Solutions
First Solution
This Kaggle Solution is an extension of the following kernel.
Summary:
- Loading and reading the train data into train_df
- Exploratory Data Analysis
- Average pageview for each language.
- The language of each of the pages is extracted from the page name. A dictionary lang_sets is created to store the train data corresponding to each language as the key.
- Then the average pageview value for each date corresponding to each language is stored in a dictionary called sums.
- Plot the sum value for each language.
- From the plot, I understood that the average pageview for the English Language is high.
- The ACF and PACF are plotted for each language.
- From the ACF plots, we find that there is a weekly trend for most of the languages. From the PACF plots, the AR value is calculated depending on the lags that are outside the confidence intervals.
- Modeling — 2 models are prepared — ARIMA and LSTM
- ARIMA: a model for each language is built, considering the p, d, q values from the ACF and PACF plots. The model is trained on the average values for each language stored in the sums dictionary. It is mentioned that the predictions from this model will be used as input to an ensemble model (a minimal fitting sketch follows this summary).
- LSTM- Vanilla LSTM of one layer with 8 neurons is used.
- The optimizer used is RMSProp and the loss metric is MSE.
- The values of the first 549 days are used to predict the same 549-day window shifted forward by one day. For example, data from 1.7.2015 to 30.12.2016 is used to forecast the values from 2.7.2015 to 31.12.2016.
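For reference, fitting such a per-language ARIMA with statsmodels might look like the following minimal sketch; the example series and the (p, d, q) order are placeholders, not the kernel's actual values:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# daily average pageviews for one language (made-up example series;
# in the kernel this would come from the sums dictionary)
idx = pd.date_range('2015-07-01', periods=60, freq='D')
series = pd.Series(range(100, 160), index=idx, dtype=float)

# p from the PACF, d from the differencing needed, q from the ACF (values here are placeholders)
model = ARIMA(series, order=(4, 1, 0))
fitted = model.fit()
forecast = fitted.forecast(steps=14)  # forecast the next 14 days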
Second Solution
This is the first-place solution for this competition. I only went through the feature engineering and feature processing parts of this document. The features generated are country, agent, and site name, extracted from the page name. Year-to-year and quarter-to-quarter autocorrelation values are also generated. Another feature, page popularity, is generated as the median pageview value for each page. The pageview values are converted to log1p values. The country, agent, and site name are one-hot encoded. All the features are normalized, and the pageview values are normalized individually for each page.
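That write-up's per-page processing could be sketched roughly as follows; the views and page_meta frames and the exact normalization details are my assumptions, not the original author's code:

import numpy as np
import pandas as pd

# views: page-by-date matrix of raw pagehits; page_meta: country/agent/site per page (both assumed)
log_views = np.log1p(views)

# normalize each page's series individually (z-score per row)
row_mean = log_views.mean(axis=1)
row_std = log_views.std(axis=1).replace(0, 1)
norm_views = log_views.sub(row_mean, axis=0).div(row_std, axis=0)

# one-hot encode the categorical page attributes
page_meta = pd.get_dummies(page_meta, columns=['country', 'agent', 'site'])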
Research Papers and Videos
Paper Name: Web Traffic Time Series Forecasting using ARIMA and LSTM RNN
In this paper, a new forecasting method is developed by combining ARIMA and LSTM. The methodology steps discussed in the paper are the following:
- Loading the time series data into an input vector.
- Decomposing the vector into two parts by applying single level Discrete Wavelet Transform. This decomposition provides two data components, one is the Approximate (A) component and another is the Detailed(D) component.
- ARIMA model is applied to the Detailed component and the LSTM model is applied on the Approximate component to get 2 separate forecasts.
- Then the 2 forecasts are combined using the inverse DWT to get the final forecast.
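To make the pipeline concrete, here is a tiny pywt sketch of the decompose-and-recombine step; the example series is made up, and in the paper the A and D components would be forecast by the LSTM and ARIMA before recombining:

import numpy as np
import pywt

# hypothetical pageview series for a single page
series = np.array([120., 130., 128., 150., 160., 155., 170., 180.])

# single-level DWT: approximate (A) and detailed (D) components
cA, cD = pywt.dwt(series, 'db1')

# in the paper an LSTM would forecast cA and ARIMA would forecast cD;
# here we simply reuse the components to show the recombination step
reconstructed = pywt.idwt(cA, cD, 'db1')
print(np.allclose(reconstructed, series))  # True: the inverse DWT recovers the signal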
Wavelet Transform Video
I just wanted to get an idea of wavelet transforms, so I went through the following video.
It is a video explaining wavelets and the wavelet transform. Wavelets allow us to get an idea of frequency magnitudes in a given time interval. In the wavelet transform, a signal is broken down into a set of mutually orthogonal wavelet basis functions. There is a mother wavelet ψ(t), and smaller wavelets are derived from it as
ψ_{a,b}(t) = (1/√a) · ψ((t − b)/a)
where a is the scale and b is the position.
First Cut Approach
In this section, I will be discussing the initial approach that I had planned to follow for solving this problem.
First I want to find the average pageviews for each date across all the pages. These average pageview values will be used in the EDA rather than plotting every page individually.
EDA
- Plotting the average pageviews vs. weekdays to get an idea of how the pageviews change with weekdays. If there is a pattern across weekdays, it will be used as a feature.
- Plotting the average pageviews vs. days, colored by weekday and weekend, to see whether traffic is higher on weekends than on weekdays or vice versa. If there is a difference, weekend will also be included as a feature.
- Plotting the average pageviews vs. months to get an idea of how the pageviews change with months. Similarly, if there is any pattern, it will be added as a feature.
- Finding the language of each page and plotting the distribution of the languages as a histogram.
- Finding the agent of each page and plotting the distribution of the agent type as a histogram.
- Finding the access type of each page and plotting the distribution of the access type as a histogram.
- Plotting the average pageviews vs. agent and average pageviews vs. access to see if there is a pattern. If there is, they will be included as features.
- Finding the pageview percent for each page and plotting the histogram of average pageview percent for each language.
- Plotting the ACF plots of the top 5 pages based on their view percentage, to get the timestep value for the LSTM.
Feature Development
- Depending on the EDA, the features will be decided, and only those that show some pattern will be included.
- The target feature will be the pageview value for the dates.
- All the features will be standardized.
Model Development
LSTM Model
- A Bidirectional LSTM model will be used to capture both past and future trends.
- I want to use a timestep of 7, as I have seen in the reference kernel that there is a weekly trend.
- I want to train the model on a page that has a high view percentage.
- Train data will have the first 70% of dates, while test data will have the remaining 30%.
- I want to use SMAPE as the evaluation metric.
After several experiments, I arrived at the final solution, which I will discuss in detail in the following sections.
Exploratory Data Analysis
Understanding the dataset
Let me give you an idea of the dataset before I start explaining EDA.
Here, as we can see, we have a Page column and several date columns. So for each wiki page, the pagehit values are recorded for the given dates. You can also see that some of the values are NaN; it is possible that those pages did not exist on those dates.
I filled the missing values using forward linear interpolation. This ensures that all the NaN values before the first real value are filled with 0, and the remaining NaN values are filled using linear interpolation.
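A rough pandas equivalent of this imputation (file name and column layout assumed) is:

import pandas as pd

train = pd.read_csv('train_2.csv')   # assumed file name; Page column followed by one column per date
values = train.iloc[:, 1:]           # drop the Page column

# interpolate forward along each row, then set any leading NaNs (page not yet created) to 0
imputed = values.interpolate(axis=1, limit_direction='forward').fillna(0)
train_imputed = pd.concat([train[['Page']], imputed], axis=1)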
So now I have 2 datasets: one with the raw values and the other with the imputed values.
If you observe the page name carefully, you will see it contains the language in which the page is written, the access type, and the agent type. So from each page name we can easily extract these features.
I have also found the number of active days for each page and the view percentage of each page, defined as:
view percentage = total number of views for a page / number of active days of that page
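Assuming the page names follow the usual name_project_access_agent pattern, a rough sketch of extracting these attributes and computing the view percentage (reusing the raw values frame from the earlier sketch) is:

import re
import pandas as pd

def parse_page(page):
    # split e.g. 'Foo_es.wikipedia.org_all-access_spider' into its parts
    name, project, access, agent = page.rsplit('_', 3)
    match = re.match(r'([a-z]{2})\.wikipedia\.org', project)
    lang = match.group(1) if match else 'na'   # commons/media pages carry no language code
    return lang, access, agent

page_features = train['Page'].apply(
    lambda p: pd.Series(parse_page(p), index=['lang', 'access', 'agent']))

active_days = values.notna().sum(axis=1).clip(lower=1)   # days with a recorded value
view_percentage = values.sum(axis=1) / active_days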
Visualizations
For visualizing the data I have generated the following features from the dates:
from datetime import datetime
import pandas as pd

# parse the date columns of the train file
date_list = []
for i in train.columns[1:]:
    date_list.append(datetime.strptime(i, '%Y-%m-%d'))

weekday = []
for i in date_list:
    weekday.append(i.weekday())
weekday = pd.Series(weekday)

weekend = []
for i in weekday:
    if i in range(5):
        weekend.append(0)
    else:
        weekend.append(1)
weekend = pd.Series(weekend)

month = []
for i in date_list:
    month.append(i.month)
month = pd.Series(month)

# calendar features derived directly from the dates
month_start = pd.Series(date_list).dt.is_month_start
month_end = pd.Series(date_list).dt.is_month_end
quarter_start = pd.Series(date_list).dt.is_quarter_start
quarter_end = pd.Series(date_list).dt.is_quarter_end
week = pd.Series(date_list).dt.week
quarter = pd.Series(date_list).dt.quarter
days_in_month = pd.Series(date_list).dt.days_in_month
year = pd.Series(date_list).dt.year
- Weekday
- Weekend
- Month
- month_start
- month_end
- quarter_start
- quarter_end
- quarter
- week
- days_in_month
- year
Univariate Analysis
I have plotted pagehits vs. the above-created features for both the raw and the imputed data of the top 15 pages, ranked by view percentage.
Pagehits vs. Weekdays
- This plot shows that on Sunday or Monday the average pageviews increase.
- There are differences between the weekdays.
- In most cases there is a gradual drop in the pagehit values from Monday to Saturday, and then an increase on Sunday which continues into Monday.
- So I will include weekday as a feature, and also another feature, is_sunday_or_monday, as the means for those days seem to be higher than for the other days.
Pagehits vs. Weekends
- There is not much difference between the boxplots of weekdays and weekends.
- The IQR is approximately the same, and the median is higher for weekends in some cases.
- But there are fewer weekend days, so the IQR and median should have been lower than for weekdays.
- This suggests that traffic is higher on weekends compared to weekdays.
Pagehits vs. Month
- The boxplots of each month are somewhat different from one another.
- The IQR is much smaller for months in the first half of the year and quite high in the second half.
- In some cases the IQR for August is much higher.
- So month number and year half can be used as features.
Pagehits vs. week also shows a trend similar to the above plot.
Pagehits vs. Month_start
- There are very few days that fall on the first day of a month, but we see that the boxplot mean and IQR are approximately the same for both the month-starting days and the other days.
- This suggests that the traffic at the start of a month is quite high compared to other days.
- So it is important to include this as a feature.
Pagehits vs. Month_end
- I see a trend similar to the month_start feature.
- The boxplot for month_end days in fact has a higher IQR and a similar median value compared to the other days.
- This tells us that the days connecting the months have significantly higher traffic than other days.
As quarter_start and quarter_end days are a subset of month_start and month_end days, pagehits vs. quarter_start and pagehits vs. quarter_end have characteristics similar to the month_start and month_end features.
Pagehits vs. Days_in_month
- In some cases February has much higher pagehit values compared to the other months.
- There are more months with 30 and 31 days.
- Yet the mean for February is much higher than for the other months.
Pagehits vs. Year
- In most cases there is an increase in pagehit values in 2015 and a decrease in the following years.
- Again, there is a downward trend from 2015 which continues through the following years.
- So we can conclude that pagehit values decrease over the years in most cases, which makes year an important feature.
Pagehits vs. Quarter
- For different quarters the pagehit values are different.
- This, along with the quarter_start and quarter_end features, can give me a more accurate idea of traffic.
Bivariate Analysis
Now comes the bivariate analysis, in which I will look at the relationship between pagehits and multiple features in a single plot.
Relationship between weekdays, months, and pagehits
- This plot shows how the traffic changes across the days of each month.
- From this I understand that the IQR of traffic values is smaller in the 1st half of the year.
- The IQR gradually increases till mid-year.
- Then it falls again towards the end of the year.
- Also, there is a difference between the first and second halves of the year.
- So I can create a feature year_half with values 1 or 2 for the 1st and 2nd half respectively.
- I will also create another feature, is_august, because the IQR for August is the highest for most pages (a short sketch of these two flags follows this list).
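As an illustration, building on the month series created earlier, the two flags could be derived as:

year_half = month.apply(lambda m: 1 if m <= 6 else 2)   # 1 for Jan-Jun, 2 for Jul-Dec
is_august = (month == 8).astype(int)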
Relationship between week, year, and pagehits
- It is observed that in most cases the weeks in 2015 have higher pagehit values.
- Also, there is a downward trend in pagehit values for the weeks of the following years.
From the above plots, I wish to include the following features for training: weekday, is_sunday_or_monday, month, is_august, year_half, quarter, quarter_start, quarter_end, month_start, month_end, days_in_month, and week.
Now coming to the visualizations of the page-name features: language, access type, and agent type. In the following section I will be plotting the page features of the top 15 pages.
Language plot
- We see that only for the language en are the pagehit values quite high.
- The rest of the plots are similar to one another.
Access Plots
- For access type 1, the pagehit values have a peak of 20,000; mostly they range within 10,000.
- For access type 0, the pagehit values have multiple peaks above 10,000.
- Access type 2 reaches a peak value of around 110,000, although most values range within 5,000.
- Pagehit values vary with access type.
Agent Plots
As we can see, there is no page with agent type 1. This shows that all the pages with high pagehit values have agent type 0.
So I decided to do the agent plot for the last 15 pages (by view percentage) as well.
Here we see there is no agent 0 plot, because all the pages with a low view percentage have agent type 1.
From the above plots of page features, I conclude that I will include agent and access type in the feature set.
Autocorrelation plots
An autocorrelation plot is a graphical representation of the linear relationship between lagged values of a time series. I have plotted the autocorrelation plots of the top 15 pages.
Here I will show the plots for the different languages among the top 15 pages.
If you observe carefully, you will see that for all the languages the correlation spikes at intervals of 7.
Thus we can conclude that our data has a weekly trend.
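For reference, a minimal statsmodels sketch of such a plot, taking one page's series from the imputed frame built earlier (the row index is assumed):

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

page_series = imputed.iloc[0].astype(float)   # one page's daily pagehits
plot_acf(page_series, lags=50)                # spikes at lags 7, 14, 21, ... indicate weekly seasonality
plt.show()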
This was the entire EDA for this problem. Now we will proceed to Feature Development.
Feature Development
I have developed 2 types of feature sets. One is common to all the pages, also known as the Global Features. The other is created per page, so it is called the Page-Specific Features.
Global Feature Set
This feature set is generated based on the date range for which the pagehits have to be forecasted. It contains the following features:
weekday, is_sunday_or_monday, month, is_august, year_half, quarter, quarter_start, quarter_end, month_start, month_end, days_in_month, and week
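For a given forecast window, these global features can be rebuilt directly from the dates. A compact sketch (the forecast window below is an assumption for illustration):

import pandas as pd

dates = pd.date_range('2017-09-11', '2017-11-13', freq='D')   # assumed forecast window
g = pd.DataFrame({'date': dates})
g['weekday'] = g['date'].dt.weekday
g['is_sunday_or_monday'] = g['weekday'].isin([6, 0]).astype(int)
g['month'] = g['date'].dt.month
g['is_august'] = (g['month'] == 8).astype(int)
g['year_half'] = (g['month'] > 6).astype(int) + 1
g['quarter'] = g['date'].dt.quarter
g['quarter_start'] = g['date'].dt.is_quarter_start
g['quarter_end'] = g['date'].dt.is_quarter_end
g['month_start'] = g['date'].dt.is_month_start
g['month_end'] = g['date'].dt.is_month_end
g['days_in_month'] = g['date'].dt.days_in_month
g['week'] = g['date'].dt.isocalendar().week   # equivalent to the deprecated .dt.week used above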
Page-Specific Feature Set
For this, I planned to create a dataset for each combination of access and agent type. But I wanted to be more detailed, so I split each combination into 2 parts.
- 75th Percentile Group: for this I have calculated the 75th percentile values of the pagehits for each agent and access type.
- 25th Percentile Group: for this I have calculated the 25th percentile values of the pagehits for each agent and access type.
These pagehit values act as the target values. The agent and access type are clubbed together with the global features. A total of 8 datasets are created: 4 for the 75th percentile and 4 for the 25th percentile.
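A rough sketch of how one of these target series could be built, assuming a pages frame that carries the extracted access and agent columns alongside the imputed date columns:

# pages: imputed pagehits plus the extracted access/agent columns (assumed layout)
date_cols = [c for c in pages.columns if c not in ('Page', 'access', 'agent')]
group = pages[(pages['access'] == 0) & (pages['agent'] == 0)]

# per-date 75th and 25th percentile pagehits across that group of pages
target_75 = group[date_cols].quantile(0.75)
target_25 = group[date_cols].quantile(0.25)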
Let me take you through one of them in detail:
This dataset is for access type 0 and agent type 0.
Now both the global feature set and the page-specific feature set are clubbed together to give the following total feature set.
These are the target values for set 1.
So a total of 8 feature sets are generated. Each feature set will be used to train a model.
Now that our features are generated let’s proceed to Data Processing.
Data Processing
First I split the feature sets into train and test data, where the first 70% of the data is used for training and the remaining 30% for testing. I did not shuffle the data while splitting, as it is temporal in nature.
As we have seen above, our feature set has both numerical and categorical values, so the categorical values need to be encoded into numerical ones.
Categorical Value Encoding
For this, I have used a Label Encoder. Label encoding is a simple approach where we convert each of the categories into a number.
from sklearn.preprocessing import LabelEncoder

le1 = LabelEncoder()
X_train['month_start'] = le1.fit_transform(X_train['month_start'])
X_test['month_start'] = le1.transform(X_test['month_start'])

le2 = LabelEncoder()
X_train['month_end'] = le2.fit_transform(X_train['month_end'])
X_test['month_end'] = le2.transform(X_test['month_end'])

le3 = LabelEncoder()
X_train['quarter_start'] = le3.fit_transform(X_train['quarter_start'])
X_test['quarter_start'] = le3.transform(X_test['quarter_start'])

le4 = LabelEncoder()
X_train['quarter_end'] = le4.fit_transform(X_train['quarter_end'])
X_test['quarter_end'] = le4.transform(X_test['quarter_end'])
Log Transformation of Data
I have transformed the data using log(1+x) function to deal with 0 values.
X_train=np.log1p(X_train)
X_test=np.log1p(X_test)
y_train=np.log1p(y_train)
Timestep Creation
As we have seen in the autocorrelation plots, almost all the pages show a weekly correlation in pagehits, so we need to incorporate this lag in our dataset. For this I have used timesteps to reshape my data.
The timestep is the amount of history the model takes into account when predicting the output.
For example, we observed a correlation at lag 7 in the plots above, so I need to take the past 7 days of data into account to predict the output. This is where the timestep comes into play. I have reshaped my data as [n_samples, timestep, features],
where n_samples is the length of the data, timestep is the lag value (7), and features is the total number of features.
import numpy as np

def create_dataset(X, y, timestep=1):
    # Build overlapping windows of length `timestep` and align each window
    # with the target value that immediately follows it.
    Xs, ys = [], []
    for i in range(len(X) - timestep):
        Xs.append(X[i:i + timestep])
        ys.append(y[i + timestep])
    return np.array(Xs), np.array(ys)

train_x, train_y = create_dataset(X_train.values, y_train.values, 7)
test_x, test_y = create_dataset(X_test.values, y_test.values, 7)
With this, my entire data processing is complete. Now we will move on to modeling.
Model Development
Baseline Model
This is the mean model, i.e. the most basic model for this problem. Here the mean value of the pagehits in the train data is used as the predicted value.
mean_forcast = np.mean(eda_train_imp.iloc[:,1:train.shape[1]].values.flatten())
y_pred=[[mean_forcast]]*X_test.shape[0]
The baseline model resulted in a Kaggle score of 141.17.
Our target is to get a model with a better score than the baseline model.
LSTM Models
LSTM is a type of RNN designed to handle long-term dependencies. We have seen that pagehits are autocorrelated at intervals of 7, so to take that dependency into account I have used LSTMs.
The pagehit value of a particular day can be affected by the days before it as well as the days after it within the window. To account for that, I have used bidirectional LSTMs.
I have developed 2 types of LSTM models:
- Stacked LSTM with Dropouts
I have stacked 2–3 Bidirectional LSTM layers with dropouts for each feature set, so there are a total of 8 models, each trained for 100 epochs. The following is the code snippet for the model.
from keras import Sequential
from keras import backend as K
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import Bidirectional
from keras.layers import Dense
from keras.optimizers import Adam

model1 = Sequential()
# input shape is (timestep, features), i.e. (train_x.shape[1], train_x.shape[2])
model1.add(Bidirectional(
    LSTM(units=256,
         activation='relu',
         input_shape=(train_x.shape[1], train_x.shape[2]),
         return_sequences=True)
))
model1.add(Dropout(0.5))
model1.add(Bidirectional(
    LSTM(units=32,
         activation='relu')
))
model1.add(Dense(1))

opt = Adam(learning_rate=0.001)

def customLoss(y_true, y_pred):
    # differentiable SMAPE-style loss
    epsilon = 0.1
    summ = K.maximum(K.abs(y_true) + K.abs(y_pred) + epsilon, 0.5 + epsilon)
    smape = K.abs(y_pred - y_true) / summ * 2.0
    return smape

model1.compile(loss=customLoss, optimizer=opt)

# callbacks_list (e.g. ModelCheckpoint/EarlyStopping) is assumed to be defined earlier
history = model1.fit(train_x, train_y, epochs=100, batch_size=64, verbose=1,
                     validation_split=0.2, shuffle=False, callbacks=callbacks_list)
This model resulted in a Kaggle Score of 83.70.
- Stacked LSTM without Dropouts
Just for experimentation, I increased the number of layers and removed the dropouts from the model. Here is the code snippet.
from keras import Sequential
from keras import backend as K
from keras.layers import LSTM
from keras.layers import Bidirectional
from keras.layers import Dense
from keras.optimizers import Adam

model1 = Sequential()
model1.add(Bidirectional(
    LSTM(units=256,
         activation='relu',
         input_shape=(train_x.shape[1], train_x.shape[2]),
         return_sequences=True)
))
model1.add(Bidirectional(
    LSTM(units=64,
         activation='relu',
         return_sequences=True)
))
model1.add(Bidirectional(
    LSTM(units=32,
         activation='relu')
))
model1.add(Dense(1))

opt = Adam(learning_rate=0.001)

def customLoss(y_true, y_pred):
    epsilon = 0.1
    summ = K.maximum(K.abs(y_true) + K.abs(y_pred) + epsilon, 0.5 + epsilon)
    smape = K.abs(y_pred - y_true) / summ * 2.0
    return smape

model1.compile(loss=customLoss, optimizer=opt)

history = model1.fit(train_x, train_y, epochs=50, batch_size=64, verbose=1,
                     validation_split=0.2, shuffle=False, callbacks=callbacks_list)
This is 3 layers of Bidirectional LSTM cells with different numbers of neurons, followed by a Dense layer.
However, this model resulted in a Kaggle score of 84.87.
CNN Models
Some CNN models provide excellent results on time series forecasting, so I thought of developing a CNN model as well. Generally we use CNNs on image data, i.e. 2-D tensors, but here we are applying them to sequences of data. That is why I have used Conv1D instead of Conv2D for developing the models.
from keras import Sequential
from keras import backend as K
from keras.layers import Conv1D, BatchNormalization, MaxPooling1D
from keras.layers import Dropout
from keras.layers import Dense, Flatten
from keras.optimizers import Adam

model1 = Sequential()
model1.add(Conv1D(128, 3, padding='same', activation='relu',
                  input_shape=(train_x.shape[1], train_x.shape[2])))
model1.add(Conv1D(32, 3, padding='same', activation='relu'))
model1.add(BatchNormalization())
model1.add(Conv1D(8, 3, padding='same', activation='relu'))
model1.add(MaxPooling1D(2))
model1.add(Flatten())
model1.add(Dense(64, activation='relu'))
model1.add(Dropout(0.8))
model1.add(Dense(1))

def customLoss(y_true, y_pred):
    epsilon = 0.1
    summ = K.maximum(K.abs(y_true) + K.abs(y_pred) + epsilon, 0.5 + epsilon)
    smape = K.abs(y_pred - y_true) / summ * 2.0
    return smape

opt = Adam(learning_rate=0.001)
model1.compile(loss=customLoss, optimizer=opt)

history = model1.fit(train_x, train_y, epochs=100, batch_size=64, verbose=1,
                     validation_split=0.2, shuffle=False, callbacks=callbacks_list)
All 8 models are variations of the above model, and they are trained for 100 epochs.
This model achieves a Kaggle score of about 84.15, which is similar to the previous models.
CNN and LSTM Models
The CNN model does not have memory like an LSTM. So when we feed a window to a CNN model, it mixes all the inputs together, which is not desirable here. I want each sub-sequence to be fed to the CNN individually so that the underlying patterns stay separate. For this, I have wrapped the CNN layers in a TimeDistributed layer, which applies them to each sub-sequence separately.
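The TimeDistributed wrapper expects 4-D input of shape [samples, subsequences, n_length, features]. The exact split is not shown below, so the following reshape is just one plausible choice, treating each 7-step window as a single subsequence (n_seq and n_length are my assumptions):

n_seq, n_length = 1, 7   # one subsequence of 7 timesteps per sample (assumed split)
train_x = train_x.reshape((train_x.shape[0], n_seq, n_length, train_x.shape[2]))
test_x = test_x.reshape((test_x.shape[0], n_seq, n_length, test_x.shape[2]))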
from keras import Sequential
from keras import backend as K
from keras.layers import Conv1D, BatchNormalization, MaxPooling1D, LSTM
from keras.layers import Dropout
from keras.layers import Dense, Flatten, TimeDistributed
from keras.optimizers import Adam

model1 = Sequential()
model1.add(TimeDistributed(Conv1D(128, 3, padding='same', activation='relu'),
                           input_shape=(None, n_length, train_x.shape[3])))
model1.add(TimeDistributed(Conv1D(32, 3, padding='same', activation='relu')))
model1.add(TimeDistributed(BatchNormalization()))
model1.add(TimeDistributed(Conv1D(8, 3, padding='same', activation='relu')))
# model1.add(TimeDistributed(MaxPooling1D(2)))
model1.add(TimeDistributed(Flatten()))
Now the feature maps generated by the CNN for each sub-sequence are passed to the LSTM as a sequence to process.
model1.add(LSTM(128, activation='relu', return_sequences=True))
model1.add(LSTM(64, activation='relu'))
model1.add(Dense(64))
model1.add(Dropout(0.8))
# the output width assumes train_y is 2-D here, with one column per forecast step
model1.add(Dense(train_y.shape[1], activation='relu'))
These models are also trained for 100 epochs.
This model achieved a Kaggle score of 84.46.
Result Comparison
The LSTM model with dropouts has the best score (83.70, against 84.15 for the CNN, 84.46 for the CNN+LSTM, 84.87 for the LSTM without dropouts, and 141.17 for the baseline). So I have chosen it as my final model for deployment. The entire code is available at the following GitHub link.
Predictions
As I have already mentioned, for each type of model, 8 different models are trained on the 8 different feature sets. Now, which model should be used for prediction?
For this, I have computed the median view percentage (viewperc) for each access and agent combination.
Whenever I get a page to forecast, I retrieve its viewperc, access, and agent values. If viewperc is greater than the median viewperc for that access and agent combination, I use the model corresponding to the 75th-percentile feature set for that agent and access; otherwise, I use the model corresponding to the 25th-percentile feature set for that combination.
# model1-model4: 75th-percentile models, model5-model8: 25th-percentile models;
# view1-view4: median view percentages for the four access/agent combinations
if access_index == 0 and agent_index == 0:
    if viewperc >= view1:
        y_pred_lstm = model1.predict(test_x)
    else:
        y_pred_lstm = model5.predict(test_x)
elif access_index == 1 and agent_index == 0:
    if viewperc >= view2:
        y_pred_lstm = model2.predict(test_x)
    else:
        y_pred_lstm = model6.predict(test_x)
elif access_index == 2 and agent_index == 0:
    if viewperc >= view3:
        y_pred_lstm = model3.predict(test_x)
    else:
        y_pred_lstm = model7.predict(test_x)
elif access_index == 0 and agent_index == 1:
    if viewperc >= view4:
        y_pred_lstm = model4.predict(test_x)
    else:
        y_pred_lstm = model8.predict(test_x)
Deployment Video
Following is the video of deployment.
Future Works
- We can use different predictors like ARIMA and wavelet-based models to see whether the scores change.
- Here we have worked only on Wikipedia pages; we can extend the approach to pages of other sites as well.
- The training data spanned 2015 to 2017; we can build a model that takes live traffic data into account and is retrained in real time to forecast future traffic.
- For web traffic forecasting, we should also take socio-economic factors into account for better results.
References
- Web Traffic Time Series Prediction Using ARIMA & LSTM | by Junyan Shao | Medium
- Web Traffic Time Series Forecasting using ARIMA and LSTM RNN
- GitHub — Arturus/kaggle-web-traffic: 1st place solution
- Web Traffic Using LSTM on Training Data | Kaggle
- A really friendly guide to use of Wavelet Theory in Machine Learning (Part 1) | by Kaustav Tamuly | Intel Student Ambassadors | Medium
- Time Series Forecasting with LSTMs using TensorFlow 2 and Keras in Python | Curiousily — Hacker’s Guide to Machine Learning
- Web traffic time series forecast with 4 model | Kaggle