Loglet Analysis-Revisiting Covid-19 Projections

Loglet-analysis (Decomposition of growth into S-shaped logistic components) is a better predictor for the covid-19 spread, as it takes into account the evolution of multiple waves.

Dennis Ganzaroli
Towards Data Science
9 min readJan 4, 2021

--

Fig 01: Loglet Analysis with KNIME & Jupyter-Notebook (image by author)

Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

In the last article, we showed how to make a forecast for the next 30 days using covid data from the Johns Hopkins Institute with KNIME, Jupyter, and Tableau. The projections were optimized for a logistic growth model. We will show that the decomposition of growth into S-shaped logistic components also known as Loglet analysis, is more accurate as it takes into account the evolution of multiple covid waves.

The term “Loglet”, coined at The Rockefeller University in 1994 joins “logistic” and “wavelet”. The Loglet analysis is interesting because it can handle multi-peak profiles which is a common challenge for curve fitting techniques.

The growth of replicating organisms, such as a virus, is basically characterized by three growth phases:

  1. Exponential growth
  2. Logistic growth
  3. Multi-logistic growth

Exponential growth

In the first phase, growth is exponential. In mathematical terminology, the growth rate of a population P(t) is proportional to the population. The growth rate at time t is defined as the derivative dP(t)/dt .

The exponential growth model is written as a differential equation is:

Fig 02: Exponential growth model (image by author)

This equation can be solved by introducing e (the base of the natural logarithm, approximately 2.71 . . .).
The familiar solution is:

Fig 03: Solution of exponential growth model (image by author)

where “a” is the growth rate constant and “b” is the initial population P(0). “a” is often expressed in percent. An “a” with a value of 0.02 is equivalent to the statement “the population was growing continuously at 2% per year” for example. Although many populations grow exponentially for a time, no finite system can sustain exponential growth indefinitely unless the parameters or limits of the system are changed.

Let’s try to visualize this growth in Jupyter-Notebooks:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
t = np.arange(1, 100)
a = 0.1
b = 1
y= b*np.exp(a*(t))

plt.plot(t,y)
plt.show()
Fig 04: Exponential growth (image by author)

2. Logistic growth
Since few systems sustain exponential growth over time, the exponential growth equation must be modified with a limit or carrying capacity that gives it the more realistic sigmoidal shape.
The most widely used modification of the exponential growth model is the logistic one. It was introduced by Verhulst in 1838, but popularized in mathematical biology by Lotka in the 1920s. The logistic equation begins with the exponential model but adds a “negative feedback” term that slows the growth rate of a population as the limit K is approached:

Fig 05: Logistic differential equation (image by author)

Notice that the feedback term (1−P (t))/K is close to 1 when P(t) << K and approaches zero as P(t) -> K. Thus, the growth rate begins exponentially but then decreases to zero as the population P(t) approaches the limit K, producing an S-shaped (sigmoidal) growth trajectory.

There are a number of closed form solutions to the logistic differential equation. We will use for our further analysis the Fisher-Pry equation:

Fig 06: Logistic growth function (image by author)

Equation above produces the familiar S-shaped curve. The three parameters are needed to fully specify the curve, “a”, “b”, and K. The growth rate parameter “a” specifies the “width” of the sigmoidal curve.

Let’s again to try to visualize this growth in Jupyter:

t = np.arange(0, 100)
K = 100
a = 0.1
b = 50

y = K/(1+np.exp(-a*(t-b)))

plt.plot(t,y)
plt.show()
Fig 07: Logistic growth (image by author)

Further it is often better to replace „α“ with a variable that specifies the time required for the curve to grow from 10% to 90% of the limit K, a period which we call characteristic duration Δt.

Fig 08: Time required for the curve to grow from 10% to 90% (image by author)

To achieve this, we have to perform the following algebraic transformations. First, we calculate the time needed to reach 90% of the total growth.

Fig 09: Calculating time to reach 90% of the growth (image by author)

Then the time to reach 10% of the total growth which we subtract from the value for 90%. The characteristic duration is related to „a“ by ln(81) / Δt.

Fig 10: The characteristic duration (image by author)

The parameter Δt is usually more useful than „a“ for the analysis of historical time-series data because the units are easier to appreciate. The three parameters K, ∆t, and „b“ define the parameterization of the logistic model used as the basic building block for our Loglet analysis.

Fig 11: Basic buidlign block of Logel Analysis (image by author)

3. Multi-logistic growth
In the last article, we showed that the logistic growth model can be a good approximation for the evolution of a covid19 wave. But since this spread is not limited to a single wave, we need an extended approach to describe the entire process.
Many growth processes consist of several subprocesses. First, let us model a system that experiences growth in two discrete phases. Then, we will extend this model to an arbitrary number of phases. In the so called Bi-Logistic model, growth is the sum of two “wavelets”, each of which is logistic:

Fig 12: Bi-Logistic model (image by author)

where

Fig 13: Bi-Logistic wavelets (image by author)

The implementation of the Bi-Logistic model requires 6 parameters and is visualized in Jupyter as follows:

t = np.arange(1, 100)

K1 = 100
dt1 = 40
b1 = 50

K2 = 400
dt2 = 20
b2 = 80

y = K1/(1+np.exp(-np.log2(81)/dt1*(t-b1))) + \
K2/(1+np.exp(-np.log2(81)/dt2*(t-b2)))

plt.plot(t,y)
plt.show()
Fig 14: Bi-Logistic growth (image by author)

The next figure shows the decompositions of the Bi-Logistic wave. The blue pulse is superimposed by the orange pulse that follows. In this way, the seemingly complex behavior is reduced to the sum of the two logistic wavelets.

y1 = K1/(1+np.exp(-np.log2(81)/dt1*(t-b1)))
y2 = K2/(1+np.exp(-np.log2(81)/dt2*(t-b2)))

plt.plot(t,y1)
plt.plot(t,y2)
plt.show()
Fig 15: Decompositions of the Bi-Logistic wave (image by author)

The next step is a generalization of the Bi-Logistic model to a Multi-logistic model, where growth is the sum of n simple logistics:

where

Fig 16: Multi-logistic function (image by author)

The following Jupyter code which is also available on my github-site, predicts for Switzerland the expected covid-19 cases for the next 30 days and requires the combination of 4 logistic wavelets. For other countries a different number of wavelets are necessary. We will come to this point below.
Fitting of the models was performed using the curve_fit function of the scipy package in Jupyter.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
path = 'https://raw.githubusercontent.com/deganza/Loglet-analysis-revisiting-covid19-projections/main/covid_daten_knime.csv'
df_country = pd.read_csv(path)
df_country_exp = pd.DataFrame()

df_country_list = df_country.groupby(['Country']).last().reset_index()
country_list = df_country_list['Country'].to_list()
df_country_period = df_country
zeilen = df_country_period['cases'].count()
# Select country or for evaluation of every country remark the variable country_list


#country_list =['Austria','Switzerland','Germany','United Kingdom','Spain']
#country_list =['United Kingdom']
#country_list =['Germany']
#country_list =['Spain']
country_list =['Switzerland']
#country_list =['Poland']
#country_list =['Turkey']
#country_list =['US']


df_country_exp = pd.DataFrame()

#logistic functions

def logistic_function1(x,b1,dt1,K1,b2,dt2,K2,b3,dt3,K3,b4,dt4,K4):
return K1/(1+np.exp(-np.log2(81)/dt1*(x-b1)))

def logistic_function2(x,b1,dt1,K1,b2,dt2,K2,b3,dt3,K3,b4,dt4,K4):
return K1/(1+np.exp(-np.log2(81)/dt1*(x-b1))) + K2/(1+np.exp(-np.log2(81)/dt2*(x-b2)))

def logistic_function3(x,b1,dt1,K1,b2,dt2,K2,b3,dt3,K3,b4,dt4,K4):
return K1/(1+np.exp(-np.log2(81)/dt1*(x-b1))) + K2/(1+np.exp(-np.log2(81)/dt2*(x-b2))) + K3/(1+np.exp(-np.log2(81)/dt3*(x-b3)))

def logistic_function4(x,b1,dt1,K1,b2,dt2,K2,b3,dt3,K3,b4,dt4,K4):
return K1/(1+np.exp(-np.log2(81)/dt1*(x-b1))) + K2/(1+np.exp(-np.log2(81)/dt2*(x-b2))) + K3/(1+np.exp(-np.log2(81)/dt3*(x-b3))) + K4/(1+np.exp(-np.log2(81)/dt4*(x-b4)))


for country in country_list:

df_country_modell = df_country_period[df_country['Country'] == country]

expdays = 30
wave = 0

datum = pd.date_range(start=df_country_modell['Date'].max(), periods=expdays)
datum = pd.date_range(start=df_country_modell['Date'].min(),end=datum.max())
datum = datum.strftime("%Y-%m-%d")

zeilen= df_country_modell['cases'].count()
x = np.arange(1,zeilen+1)
x_exp = np.arange(1, x.max() + expdays )
y = df_country_modell['cases']

y_min = y.min()
y = y - y_min


try:
popt, pcov = curve_fit(logistic_function4, x, y )
y_exp = (logistic_function4(x_exp, *popt))

df = pd.DataFrame({'day':x_exp, 'expected':y_exp,'datum':datum, 'Country' : country})
df_country_exp = df_country_exp.append(df)

wave=4
print(country,wave)

except RuntimeError:
try:
popt, pcov = curve_fit(logistic_function3, x, y )
y_exp = (logistic_function3(x_exp, *popt))

df = pd.DataFrame({'day':x_exp, 'expected':y_exp,'datum':datum, 'Country' : country})
df_country_exp = df_country_exp.append(df)

wave=3
print(country,wave)

except RuntimeError:
try:
popt, pcov = curve_fit(logistic_function2, x, y )
y_exp = (logistic_function2(x_exp, *popt))

df = pd.DataFrame({'day':x_exp, 'expected':y_exp,'datum':datum, 'Country' : country})
df_country_exp = df_country_exp.append(df)

wave=2
print(country,wave)

except RuntimeError:
try:
popt, pcov = curve_fit(logistic_function1, x, y )
y_exp = (logistic_function2(x_exp, *popt))

df = pd.DataFrame({'day':x_exp, 'expected':y_exp,'datum':datum, 'Country' : country})
df_country_exp = df_country_exp.append(df)

wave=1
print(country,wave)

except RuntimeError:
print(''.join(country) + ": Error - curve_fit Log failed")


df_country_exp.reset_index(drop=True, inplace=True)
print('finished')
print(zeilen)

For every country, the model fits several logistic pulses and calculates the parameters which optimally describe the spread of the virus.

How much training data is necessary?
We have now created a model that can approximate multiple consecutive and overlapping covid waves. But now the question rises: how many waves do we have to consider? This is a parameter that has not been calculated yet.

“Study the past if you would define the future.”
Confucius

In the last article, we decided to consider only the cases of the last 90 days to fit our logistic model. With our new Multi-logistic model, this limitation is no longer necessary because we can use data from the first day of the pandemic outbreak. But now we have a new parameter to choose for the number of wavelets. This is an open point an we decide to take the following strategy:
It’s now Jan 2021. For every country we start to fit the model with 4 wavelets. If the model doesn’t converge, we reduce it to 3 wavelets and so on until we found the optimal approximation.

Putting it all together
With the code above it’s already possible to make forecasts for every country, but we want to implement it in our existing KNIME-Workflow process for better data transformation and subsequent visualization in Tableau.
As you remember from the last article we use KNIME to load the covid19 data from the Johns Hopkins institute, transform it and calculate the forecasts with Jupyter. Finally we export the data in Tableau to visualize it.

Fig 17: KNIME Workflow: loads data, calculate forecasts and export it in Tableau (image by author)

The KNIME-workflow is available on my KNIME-hub site and the Tableau Dashboard is published with the actual data nearly every day on my Tableau-Public site here

Fig 18: Covid19 Tableau-Dashboard with projections for the next 30 days (image by author)

This extended Loglet-prediction method is much more flexible than the simple logistic analysis because it takes into account the evolution of continuous waves. New waves or mutations of the virus can thus be tracked and extrapolated more quickly.
And that is becoming more and more important as this virus has already mutated into a more contagious version in the United Kingdom and in South Africa.

Follow me on Medium, Linkedin or Twitter
and follow my Facebook Group “
Data Science with Yodime

Material for this project:
Jupyter-Code:
github
knime-workflow:
knime-hub
Tableau-Dashboard:
Tableau-Public

References
-
Covid 19-Projections with Knime, Jupyter and Tableau
- A primer on logistic growth and substitution: The mathematics of the Loglet Lab software
-
Scipy-documentation

As first published in KDnuggets.

--

--

Towards Data Science
Towards Data Science

Published in Towards Data Science

Your home for data science and AI. The world’s leading publication for data science, data analytics, data engineering, machine learning, and artificial intelligence professionals.

Dennis Ganzaroli
Dennis Ganzaroli

Written by Dennis Ganzaroli

Data Scientist with over 20 years of experience. Degree in Psychology and Computer Science. KNIME COTM 2021 and Winner of KNIME Best blog post 2020.

No responses yet