Bayesian Neural Networks to Make Sense of Diabetes Uncertainty

Dan Korelitz · Published in The Startup · Nov 13, 2019
Fig 1. Volatile Blood Sugars

In my former career writing trading algorithms, I would look at this graph and see the potential for making money. Now, as a parent of a young child with type 1 diabetes, I look at this chart and feel nauseated. This graph shows my son’s blood sugar over two days last week — the blood sugar of a young child with type 1 diabetes can be a wild ride!

For the past few years I have researched ways to reduce the burden of diabetes in his life and the fear my family lives with every day (first solution to help my family). My research has made use of the who’s who of ML models: Recurrent Neural Networks, Seq2Seq, Seq2Seq with Attention, ConvNets, etc. All of these efforts have been toward making the “perfect” long-term blood sugar prediction.

It wasn’t until recently that I became comfortable with the fact that it is going to be very difficult for these models to come anywhere near perfection, especially in moments when an immediate blood glucose (bg) intervention is needed. Most importantly, I don’t think these types of models are going to address the issues caregivers of kids with type 1 face on a daily basis.

Most ML models can make sense of the physiological processes represented in diabetes data, but it is very difficult to capture the human behaviors that affect bg. Especially during moments of heightened anxiety, human behaviors play a very significant role in diabetes management. For example, a caregiver might give unannounced food to prevent lows, make a mistake counting carbs, or, in a panic, give the child a large amount of sugar — decisions made under stress create a whole new dimension for the modeler to consider. With these types of interventions, it can be very difficult, from both a researcher’s and a user’s perspective, to feel comfortable with point estimate predictions.

I have looked into ways to model this uncertainty in bg as a result of imperfect measurements and human behaviors. This problem is a very good candidate for a Bayesian model. Bayesian modeling offers a systematic framework for reasoning about model uncertainty. So, instead of just learning point estimates, we learn a distribution over parameters that are consistent with the observed data before proceeding to make an inference about the predictive distribution. This distribution over the parameters (the posterior distribution) is an application of Bayes’ rule:

p(θ | D) = p(D | θ) p(θ) / p(D)

where D = (y, X), y = dependent variables, X = independent variables, θ = parameters.

We can then use this posterior distribution to make predictions. These predictions (posterior predictive distribution) can be written as:

p(y* | x*, D) = ∫ p(y* | x*, θ) p(θ | D) dθ

where y* = unseen y, x* = unseen x, p(θ | D) = posterior.
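Since this integral has no closed form for a neural network, in practice it is approximated by averaging over S samples drawn from the (approximate) posterior:

p(y* | x*, D) ≈ (1/S) Σₛ p(y* | x*, θₛ), with θₛ ~ p(θ | D)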

The denominator of the posterior, p(D), is intractable for a neural network, but fortunately we have tools that make approximate computation possible. Using PyTorch and Pyro, we create three main components to get our model to run. The first component is the model, i.e. the function that will do the bg prediction. We define our model (a 3-layer neural network) as we would any PyTorch neural network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RegressionModel(nn.Module):
    def __init__(self, num_feature, num_hidden, num_output, dropout=0.0):
        super().__init__()
        self.linear_1 = nn.Linear(num_feature, num_hidden)
        self.linear_2 = nn.Linear(num_hidden, num_hidden)
        self.linear_out = nn.Linear(num_hidden, num_output)
        self.drop_layer = nn.Dropout(p=dropout)

    def forward(self, x):
        x = self.linear_1(x)
        x = F.relu(self.drop_layer(x))
        x = self.linear_2(x)
        x = F.relu(self.drop_layer(x))
        out = self.linear_out(x)
        return out

num_hidden = 32   # hidden units
num_feature = 5   # number of input dimensions
num_rank = 1      # regression, so output dimension = 1
regression_model = RegressionModel(num_feature, num_hidden, num_rank)
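As a quick sanity check (with a hypothetical toy batch), the network maps 5-dimensional feature vectors to one bg prediction each:

dummy = torch.randn(8, num_feature)   # hypothetical batch of 8 feature vectors
print(regression_model(dummy).shape)  # torch.Size([8, 1])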

This relatively simple model is actually quite effective (RMSE ≈ 20) at estimating blood sugars up to 30 minutes ahead, considering the volatility of the dependent variable. However, I must stress that careful consideration must be given to how the inputs (bg, carbs, insulin) are transformed into an effective feature set. I transformed the inputs into 5 features: 1) smoothed bg, 2) a moving average of the smoothed bg, 3) the momentum of the smoothed bg, 4) absorbed carbs, and 5) absorbed insulin. The contributors to Loop and OpenAPS have done an amazing job modeling carb and insulin absorption, and with only minor tweaks to a few parameters I have a carb/insulin absorption process that I am confident in.
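The exact transforms are not spelled out here, so consider the following a rough sketch, assuming a pandas Series bg of 5-minute CGM readings and hypothetical window lengths (the absorbed-carb and absorbed-insulin features would come from the Loop/OpenAPS-style absorption models mentioned above):

import pandas as pd

def bg_features(bg: pd.Series) -> pd.DataFrame:
    # window sizes below are hypothetical; the article does not specify them
    smoothed = bg.ewm(span=3).mean()          # 1) smoothed bg
    moving_avg = smoothed.rolling(6).mean()   # 2) moving average of smoothed bg (30 min)
    momentum = smoothed.diff(3)               # 3) momentum: change over 15 min
    return pd.DataFrame({"smoothed_bg": smoothed,
                         "moving_avg": moving_avg,
                         "momentum": momentum})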

The next step is to define the model and guide in Pyro. The Pyro model will take the neural network defined above and re-purpose it for a Bayesian setting.

import pyro
from pyro.distributions import Normal, Uniform

def model(x_data, y_data):
    # weight and bias priors: standard Normals matching each parameter's shape
    w1_prior = Normal(loc=torch.zeros_like(regression_model.linear_1.weight),
                      scale=torch.ones_like(regression_model.linear_1.weight)).independent(2)
    b1_prior = Normal(loc=torch.zeros_like(regression_model.linear_1.bias),
                      scale=torch.ones_like(regression_model.linear_1.bias)).independent(1)
    w2_prior = Normal(loc=torch.zeros_like(regression_model.linear_2.weight),
                      scale=torch.ones_like(regression_model.linear_2.weight)).independent(2)
    b2_prior = Normal(loc=torch.zeros_like(regression_model.linear_2.bias),
                      scale=torch.ones_like(regression_model.linear_2.bias)).independent(1)
    wout_prior = Normal(loc=torch.zeros_like(regression_model.linear_out.weight),
                        scale=torch.ones_like(regression_model.linear_out.weight)).independent(2)
    bout_prior = Normal(loc=torch.zeros_like(regression_model.linear_out.bias),
                        scale=torch.ones_like(regression_model.linear_out.bias)).independent(1)
    # prior over the observation noise
    scale = pyro.sample("sigma", Uniform(0., 10.))
    priors = {'linear_1.weight': w1_prior, 'linear_1.bias': b1_prior,
              'linear_2.weight': w2_prior, 'linear_2.bias': b2_prior,
              'linear_out.weight': wout_prior, 'linear_out.bias': bout_prior}
    # lift module parameters from the neural net into random variables
    lifted_module = pyro.random_module("module", regression_model, priors)
    lifted_reg_model = lifted_module()
    with pyro.plate("map", len(x_data)):
        # run forward on the sampled regression_model
        prediction_mean = lifted_reg_model(x_data).squeeze(-1)
        # condition on the observed data
        pyro.sample("obs", Normal(prediction_mean, scale), obs=y_data)
        return prediction_mean

The guide is defined in much the same way as the model, with learnable parameters corresponding to each sample() site in the model and an identical function signature. The basic idea behind the guide is that we introduce a parameterized distribution qϕ(z), where ϕ are known as the variational parameters. The guide serves as an approximation to the posterior, and we minimize the difference between the two using the ELBO (evidence lower bound).
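The guide itself is not shown above. As a minimal sketch, Pyro’s AutoDiagonalNormal autoguide builds exactly this kind of fully factorized Normal approximation over all the latent sites automatically; a hand-written guide with one learnable Normal per weight tensor would be equivalent in spirit:

from pyro.infer.autoguide import AutoDiagonalNormal  # pyro.contrib.autoguide in older Pyro releases

guide = AutoDiagonalNormal(model)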

With our guide and model in place, we can now approximate the posterior distribution using stochastic variational inference.

from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

optim = Adam({"lr": 1e-2})
svi = SVI(model, guide, optim, loss=Trace_ELBO(), num_samples=1000)

num_iterations = 1500

def train():
    pyro.clear_param_store()
    for j in range(num_iterations):
        for xb, yb in train_dl:
            # calculate the loss and take a gradient step
            loss = svi.step(xb, yb.squeeze(-1))
        if j % 50 == 0:
            print("[iteration %04d] loss: %.4f" % (j + 1, loss / len(xb)))

Since our variational distribution is fully parameterized after training, we can sample weights from it and run the lifted model forward. Making use of some helper functions provided by Pyro, we make predictions with the following:

from functools import partial
import pandas as pd
from pyro.distributions import Delta
from pyro.infer import EmpiricalMarginal, TracePredictive

get_marginal = lambda traces, sites: EmpiricalMarginal(
    traces, sites)._get_samples_and_weights()[0].detach().cpu().numpy()

def summary(traces, sites):
    marginal = get_marginal(traces, sites)
    site_stats = {}
    for i in range(marginal.shape[1]):
        site_name = sites[i]
        marginal_site = pd.DataFrame(marginal[:, i]).transpose()
        describe = partial(pd.Series.describe,
                           percentiles=[.05, 0.25, 0.5, 0.75, 0.95])
        site_stats[site_name] = marginal_site.apply(describe, axis=1) \
            [["mean", "std", "5%", "25%", "50%", "75%", "95%"]]
    return site_stats

def wrapped_model(x_data, y_data):
    pyro.sample("prediction", Delta(model(x_data, y_data)))

x_data = x[-1000:]
y_data = y[-1000:]
posterior = svi.run(x_data, y_data)

trace_pred = TracePredictive(wrapped_model, posterior, num_samples=1000)
post_pred = trace_pred.run(x_data, None)
post_summary = summary(post_pred, sites=['prediction', 'obs'])

mu = post_summary["prediction"]
y_o = post_summary["obs"]
predictions = pd.DataFrame({
    "bg": x_data[:, 0],
    "mu_mean": mu["mean"],
    "mu_perc_5": mu["5%"],
    "mu_perc_95": mu["95%"],
    "y_mean": y_o["mean"],
    "y_perc_5": y_o["5%"],
    "y_perc_95": y_o["95%"],
    "true_y": y_data,
})
Fig 2. Output of Predictive Distribution

As displayed in Fig 2, we now have bounds around our predictions. Keep in mind we are not as concerned with a point estimate (the mean of the distribution) of the future blood sugar as we are with using the uncertainty to feel more comfortable making diabetes management decisions. For example, a bg realization falling outside the predicted bounds can be a very useful signal that something might be wrong. Much of the time when I see a bg realization outside of these bounds around mealtime hours, it’s due to an incorrect or mistimed bolus — which is no big deal and quickly comes back in line. However, when you consistently see realizations outside the prediction intervals, you gain insight that a significant adjustment might be required in current treatment methods. This is what I experienced last week, when consistent realizations outside the bounds led us to the conclusion that insulin sensitivities needed to change.
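As a small illustration (using the predictions DataFrame built above), flagging realizations that fall outside the 5%-95% predictive band takes only a couple of lines:

outside = ((predictions["true_y"] < predictions["y_perc_5"]) |
           (predictions["true_y"] > predictions["y_perc_95"]))
print("{:.1%} of realizations fall outside the 90% band".format(outside.mean()))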

Table 1. Sample of Output

This model has helped give me peace of mind in my daily life, and I think it is just scratching the surface of what is possible. I can now feel more confident in the decisions I am making, knowing I don’t always have to be on high alert. Most of the time, just having the peace of mind of knowing you are doing a good job and can live a normal life is enough. There is no such thing as perfection when it comes to diabetes, so why expect it of yourself?

We are excited to share our progress as we continue to make improvements in our solutions to help people living with diabetes. We will be sharing our work at the upcoming Diabetes Technology Conference in Washington, DC on November 14 and at ATTD in Madrid, Spain in February. I would love to connect with people with similar interests in this space.
