Saving Lives With Data and Math

By Jimmy Fryers

University students at the PIMS BC Data Workshop identify proteins strongly associated with sepsis fatality rates, prompting calls for further research.

The BC Data Workshop

The week-long PIMS BC Data workshop had two goals: to bring together top students and industry professionals to tackle interesting research and industry problems, and to develop data science literacy in students with strong mathematical skills who may have little or no previous experience in the realm of “data science.”

For the workshop, Dr. Keith Walley of Vancouver’s St. Paul’s Hospital and the University of British Columbia brought sepsis-related data for a group of eight undergraduate students to work with. According to Dr. Walley, “what they were able to do in just one week was actually quite amazing.”

What is Sepsis?

Sepsis is a serious medical condition usually caused by an overwhelming reaction to infection. It is the leading cause of death in intensive care units worldwide. On average, one of every 18 deaths in Canada occurs due to sepsis and septic shock.¹

Cytokines, SNPs and why they are important

Severe infections lead to septic shock, which causes very low blood pressure and, ultimately, organ failure.

Cytokines are a broad group of proteins. One of the body’s initial immune responses to severe infection is to release many cytokines, which are responsible for the drop in blood pressure.²

Previous studies have shown correlations between various cytokine levels and 28-day mortality in patients with septic shock.³ But correlation doesn’t prove causation, and that’s where math comes in.

Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation among people. Each SNP represents a difference in a single DNA building block, and SNPs can act as biological markers, helping scientists locate genes that are associated with a disease.⁴

Because a person’s inherited SNPs are fixed at conception and are not changed by illness, a correlation between an SNP and higher levels of a particular cytokine most likely means that the SNP raises levels of that cytokine, rather than the other way around. If statistical analysis also links that SNP to death, this suggests that the cytokine affected by the SNP may contribute to the fatality.

A causal relationship between cytokines and death can be inferred if there is a correlation between certain SNPs and particular cytokines, as well as a correlation between these SNPs and death.
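This chain of reasoning is the idea behind Mendelian randomization, the approach Dr. Walley mentions later. The short Python sketch below illustrates it on simulated, hypothetical data; it is not the students’ analysis. It assumes a single SNP coded as 0, 1 or 2 copies of a variant, a continuous outcome score standing in for death, and an unmeasured confounder (for example, how sick a patient already is) that raises both the cytokine level and the outcome. A naive regression of outcome on cytokine is pulled off target by the confounder, while the Wald ratio (the SNP-outcome association divided by the SNP-cytokine association) should land close to the true effect of 0.5 built into the simulation. All function names and parameter values are illustrative.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def simulate(n, allele_freq=0.3, snp_effect=1.0, cytokine_effect=0.5, seed=0):
    """Hypothetical data: SNP -> cytokine -> outcome, plus a confounder U
    that raises both the cytokine level and the outcome score."""
    rng = random.Random(seed)
    genotypes, cytokines, outcomes = [], [], []
    for _ in range(n):
        g = sum(rng.random() < allele_freq for _ in range(2))  # 0, 1 or 2 variant copies
        u = rng.gauss(0, 1)                                     # unmeasured confounder
        x = snp_effect * g + u + rng.gauss(0, 1)                # cytokine level
        y = cytokine_effect * x + u + rng.gauss(0, 1)           # outcome score
        genotypes.append(g)
        cytokines.append(x)
        outcomes.append(y)
    return genotypes, cytokines, outcomes


def naive_slope(x, y):
    """Least-squares slope of outcome on cytokine; biased by the confounder."""
    return covariance(x, y) / covariance(x, x)


def wald_ratio(g, x, y):
    """Mendelian randomization (Wald ratio) estimate of the cytokine's effect,
    using the genotype as an instrumental variable."""
    return covariance(g, y) / covariance(g, x)


g, x, y = simulate(20_000)
print(f"naive slope: {naive_slope(x, y):.2f}")
print(f"Wald ratio:  {wald_ratio(g, x, y):.2f}  (true effect in simulation: 0.5)")
```

The estimate is only trustworthy if the SNP affects the outcome solely through the cytokine; a SNP that also acts through some other pathway would bias the ratio, which is one reason real Mendelian randomization studies combine several SNPs and run sensitivity checks.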

According to Mingfeng Qiu, one of the team members who attended the workshop, these cytokines were the focus of the project, whose goal was:

“Finding a possible connection between the death of patients from septic shock and certain inflammation-regulating proteins, called cytokines. Besides a possible correlation, Dr. Walley was also interested in uncovering causal relationships. The ultimate goal was to identify possible drug targets for treating sepsis.”

Working with the Data

The students were provided with two sets of data:

  • Cytokine levels from patients’ blood work
  • Genomic data (anonymized) for each patient

To handle these large datasets, which included 1.2 million SNP measurements from 330 patients, they used the online Syzygy platform, which brings Jupyter notebooks to researchers across Canada. This enabled them to store the data, run all the code required to analyze it, and generate the results. The advantages of this approach included:

  1. A web-based interface that requires no local installation, saving a lot of effort.
  2. All team members used the same, up-to-date software versions, making teamwork easy.
  3. It provided a centralized, safe, and easy-to-sync storage of data.

Using statistical analysis coupled with machine learning, the team was able to determine a list of SNPs that were related to the key cytokines that correlated with death.
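The article does not say which statistical tests the team used. As an illustration only, a common first step in this kind of work is a per-SNP association scan: for each SNP, regress a cytokine level on the number of copies of the variant (0, 1 or 2), then keep the SNPs whose p-values pass a threshold corrected for the number of tests. The Python sketch below does this with ordinary least squares, a normal approximation to the p-value and a Bonferroni correction; the data, SNP names and effect size are hypothetical, and only the patient count (330) comes from the article.

```python
import math
import random


def snp_association(dosages, values):
    """OLS regression of a cytokine level on genotype dosage (0, 1 or 2).
    Returns (slope, t statistic, approximate two-sided p-value)."""
    n = len(dosages)
    mean_d = sum(dosages) / n
    mean_v = sum(values) / n
    sxx = sum((d - mean_d) ** 2 for d in dosages)
    if sxx == 0:
        return 0.0, 0.0, 1.0  # every patient has the same genotype: nothing to test
    sxy = sum((d - mean_d) * (v - mean_v) for d, v in zip(dosages, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_d
    rss = sum((v - intercept - slope * d) ** 2 for d, v in zip(dosages, values))
    if rss == 0:
        return slope, math.inf, 0.0  # perfect fit
    se = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se
    p = math.erfc(abs(t) / math.sqrt(2))  # normal approximation to the t distribution
    return slope, t, p


def scan(genotypes, values, alpha=0.05):
    """Test every SNP; keep those below a Bonferroni-corrected threshold,
    sorted from strongest to weakest evidence."""
    threshold = alpha / len(genotypes)
    hits = []
    for snp_id, dosages in genotypes.items():
        slope, t, p = snp_association(dosages, values)
        if p < threshold:
            hits.append((snp_id, slope, p))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical data: 330 patients, 200 SNPs, only "snp_7" affects the cytokine.
rng = random.Random(1)
genotypes = {
    f"snp_{i}": [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(330)]
    for i in range(200)
}
levels = [0.8 * d + rng.gauss(0, 1) for d in genotypes["snp_7"]]

for snp_id, slope, p in scan(genotypes, levels):
    print(f"{snp_id}: slope={slope:.2f}, p={p:.1e}")
```

With a genome-scale number of SNPs the corrected threshold becomes very small (0.05 divided by one million tests is 5 × 10⁻⁸), so only strong associations survive. Machine-learning methods are often layered on top of such a scan, but that part is not sketched here.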

After 14 cytokines were identified, the team performed further analysis of the cytokine levels and their association with death. As can be seen from the graph below, this narrowed the list down to two cytokines, IL1B and MIP1A, which are more strongly correlated with mortality than the others.

[Figure: The death coefficients for each cytokine]
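The article does not define the “death coefficients” in the figure. One plausible reading, assumed here, is a logistic regression coefficient for each cytokine: the change in the log-odds of death per standard deviation of log-transformed cytokine level. The Python sketch below fits one univariate logistic regression per cytokine by Newton’s method and ranks the cytokines by coefficient. The cytokine names, data and effect sizes are hypothetical, as is the log transform.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(values):
    """Log-transform, then rescale to mean 0 and standard deviation 1."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in logs) / (n - 1))
    return [(v - mean) / sd for v in logs]


def fit_logistic(x, died, iterations=25):
    """Univariate logistic regression log-odds(death) = b0 + b1 * x,
    fitted by Newton's method. Returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, died):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p          # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w              # information matrix X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: inverse(H) times gradient
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def rank_cytokines(levels_by_cytokine, died):
    """Hypothetical 'death coefficient' per cytokine, largest first."""
    coefs = {name: fit_logistic(standardize(levels), died)[1]
             for name, levels in levels_by_cytokine.items()}
    return sorted(coefs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: 1,000 patients, three cytokines with log-normal levels.
rng = random.Random(2)
true_effects = {"cytokine_A": 1.0, "cytokine_B": 0.2, "cytokine_C": 0.0}
levels = {name: [] for name in true_effects}
died = []
for _ in range(1000):
    log_odds = -1.0
    for name, effect in true_effects.items():
        z = rng.gauss(0, 1)
        levels[name].append(math.exp(z))
        log_odds += effect * z
    died.append(1 if rng.random() < sigmoid(log_odds) else 0)

for name, coef in rank_cytokines(levels, died):
    print(f"{name}: {coef:+.2f}")
```

Ranking cytokines one at a time ignores correlations between them; a multivariable or regularized model would be a natural refinement when several cytokines rise together.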


The outcomes of the one-week workshop were very promising. According to Dr. Walley:

“The students found specific molecular pathways that conceivably could actually contribute to survival or death from sepsis. And if all of that turns out to be true, then the goal would be to potentially develop a drug that could be used in sepsis.

Part of the real value of what the students did is the first step — the first domino that falls: a plausible discovery. And that is the thing that gets my colleagues around the world excited and the thing that motivates them to share their data sets.”

What’s Next?

The study has potential real-world implications. The results suggest a relationship between certain cytokine levels and septic shock mortality that could lay the foundation for effective treatments.

The next step of the process has already begun. Dr. Walley has formed a team, which includes a student who worked on the project during the BC Data Workshop, and is amassing large volumes of sepsis data from around the world.

The first dataset Dr. Walley has obtained is the iSPAAR dataset, led by Mark Wurfel of the University of Washington. There are also other potential collaborators in the USA, the U.K. and Germany.

Participation from two or three more institutions would create by far the largest sepsis data collection in the world and give the team the best leverage to make the discoveries they really want to make to get treatment to patients in the ICU.

The workshop’s initial results were associations: cytokine levels and genetic variation are associated with survival. There are, however, ways of establishing whether these associations are likely to be causal, as Dr. Walley explained: “I think we have all the data we need to take a Mendelian randomization approach to determine if what the students discovered seems to causally contribute to outcome survival from sepsis. That then points us at a drug discovery target.”

Regardless of the outcome, the sharing and utilization of large datasets to make important medical discoveries is a sign of what’s to come in the health sciences.

“Even if this first thing that the students found doesn’t turn out to be causal — doesn’t turn out to be a pathway that we pursue, this greater global collaboration that they’ve helped to initiate is almost certainly going to lead to exciting new ways to treat severe sepsis,” said Dr. Walley.


[1] Statistics Canada, Health at a Glance — Deaths involving sepsis in Canada

[2] Dr. Keith Walley MD, Septic Shock Data Discovery Dataset

[3] Rautanen, A., et al. “Genome-wide association study of survival from sepsis due to pneumonia: an observational cohort study”. In: The Lancet Respiratory Medicine 3.1 (2015), pp. 53–60.

[4] Encyclopedia Britannica, Single nucleotide polymorphism

[5] Bowen, La, Liu, Nasouri, Nguyen, Nip, Qiu, and Triandafilidi, A Deep Look into Cytokines and Septic Shock, BC Data Workshop, St. Paul’s Hospital Final Project

Pacific Institute for the Mathematical Sciences

PIMS — A consortium of 10 universities promoting research in and application of the mathematical sciences of the highest international calibre.