On a path to better understand preterm birth and pregnancy complications

Preterm birth is the leading cause of newborn death in the U.S.

11.4% of all American babies are born prematurely, causing huge emotional distress to families.

Preterm birth is a complex problem involving multiple interacting etiologic mechanisms. As of today, no clear physiological, genetic, behavioral or environmental pathways have been identified.

As many as 50% of preterm births have unexplained causes.

How can we better understand what is causing preterm birth and help identify mothers at risk?

At Bloom Technologies, we are taking a data-driven approach.

We believe that by combining multiple sources of information, from anthropometrics to socio-economics, longitudinal behavioral and physiological data, we will be able to better understand preterm birth and identify mothers at risk at an early stage.

Our first step was to use open data to provide expecting mothers and clinicians with easy-to-use, scientifically sound tools.

Open Data

The open data movement is spreading. The U.S. government is making great efforts to make population health data freely accessible through different initiatives. The Division of Vital Statistics of the National Center for Health Statistics (NCHS) provides data on all births in the U.S. for everyone to download.

The dataset is probably among the biggest ever collected on prenatal care: over 100 parameters and 4 million deliveries per year.

At Bloom, we used the Vital Statistics data to model the relation between anthropometrics, socio-economics, risk factors and preterm birth. By properly modeling the relations between these parameters, we can better understand how they influence the risk of preterm birth.

Histogram of gestational age at delivery for all deliveries recorded in the U.S. in 2013. Preterm birth, highlighted in red, is defined as gestational age at delivery shorter than 37 weeks.
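To make the definition concrete, here is a minimal sketch of how the histogram counts and the preterm share could be computed from gestational ages in full weeks. The list `toy_weeks` is made up for illustration and is not taken from the NCHS data; the function names are our own.

```python
from collections import Counter

# From the text: preterm means gestational age at delivery shorter than 37 weeks.
PRETERM_THRESHOLD_WEEKS = 37


def is_preterm(gestational_weeks: int) -> bool:
    """Return True when the delivery happened before 37 full weeks."""
    return gestational_weeks < PRETERM_THRESHOLD_WEEKS


def preterm_rate(weeks_at_delivery) -> float:
    """Share of deliveries that are preterm."""
    weeks = list(weeks_at_delivery)
    return sum(is_preterm(w) for w in weeks) / len(weeks)


# Made-up gestational ages (full weeks), for illustration only.
toy_weeks = [39, 40, 36, 38, 41, 34, 39, 40, 37, 39]

# Counts per week: these are the bar heights of a histogram.
histogram = Counter(toy_weeks)
rate = preterm_rate(toy_weeks)

for week in sorted(histogram):
    marker = "preterm" if is_preterm(week) else ""
    print(week, histogram[week], marker)
print("preterm share:", rate)
```

Note that a delivery at exactly 37 weeks is not counted as preterm.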

For example, we can look at how variables representative of socio-economic status, such as principal source of payment for the delivery and education level, relate to preterm birth:

Relation between principal source of payment (private insurance or Medicaid, about 50% each on the entire dataset) and level of education (selected categories). Both parameters can be used as proxies for socio-economic status, showing that a higher percentage of mothers deliver preterm in lower socio-economic categories. For example, the percentage of preterm births for privately insured women holding a bachelor's degree is 9%. However, the percentage gets as high as 14% for lower socio-economic categories. Bar widths indicate the number of women in each category.
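A breakdown like this boils down to grouping deliveries by category and computing the preterm share and the group size (the bar width) for each group. The sketch below assumes records of the form (payment, education, gestational weeks); the records and category labels are invented for illustration and do not reproduce the 9% and 14% figures.

```python
from collections import defaultdict


def preterm_share_by_group(records):
    """Map (payment, education) -> (preterm share, number of deliveries)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [preterm count, total count]
    for payment, education, weeks in records:
        group = (payment, education)
        counts[group][0] += 1 if weeks < 37 else 0
        counts[group][1] += 1
    return {
        group: (preterm / total, total)
        for group, (preterm, total) in counts.items()
    }


# Invented records, for illustration only.
toy_records = [
    ("private", "bachelor", 39),
    ("private", "bachelor", 40),
    ("private", "bachelor", 36),
    ("private", "bachelor", 38),
    ("medicaid", "high school", 35),
    ("medicaid", "high school", 39),
    ("medicaid", "high school", 36),
    ("medicaid", "high school", 40),
]

shares = preterm_share_by_group(toy_records)
for group, (share, size) in shares.items():
    print(group, f"{share:.0%} preterm", f"n={size}")
```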

And then look at how other parameters, such as ethnicity, impact preterm birth independently of confounding effects due to socio-economics, anthropometrics or other risk factors:

Posterior probability for different ethnicities (log odds ratios) with respect to the baseline category, white non-Hispanic women (i.e. the category with the most entries). While Mexican ethnicity shows similar (slightly higher) risk with respect to white non-Hispanic women, non-Hispanic black women, shown in red, show higher risk even when corrected for all other variables included in the model (odds ratio 1.65).
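For readers less familiar with odds ratios, the sketch below shows how an odds ratio of 1.65 maps to the log-odds scale used in the figure, and how it would shift a baseline probability. The 10% baseline is a hypothetical value picked for illustration, not a number from our model.

```python
import math


def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Shift a probability by multiplying its odds by the given odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


ODDS_RATIO = 1.65                 # reported in the caption above
log_odds_ratio = math.log(ODDS_RATIO)  # the scale shown on the figure axis

HYPOTHETICAL_BASELINE = 0.10      # assumption, for illustration only
shifted = apply_odds_ratio(HYPOTHETICAL_BASELINE, ODDS_RATIO)

print(f"log odds ratio: {log_odds_ratio:.3f}")
print(f"baseline {HYPOTHETICAL_BASELINE:.1%} -> {shifted:.1%}")
```

An odds ratio is not a risk ratio: multiplying the odds by 1.65 increases the probability by less than 65% in relative terms, and the gap grows as the baseline risk rises.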

Since the dataset provides coarse information on gestational age at delivery (full weeks of gestation), we can also perform survival analysis and determine the probability of delivery at any given week, depending on different maternal parameters.

For example, here we can see how the probability of delivery after a certain week decreases much faster for an older woman expecting twins than for a first-time mom in her 20s expecting a single child:

Survival analysis for two mothers with different characteristics. The probability of delivering at term (after 37 weeks) reduces drastically for older women expecting twins (red line).
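As a rough illustration of the idea, the sketch below computes an empirical survival-style curve: for each week, the share of women in a group who deliver at that week or later. It assumes every delivery is observed (no censoring) and uses no covariates, so it is a simplification rather than the analysis behind the figure. Both groups are made-up numbers.

```python
def share_delivering_at_or_after(weeks_at_delivery, week: int) -> float:
    """Share of deliveries happening at the given full week or later."""
    return sum(w >= week for w in weeks_at_delivery) / len(weeks_at_delivery)


def survival_curve(weeks_at_delivery, first_week: int = 30, last_week: int = 42):
    """Curve as a dict: week -> share still pregnant at the start of that week."""
    return {
        week: share_delivering_at_or_after(weeks_at_delivery, week)
        for week in range(first_week, last_week + 1)
    }


# Invented gestational ages (full weeks), for illustration only.
toy_single_twenties = [39, 40, 38, 41, 37, 39, 40, 36]
toy_twins_older = [35, 36, 34, 37, 33, 36, 38, 35]

curve_single = survival_curve(toy_single_twenties)
curve_twins = survival_curve(toy_twins_older)

# Probability of delivering at term, i.e. at 37 weeks or later.
print("single, 20s:", curve_single[37])
print("twins, older:", curve_twins[37])
```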

Based on this data, we investigated how each parameter impacts the individual risk of preterm birth, and implemented our model in a simple app, 37.

What is your risk of preterm birth?

37 answers this simple question.

It uses twenty parameters to feed a Bayesian model and determine preterm birth risk. The app is free and can be used by expecting mothers as well as by clinicians.

For expecting mothers, the app provides awareness and perspective with respect to people with similar characteristics.

For clinicians, the app can be used to dynamically explore the dataset and understand the impact of different parameters, without the need to download the dataset or run further statistical analysis.

The 37 app.

The app shows how a woman’s risk compares to the risk in the general population in the U.S.

By looking at the risk breakdown, you can see how different parameters impact overall risk. For each parameter, the app shows what the risk would be if that parameter were different, holding all other parameters constant.

From risk to prediction

Simply put, the model’s parameters stratify the overall population to identify the risk of preterm birth for people similar to you, based on historical data from all birth certificates in the U.S.

The model we built is a generalized linear model including many non-nested groupings and a logit link function to determine the probability of preterm birth given different parameters. The model is implemented in the 37 app and provides a risk of preterm birth.
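To give a feel for what a logit link does, here is a toy sketch in which each category contributes an additive effect on the log-odds scale, and the inverse logit turns the sum into a probability. All coefficients, factor names and levels below are invented for illustration; they are not the fitted values or the twenty parameters used in 37.

```python
import math

# Hypothetical values on the log-odds scale, for illustration only.
INTERCEPT = -2.2
GROUP_EFFECTS = {
    "plurality": {"single": 0.0, "twins": 2.0},
    "age_group": {"20-29": 0.0, "40+": 0.4},
    "payment": {"private": 0.0, "medicaid": 0.2},
}


def inverse_logit(log_odds: float) -> float:
    """Map log-odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def preterm_risk(profile: dict) -> float:
    """Sum the effect of each factor level and convert to a probability."""
    log_odds = INTERCEPT + sum(
        GROUP_EFFECTS[factor][level] for factor, level in profile.items()
    )
    return inverse_logit(log_odds)


def what_if(profile: dict, factor: str, new_level: str) -> float:
    """Risk if one factor changed, holding all other factors constant."""
    return preterm_risk({**profile, factor: new_level})


profile = {"plurality": "single", "age_group": "20-29", "payment": "private"}
baseline_risk = preterm_risk(profile)
twins_risk = what_if(profile, "plurality", "twins")

print(f"baseline: {baseline_risk:.1%}")
print(f"same profile, twins: {twins_risk:.1%}")
```

The `what_if` helper mirrors the risk breakdown idea: change one parameter, keep the rest fixed, and compare the two probabilities.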

The area under the ROC curve (a metric typically used to evaluate the performance of binary classifiers) is 0.69. Adding a few parameters that can be acquired during pregnancy, such as smoking and weight gain, brings the value to 0.73.
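One way to read the area under the ROC curve: it is the probability that a randomly chosen preterm delivery receives a higher predicted risk than a randomly chosen term delivery. The sketch below computes it directly from that pairwise definition. Labels and scores are made up and do not reproduce the 0.69 or 0.73 values.

```python
def roc_auc(labels, scores) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented data: 1 = preterm, 0 = term, scores = predicted risk.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.30, 0.10, 0.12, 0.08, 0.20, 0.25, 0.05, 0.15]

auc = roc_auc(toy_labels, toy_scores)
print(f"AUC: {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic, so on millions of records a rank-based implementation would be the practical choice.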

However, we want to take it a step further. Understanding individual risk is important, but ultimately we want to be able to predict in advance who is going to be preterm. To this aim we built another model predicting the actual gestational age at delivery.

The positive predictive value (PPV, i.e. how many of the deliveries detected as preterm are actually preterm?) is 0.53. However, over 90% of the pregnancies detected by our model as preterm have a gestational age at delivery of at most 38 weeks, resulting in a PPV of approximately 0.90 if we consider both preterm and “early term” deliveries.

Finally, the sensitivity of the model is only 0.27. This means that roughly 3 out of 10 preterm deliveries are correctly detected.

The other 7 are missed.
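For clarity, this is how the two metrics relate to the counts of correct and incorrect detections. The counts below are hypothetical, chosen only so that the ratios land close to the reported 0.53 and 0.27; they are not our actual confusion matrix.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Of all deliveries flagged as preterm, the share that really were preterm."""
    return true_positives / (true_positives + false_positives)


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Of all actual preterm deliveries, the share the model flagged."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts for 100 actual preterm deliveries, for illustration only.
TRUE_POSITIVES = 27   # preterm and flagged as preterm
FALSE_NEGATIVES = 73  # preterm but missed
FALSE_POSITIVES = 24  # flagged as preterm but delivered at 37 weeks or later

ppv = positive_predictive_value(TRUE_POSITIVES, FALSE_POSITIVES)
sens = sensitivity(TRUE_POSITIVES, FALSE_NEGATIVES)

print(f"PPV: {ppv:.2f}")
print(f"sensitivity: {sens:.2f}")
```

The two metrics pull in different directions: flagging more pregnancies as preterm would raise sensitivity but typically lowers PPV.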

How can we close the gap?

The 37 app is our first step. While the model provides early insights into the relation between anthropometrics, socio-economics, risk factors and preterm birth, many variables are still missing.

Preterm birth is a complex problem involving multiple interacting mechanisms. Apart from anthropometric data, socio-economics and risk factors, previous research has shown links between preterm birth and a multitude of other parameters, from serum biomarkers to behavioral and physiological data such as physical activity, stress and uterine activity.

At Bloom, we are currently working on different aspects to improve preterm birth prediction and provide early detection of who is most at risk:

  • We are collaborating with the world’s best scientists and doctors in the field, combining our efforts and expertise to tackle the preterm birth epidemic.
  • We are developing a unique wearable sensor able to longitudinally monitor multiple physiological parameters, such as uterine activity, heart rate, heart rate variability, physiological stress and activity levels.
  • Finally, we are exploring different ways to extend our 37 app and make it a research tool, for example by integrating it with the recently launched ResearchKit.

Answers start with data. Data starts with you.

Will you join us on our mission?