How to Enable Public Health by Innovation in Predictive Analytics

Egen
Egen Engineering & Beyond
11 min read · Dec 17, 2020

There’s a popular saying that goes along the lines of:

“Prevention is better than cure.”

Makes sense, doesn’t it? It’s easier to stop something from happening in the first place than to repair the damage after it has happened.

When it comes to pandemics, however, this is easier said than done. After all, in order to really prevent something, you need to predict its occurrence well in advance, right?

Pandemics have been around ever since humans started to walk around on the Earth, right from the hunter-gatherer days. While some of them have affected large sections of the human population, others have managed to collapse entire civilizations and altered our very history. Who knows how many lives could have been saved if there was a way to predict their occurrence and spread months before they had a chance to wreak havoc?

The ongoing COVID-19 pandemic hit us in 2020, by which time we had already made several advances in computing and data mining.

Thanks to advances in cloud computing and machine learning, individuals and organizations can plan their travel and business policies ahead of time, based on disease exposure risk factors.

This directly translates into increased safety and well-being of individuals, minimizes travel costs, and prevents needless anguish.

At Egen, we used polynomial regression techniques in Amazon SageMaker to build a Covid Recommendation System, or CO-RS for short.

CO-RS is a machine learning powered app that can accurately predict COVID-19 exposure risk factors and offer meaningful advice to the user on whether it’s the right time to travel to a certain place.

CO-RS App - Mobile App Screens

Why Machine Learning?

According to Gartner, AI augmentation will create $2.9 trillion of business value in 2021. The growing acceptance of machine learning and AI across industries reflects how effective these techniques are at solving complex problems quickly.

In traditional programming, a programmer writes step-by-step instructions describing how to solve a problem. ML instead lets businesses learn from data and make data-driven decisions that solve challenges while optimizing for a given objective.

SOURCE: STATISTA, MACHINE LEARNING TOPS AI DOLLARS, MAY 10, 2019

ML helps in efficiently optimizing businesses

“Data” is the new potential, and companies around the globe are trying to leverage it to adapt their business models and meet customer needs in the midst of the COVID-19 crisis.

The ongoing development in AI and ML has significantly improved treatment, medication, screening, prediction, forecasting, contact tracing, and drug/vaccine development process for the Covid-19 pandemic. — Source

Smarter, faster decision-making

Source: unsplash.com by Chris Montgomery

In the last few months, we have seen drastic changes in people’s lifestyles, in communities, and in government policies.

“The crisis has forced every company into a massive experiment in how to be more nimble, flexible, and fast.”

- Kate Smaje, senior partner, McKinsey & Company

The economy is the domain most affected by this pandemic. The way companies operate has changed since the end of 2019, and that has had a great impact on their policies. Companies now close deals and hold meetings and conferences virtually, and it is hard for them to know whether it is safe to travel. ML can address this problem by helping businesses make smarter, safer, and faster decisions.

Adding new capabilities to existing products

Fig: Search and Trips features in Co-RS App

Our app focuses on helping companies plan their travel policies. They simply list the locations they plan to include in a trip, and the app provides a COVID-19 exposure risk factor based on ML predictions.

If a company can evaluate the risk of visiting a place, it can come up with a budget, plan health and safety measures, and choose safer hotels for the business trip.

Because risk factors differ from place to place, this helps companies manage their travel budget and spend it appropriately.

How we built CO-RS

In an ML problem, the key task is learning a function from data. There are guidelines that help reduce the risk involved in running an ML project. Below are a few of the approaches we took.

Identifying the right ML Approach

1. Framing a product goal in an ML paradigm

The transition from a high-level product goal to an ML-powered application in a production environment plays a crucial role. Most businesses have a clear understanding of their bottlenecks or pain points, for example predicting which customers will buy their products, identifying empty parking slots, or predicting housing prices. The bigger challenge arrives when they try to address these pain points with ML, which means estimating, planning, and executing within the available resources and budget.

Experience counts in delivering a successful ML project, because projects often reach for overly ambitious models and miss their deadlines as a result. When building products, start from a concrete business problem, analyze whether it requires ML, and then work toward a simple prototype that you can iterate on. A single product goal usually admits several ML approaches, with different levels of implementation difficulty.

2. Evaluating ML feasibility

The best way to evaluate ML feasibility is to look at the available data and at existing models that could make use of it. These two core aspects of an ML problem, data and models, should be examined thoroughly, starting with the simplest candidates.

Building an initial prototype

Next, let’s walk through the high-level architecture of our CO-RS app in terms of the machine learning life cycle phases, as implemented with AWS services.

1. Data Processing: We used the JHU (Johns Hopkins University) dataset as the source of truth for our COVID-19 model. Although it does not cover every city and state in the US, it is still one of the best available. We aggregated the data and computed a cumulative sum grouped by day, giving the number of cases at each location on each date. We then filtered out null values and filled the remaining gaps with the mean.
Covid-19 JHU Data Analysis

2. Model visualization: We used Seaborn to visualize the data and observed that the number of cases follows a polynomial trend over time.

3. Model training: Based on these observations, we chose polynomial regression of degree 3 to train our model on the data.

But wait,

What is Polynomial Regression?

Polynomial regression is a special case of linear regression in which we fit a polynomial equation to data that has a curvilinear relationship between the target variable and the independent variables.

Source: TDS: Introduction to Linear Regression and Polynomial Regression

General equation of polynomial regression is:

Y = θ₀ + θ₁X + θ₂X² + … + θₘXᵐ + residual error

The plot below clearly shows that confirmed cases and time are correlated, but the relationship is non-linear, i.e. a straight line cannot fit all the data points. Hence polynomial regression is a better fit than a straight line.
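To make the equation concrete, here is a minimal, dependency-free sketch of a degree-3 least-squares fit. It is not the SageMaker training code used for CO-RS: the function names `fit_polynomial` and `predict` and the synthetic data are assumptions made for illustration. The fit solves the normal equations directly, which works for small, well-scaled inputs like a day index.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares fit of y = t0 + t1*x + ... + tm*x^m via the normal equations.

    Returns coefficients [t0, t1, ..., tm], lowest power first.
    Needs at least degree + 1 distinct x values.
    """
    n = degree + 1
    # Normal equations A @ theta = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    theta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * theta[j] for j in range(i + 1, n))
        theta[i] = (b[i] - tail) / a[i][i]
    return theta


def predict(theta, x):
    """Evaluate the fitted polynomial at x."""
    return sum(t * x ** power for power, t in enumerate(theta))


# Synthetic data generated from a known cubic (not real case counts).
days = list(range(10))
cases = [5 + 2 * d + 0.5 * d ** 2 + 0.1 * d ** 3 for d in days]

theta = fit_polynomial(days, cases, degree=3)
forecast = predict(theta, 12)  # extrapolate two days past the data
```

Because the synthetic data comes from an exact cubic, the recovered coefficients should closely match the ones used to generate it. On real data, a library routine such as NumPy's `polyfit` is usually the more robust choice.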

Below is the Amazon SageMaker Code for model training-

4. Deployment of Model: Once trained, the model is stored in an S3 bucket. The Lambda service described below calls the bucket, fetches the model, and predicts based on a given feature set.

5. Prediction by Lambda: Our Lambda service calls an API to predict the population and then uses the trained model to estimate the future COVID-19 case density for a given date: density = number of cases / population.
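Steps 1 and 5 above can be sketched in plain Python. This is an illustration under stated assumptions, not the CO-RS pipeline: the row layout `(iso_date, new_cases)`, the sample numbers, the population figure, and all function names are hypothetical. Filling gaps before taking the running total is also an assumed ordering.

```python
from itertools import accumulate
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known entries."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("series has no known values to average")
    filler = mean(known)
    return [filler if v is None else v for v in values]


def cumulative_cases(daily_rows):
    """Turn (iso_date, new_cases) rows for one location into running totals."""
    rows = sorted(daily_rows, key=lambda row: row[0])  # ISO dates sort chronologically
    dates = [date for date, _ in rows]
    filled = fill_missing_with_mean([count for _, count in rows])
    return list(zip(dates, accumulate(filled)))


def case_density(cases, population):
    """density = number of cases / population (step 5)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


# Hypothetical daily counts for one location; the middle value is missing.
rows = [("2020-11-03", 30), ("2020-11-01", 10), ("2020-11-02", None)]
totals = cumulative_cases(rows)

# Hypothetical population. In CO-RS the case count would come from the trained
# model and the population from an API, as described in step 5.
density = case_density(totals[-1][1], population=100_000)
```

Dividing by population is what makes locations of different sizes comparable: the same case count means a much higher exposure risk in a small town than in a large city.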

Integrating AWS Services

Why AWS?

AWS uses CRISP-DM as a baseline for building ML Workloads as it’s a proven tool in the industry and is application neutral, which makes it an easy-to-apply methodology that is applicable to a wide variety of ML pipelines and workloads.

Fig: End-to-End Machine Learning Process (Source: AWS Docs)

Here are the major advantages AWS cites for implementing machine learning on its platform:

  • Reduce training time by 50%
  • Provide 90% scaling efficiency
  • Deliver 3x faster network throughput
  • Improve price and performance by 25%
  • 81% of deep learning projects in the cloud run on AWS
  • 85% of TensorFlow projects in the cloud run on AWS

Integration of AWS services in our CO-RS App

The entire application is divided into several smaller components, each serving a separate business purpose. For machine learning, we use Amazon SageMaker, Amazon S3, and AWS Lambda.

  1. We fetch publicly available COVID-19 data from the Johns Hopkins University database.
  2. We use Amazon SageMaker to wrangle the data and train our model. Trained models are stored in Amazon S3 buckets.
  3. We have a prediction Lambda that fetches the latest model from the S3 bucket and predicts based on the location and date provided. The Lambda is triggered by HTTP requests routed through an API Gateway. The function is written in Python 3.6, and we use custom layers attached to the Lambda to keep the build package size to a minimum. Heavy dependencies such as Pandas and NumPy live in the custom layer we create.
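The prediction flow in step 3 might look roughly like the sketch below. Several details are assumptions, since the article does not show them: the model is taken to be a JSON list of polynomial coefficients, the bucket name `co-rs-models` and key layout are hypothetical, and the query parameters `location` and `day` are invented for illustration. The S3 loader is injected so the handler logic can be exercised without AWS access.

```python
import json


def load_coefficients(bucket, key):
    """Fetch polynomial coefficients stored as a JSON list in S3.

    boto3 is imported lazily so the rest of this sketch runs without AWS access.
    """
    import boto3  # included in the AWS Lambda Python runtime

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)


def evaluate_polynomial(theta, x):
    """Evaluate t0 + t1*x + t2*x^2 + ... for coefficients theta."""
    return sum(t * x ** power for power, t in enumerate(theta))


def make_handler(loader):
    """Build a Lambda handler around a coefficient loader (injected for testing)."""

    def handler(event, context):
        # API Gateway proxy events carry query parameters here (may be None).
        params = event.get("queryStringParameters") or {}
        try:
            location = params["location"]
            day_index = int(params["day"])
        except (KeyError, ValueError):
            error = {"error": "location and integer day are required"}
            return {"statusCode": 400, "body": json.dumps(error)}

        theta = loader(location)
        result = {
            "location": location,
            "day": day_index,
            "predicted_cases": evaluate_polynomial(theta, day_index),
        }
        return {"statusCode": 200, "body": json.dumps(result)}

    return handler


# Hypothetical bucket and key layout: one JSON coefficient file per location.
lambda_handler = make_handler(
    lambda location: load_coefficients("co-rs-models", f"models/{location}.json")
)
```

Keeping the model as plain coefficients means the handler itself needs no heavy dependencies. A model that requires Pandas or NumPy at inference time would rely on the custom layer described above.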

Machine Learning with Amazon SageMaker:

Amazon SageMaker is a fully managed ML service used by data scientists and developers to quickly build and train machine learning models and deploy them into a production ready environment.

In our CO-RS app, we used Amazon SageMaker to train the model on Johns Hopkins COVID-19 data and deploy it using SageMaker’s model hosting services. SageMaker provides an HTTPS endpoint where our machine learning model is available to provide inferences. The following diagram shows how you train and deploy a model with Amazon SageMaker:

Source: AWS docs

Monitor and Update Models

The goal of monitoring is to track the health of the application. For ML models, that health depends on the quality of the model’s predictions. In our CO-RS app, if a seasonal change in weather suddenly causes the model to produce subpar results, a good monitoring system will detect the change and alert us so that we can react as soon as possible. Let’s look at some key metrics that help our app perform better:

Choose what to monitor:

Performance metrics: A drift in the distribution of the data can cause problems for the model. Data drift is a change in the model’s input data that leads to model degradation. In our case, for instance, the model should keep performing equally well if new locations are added to the input data, if the relationship between features changes suddenly, or if there is covariate shift. Below are the practices we follow to monitor performance metrics:

  1. Tracking changes in the input distribution
  2. Monitoring the input distribution
  3. Monitoring distribution shifts
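One common way to quantify a shift in an input distribution is the Population Stability Index (PSI). The sketch below is a generic illustration and not the statistic SageMaker Model Monitor computes. The bin count, the smoothing constant `eps`, and the sample values are assumptions.

```python
import math


def bin_proportions(values, lo, width, bins, eps=1e-6):
    """Share of values falling in each of `bins` equal-width bins starting at lo."""
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    # eps keeps empty bins from producing log(0) or division by zero.
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(baseline, current, bins=5):
    """PSI = sum((actual - expected) * ln(actual / expected)) over shared bins."""
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    expected = bin_proportions(baseline, lo, width, bins)
    actual = bin_proportions(current, lo, width, bins)
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


# Hypothetical feature values at training time and in recent traffic.
training_inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
recent_inputs = [v + 8 for v in training_inputs]  # a clearly shifted sample

psi_same = population_stability_index(training_inputs, training_inputs)
psi_shifted = population_stability_index(training_inputs, recent_inputs)
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, though suitable thresholds depend on the feature and should be tuned rather than taken as given.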

SageMaker Model Monitor emits per-feature metrics to Amazon CloudWatch, which we can use to set up dashboards and alerts. The summary metrics from CloudWatch are also visible in Amazon SageMaker Studio, and of course all statistics, monitoring results and data collected can be viewed and further analyzed in a notebook.

Business metrics

The most important metrics are, of course, the product and business goals. They are the standard SLAs against which we can judge our model’s performance. The prediction accuracy of the CO-RS application is 85% for regions in the US. With a better dataset and further research, this accuracy can be improved.

How are we helping businesses prosper during Covid-19?

  • The CO-RS app helps countries, businesses, organizations, and communities plan their response to COVID-19.
  • Our app focuses on helping companies plan business openings, events, and business travel with reduced health risks and greater employee safety.
  • Users plan travel by listing the locations they intend to visit, and the app provides a COVID-19 exposure risk for them.
  • CO-RS supports public health efforts to tackle the pandemic.

How could a Covid-19 vaccine positively impact business travel or the need for the CO-RS app?

Source: Corporate travel: Collaboration is essential for successful COVID recovery by pwc.com

(CNN) — It was the good news that gave the world hope.

On November 9 it was announced that one of the candidates for a Covid-19 vaccine, made by Pfizer and BioNTech, was over 90% effective in preventing volunteers from contracting the virus.

Palo Alto-based TripActions, a travel and expense management company for corporate travel, is optimistic about next year, with the recent news of vaccine development.

According to Meagen Eisenberg, the company’s chief marketing officer, business travel will recover with personal travel. As soon as a company loses a deal to a competitor because it didn’t have someone conducting a meeting in person or attending a dinner, companies will want to bring back business travel, she said.

Source — Crunchbase news

Above all, the ML-powered CO-RS app is the start of many more ML-powered solutions that will help businesses enhance their existing business models, improve employee safety, and attain greater profits during the COVID-19 pandemic.

Head over to our next article, where we show you how to deploy this prediction model in the form of an app.

If you’re interested in learning how ML can help your business grow, give a shout to us on social media or shoot us an email!

References:

  1. Gartner Says AI Augmentation Will Create $2.9 Trillion of Business Value in 2021 — Gartner.com
  2. Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: A review — Sciencedirect.com
  3. Theobald, Oliver. Machine Learning For Absolute Beginners: A Plain English Introduction (Second Edition) (Machine Learning For Beginners Book 1) (p. 63). Scatterplot Press. Kindle Edition.
  4. Building Machine Learning Powered Applications by Emmanuel Ameisen
  5. Introduction to Polynomial Regression (with Python Implementation ) — Analyticsvidhya.com
  6. Introduction to Linear Regression and Polynomial Regression — towards data science
  7. Deploy a Model in Amazon SageMaker — Amazon Docs
  8. Pfizer says early analysis shows its Covid-19 vaccine is more than 90% effective — cnn.com
  9. Could The COVID-19 Vaccine Mean A Rebound For Travel Startups in 2021? — Crunchbase news
