Optimal Resource Utilization for Covid19 using RL

Published in theremin.ai, April 14, 2020

Optimal Resource Usage remains key to managing the Covid19 Pandemic

As the Covid19 pandemic progresses and increasingly looks like a long battle, optimally allocating limited resources will play a critical role in responding to it and controlling it. Resources for testing (test kits and laboratories), hospitals, quarantine facilities and the like are all in limited supply, and the ability to manage their use proactively could yield large benefits. Theremin.ai has developed systems that enable optimal resource utilization; in this article we outline our approach, focusing on Covid19 testing in large populations.

Detecting infected people through testing has played a critical role in shaping healthcare responses to the current pandemic. Countries have adopted varying levels of testing, driven by their population size, the availability of test kits, their laboratory infrastructure and the ability of their healthcare systems to handle the volumes. South Korea has undertaken mass testing, but because of testing capacity constraints most countries have focused on people showing symptoms and their direct contacts.

Given these constraints, our solution recommends how best to deploy the available testing capacity to maximize the chances of detecting infected people. Significantly, it prioritizes testing even for asymptomatic people in locations that would otherwise not readily be included in testing efforts. This supports early detection and containment of Covid19 spread and helps “flatten the curve”, buying valuable time to scale up the response to the pandemic and save lives.

Reinforcement Learning (RL) is well suited to analyzing the dynamics of disease spread in a population and recommending actions that optimize resource utilization. In RL, the Environment can capture the population profiles and interaction dynamics that drive Covid19 spread. An RL Agent then observes these dynamics and, through repeated interaction, learns to recommend the most effective deployment of resources. Our Covid19 testing solution trains a Deep Q-Network (DQN) Agent that recommends which locations to test and how many people to test in each.

Reinforcement Learning and Deep Q-Learning

Reinforcement Learning is a subfield of machine learning that teaches an agent how to choose actions from its action space. The agent interacts with an environment in order to maximize its rewards over time. It learns by trial and error, using the feedback (reward) it receives for its own actions and experiences: it tries different actions on the environment and learns from the results. The goal is to find an action policy that maximizes the agent's total cumulative reward.
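To make the trial-and-error loop concrete, here is a minimal sketch of tabular Q-learning on a hypothetical five-cell corridor that has nothing to do with Covid19: the agent starts in the leftmost cell, earns a reward of 1 only on reaching the rightmost cell, and learns from that feedback which move to prefer in each cell. Tabular Q-learning is a simpler stand-in for the DQN described next (a DQN replaces the table with a neural network); the environment, names and parameter values are all illustrative assumptions.

```python
import random

N_STATES = 5          # cells 0..4; cell 4 is the terminal goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[s][i] estimates the discounted return of taking ACTIONS[i] in state s.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon (or on a tie),
            # otherwise exploit the action with the highest estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward the one-step bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best learned action for every non-terminal cell."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]
```

After training, `greedy_policy(train())` should prefer moving right in every cell, because that is the only route to the reward.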

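In a DQN model, the agent uses a deep neural network to approximate the action-value function Q(s, a), which estimates the expected cumulative reward of taking action a in state s and following the policy thereafter; the optimal action in a given situation is the one with the highest Q-value. At first the agent tries random actions (exploration) to learn an optimal action policy, then gradually shifts toward the actions it has learned to be best (exploitation). A key concept in this model is the discounted sum of rewards: a reward received k steps in the future is weighted by γ^k, where the discount factor γ lies between 0 and 1. With γ close to 1, the agent does not simply chase immediate rewards but weighs them against long-term goals.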
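For reference, the quantities in the paragraph above can be written in the standard notation of the DQN literature. This is textbook notation rather than a description of Theremin.ai's internal formulation; in particular, the article does not say whether a separate target network (parameters θ⁻) or a replay buffer of transitions (s, a, r, s′) is used.

```latex
\begin{aligned}
G_t &= \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1 \\
Q^{\pi}(s, a) &= \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s,\ a_t = a \,\right] \\
\pi^{*}(s) &= \arg\max_{a} Q^{*}(s, a) \\
\mathcal{L}(\theta) &= \mathbb{E}_{(s, a, r, s')}\!\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
\end{aligned}
```

The first line is the discounted return, the second the action-value function, the third the greedy policy that picks the highest-valued action, and the last the squared temporal-difference loss a DQN minimizes with respect to the network weights θ.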

Population Dynamics and Disease Spread (the Environment)

Our approach to modelling population dynamics and the associated disease spread aims to be simple yet effective (see Figure 1). We segment a geographic area into clusters (i.e. compartmentalization). Within each cluster, we model the population as a graph in which small population units (e.g. families) are connected to their communities through geographic proximity. The graph also extends across clusters, reflecting inter-cluster movement of people (e.g. for work or social interactions). This graph structure makes it possible to predict how the disease will spread once a particular person is infected with Covid19. The geographic scope of the graph can be scaled as needed: from an area as small as a municipal ward, to a district or county, up to a state or country.
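A minimal sketch of such a cluster graph, assuming households as nodes, a plain dict of neighbour sets as the graph, and two hypothetical knobs: `intra_links` (how many nearby households each household links to within its cluster, with index distance standing in for geographic proximity) and `mobility` (the probability that a household also links into another cluster). The function names and parameters are illustrative, not the article's implementation.

```python
import random


def add_edge(graph, u, v):
    """Add an undirected edge between households u and v."""
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def build_cluster_graph(households_per_cluster, intra_links=3, mobility=0.05, seed=0):
    """Return a dict mapping (cluster_id, household_id) -> set of neighbours.

    households_per_cluster: positive household count for each cluster.
    intra_links: each household links to this many following households in
        its cluster (index distance stands in for geographic proximity).
    mobility: probability that a household also links to a random household
        in another cluster (work or social travel).
    """
    rng = random.Random(seed)
    n_clusters = len(households_per_cluster)
    graph = {}
    for c, n in enumerate(households_per_cluster):
        for h in range(n):
            graph.setdefault((c, h), set())  # keep isolated households too
            # Intra-cluster edges: households close to each other.
            for offset in range(1, intra_links + 1):
                if h + offset < n:
                    add_edge(graph, (c, h), (c, h + offset))
            # Inter-cluster edge: people moving between clusters.
            if n_clusters > 1 and rng.random() < mobility:
                other = rng.choice([k for k in range(n_clusters) if k != c])
                partner = (other, rng.randrange(households_per_cluster[other]))
                add_edge(graph, (c, h), partner)
    return graph
```

For example, `build_cluster_graph([120, 80, 200], mobility=0.0)` models three clusters under strict lockdown (no inter-cluster links), while raising `mobility` connects them.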

**Figure 1: Cluster Modelling of Population Dynamics and Disease Spread**

Simulations on our cluster graph estimate disease spread by following the connections of an infected person. Importantly, we incorporate the Covid19 incubation period: people who are still within it do not yet show symptoms and would be missed by symptom-based testing, so the simulations identify clusters that need to be tested for asymptomatic carriers. These clusters are potential candidates for proactive testing.
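Under stated assumptions, one simulation run could look like the sketch below: a discrete-time spread process on a household graph (a dict mapping `(cluster_id, household_id)` to a set of neighbours) that flags clusters with households still inside the incubation window. The per-contact transmission probability `p_transmit`, the rule that presymptomatic households can already transmit, and the absence of recovery over the short horizon are simplifying assumptions, not details from the article; `simulate_spread` is a hypothetical name.

```python
import random


def simulate_spread(graph, seed_node, p_transmit=0.1, incubation_days=5, days=14, rng_seed=0):
    """Simulate spread for `days` days starting from one infected household.

    Returns (infection_day, candidates):
      infection_day: dict household -> day it was infected.
      candidates: dict cluster_id -> number of households infected within the
                  last `incubation_days` days (not yet symptomatic).
    """
    rng = random.Random(rng_seed)
    infection_day = {seed_node: 0}
    for day in range(1, days + 1):
        newly_infected = []
        # Every household infected before today can transmit (assumption).
        for node in infection_day:
            for nb in graph.get(node, ()):
                if nb not in infection_day and rng.random() < p_transmit:
                    newly_infected.append(nb)
        for nb in newly_infected:
            infection_day.setdefault(nb, day)
    # Households still in their incubation window show no symptoms, so
    # symptom-driven testing would miss them: flag their clusters.
    candidates = {}
    for (cluster, _), day_infected in infection_day.items():
        if days - day_infected < incubation_days:
            candidates[cluster] = candidates.get(cluster, 0) + 1
    return infection_day, candidates
```

Averaging `candidates` over many randomized runs would give a rough per-cluster estimate of hidden infections, which is the kind of signal proactive testing targets.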

Learning Optimal Responses (the Agent)

Our RL system builds an intelligent Agent that observes population dynamics and the progression of disease spread to learn the responses (i.e. actions) that yield optimal resource utilization. Thousands of randomized simulations let the Agent explore scenarios and formulate appropriate responses. The simulations vary population size, density and movement, as well as disease characteristics such as R0 (the basic reproduction number) and the incubation period. This variety covers a sufficiently wide range of scenarios from which real-world responses can be developed.
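A minimal sketch of how randomized training scenarios might be drawn, assuming independent uniform sampling over hand-picked ranges. The `Scenario` fields mirror the characteristics listed above, but the ranges (for example 3 to 10 clusters, R0 between 1.5 and 3.5) are illustrative placeholders, not values from the article.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One randomized training scenario (all ranges below are hypothetical)."""
    n_clusters: int
    households_per_cluster: tuple  # population size and density per cluster
    mobility: float                # chance of an inter-cluster link per household
    r0: float                      # basic reproduction number
    incubation_days: int


def sample_scenario(rng):
    n_clusters = rng.randint(3, 10)
    return Scenario(
        n_clusters=n_clusters,
        households_per_cluster=tuple(rng.randint(50, 500) for _ in range(n_clusters)),
        mobility=rng.uniform(0.0, 0.3),
        r0=rng.uniform(1.5, 3.5),
        incubation_days=rng.randint(2, 14),
    )


def training_scenarios(n, seed=0):
    """Draw n reproducible scenarios from a seeded generator."""
    rng = random.Random(seed)
    return [sample_scenario(rng) for _ in range(n)]
```

Each sampled scenario would seed a fresh environment for one or more training episodes. Varying the parameters widely (often called domain randomization) is meant to keep the learned policy from overfitting to a single configuration.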

Technically, our RL system uses a Deep Q-Network to learn a policy that assigns a success metric to each cluster. These success metrics can be interpreted directly as priorities for allocating resources to specific clusters: they identify the clusters on which upcoming testing efforts should focus and also indicate the volume of testing to undertake. Such proactive, targeted testing enables early detection of Covid19 cases and spread.
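The article does not say how the per-cluster success metrics are converted into test counts. One simple assumption is a proportional split: clusters with positive scores share the budget in proportion to their scores, and the largest-remainder method keeps the integer allocations summing exactly to the budget. `allocate_tests` is a hypothetical helper.

```python
def allocate_tests(scores, budget):
    """Split an integer test budget across clusters in proportion to scores.

    scores: dict cluster_id -> priority score (e.g. a per-cluster success
            metric); clusters with non-positive scores receive no tests.
    Uses the largest-remainder method so the shares sum exactly to budget.
    """
    positive = {c: s for c, s in scores.items() if s > 0}
    if not positive or budget <= 0:
        return {c: 0 for c in scores}
    total = sum(positive.values())
    quotas = {c: budget * s / total for c, s in positive.items()}
    alloc = {c: int(q) for c, q in quotas.items()}  # floor of each quota
    leftover = budget - sum(alloc.values())
    # Hand the remaining tests to the clusters with the largest fractional parts.
    by_remainder = sorted(positive, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return {c: alloc.get(c, 0) for c in scores}
```

For example, `allocate_tests({"A": 3.0, "B": 1.0, "C": 0.0}, 100)` returns `{"A": 75, "B": 25, "C": 0}`. A proportional rule spreads tests across all promising clusters; a winner-take-all rule would instead send everything to the top-ranked cluster.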

**Infection Simulation and Response Learning**

Simulation Results

We evaluated our system’s performance in specific scenarios. It performed well in all of them and, importantly, its recommendations matched what one would expect in the real world. Increasing the infection rate in a cluster under lockdown (very low inter-cluster movement) led the system to recommend more testing within that cluster. By contrast, increasing the infection rate in a cluster with high movement to other clusters led the system to increase testing in those other clusters, proactively looking for asymptomatic people.

Real-world Application

Our system offers decision support to government institutions and officials. Its recommendations can be combined with real-world urgencies and priorities (e.g. tests driven by contact tracing). Together, these can help “flatten the curve” of Covid19 progression in a region and save lives.
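As an illustration of combining model output with real-world priorities, the hypothetical `plan_tests` below serves tests mandated by contact tracing first, then spends any remaining budget on the highest-scoring clusters, capped by each cluster's lab capacity. The greedy ordering and the capacity cap are assumptions made for this sketch; the article does not describe how the combination is done.

```python
def plan_tests(budget, traced, scores, capacity):
    """Combine contact-tracing demand with model priorities.

    budget:   total tests available for the period.
    traced:   dict cluster -> tests already required by contact tracing.
    scores:   dict cluster -> model priority score (higher = test sooner).
    capacity: dict cluster -> max tests local labs can process; clusters
              missing from this dict are treated as uncapped.
    """
    clusters = set(traced) | set(scores) | set(capacity)
    plan = {c: 0 for c in clusters}
    remaining = budget
    # 1) Contact-tracing tests come first, largest demand first.
    for c in sorted(traced, key=traced.get, reverse=True):
        take = min(traced[c], capacity.get(c, float("inf")), remaining)
        plan[c] += take
        remaining -= take
    # 2) Proactive tests go to the highest-scoring clusters until the budget
    #    runs out or only non-positive scores are left.
    for c in sorted(scores, key=scores.get, reverse=True):
        if remaining == 0 or scores[c] <= 0:
            break
        take = min(capacity.get(c, float("inf")) - plan[c], remaining)
        plan[c] += take
        remaining -= take
    return plan
```

For instance, with a budget of 100, 30 traced tests in cluster A, scores A 0.9, B 0.5, C -0.1 and capacities A 40, B 50, C 100, the plan is A 40, B 50, C 0, leaving 10 tests unassigned rather than spending them on a cluster the model does not prioritize.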

In our view, we have successfully demonstrated the application of RL to optimal resource utilization, especially in testing for Covid19 cases.

Theremin.ai focuses its efforts on applying ML and RL to problems in financial markets. The urgency and critical nature of the Covid19 problems prompted us to apply our financial markets learnings to them. We hope our contributions lead to meaningful solutions during this pandemic crisis.