Making Better Decisions with Offline Deep Reinforcement Learning

Introduction:

In the age of big data, we have massive amounts of data at our disposal, and it would be nice if we could use some of it to make better decisions in the future. For example, in healthcare we are interested in data-driven decision making: developing decision-support systems for clinicians. Such systems take EHRs and real-time data about a patient's physiological state, and can recommend treatments (medication types and dosages) and insights to clinicians. The goal is to improve the overall careflow and, when the disease is severe, the eventual mortality outcome.

Reinforcement learning (RL) provides a good framework to…


The goal of exploratory data analysis is to get you thinking about your data and reasoning about your question: to make sure we have the right data, to spot any problems with the dataset, to determine whether it can answer our desired question, and to get a rough idea of what the answer will look like.

In the bigger picture of data-driven science, we start by collecting a data set of reasonable size, and then look for patterns that ideally will play the role of hypotheses for future analysis [2]. Exploratory data analysis is the search for patterns and trends in a given data set.
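As a minimal sketch of such first-pass checks, the snippet below runs missing-value and range summaries over a toy list of records using only the standard library (the column names and values are hypothetical, not from any real EHR):

```python
import statistics

# Toy records standing in for EHR rows; field names are illustrative only.
records = [
    {"age": 64, "heart_rate": 88, "lactate": 2.1},
    {"age": 71, "heart_rate": 110, "lactate": None},  # a missing value
    {"age": 58, "heart_rate": 95, "lactate": 4.3},
]

# First-pass EDA: count missing values, then report range and mean per column.
for col in ["age", "heart_rate", "lactate"]:
    values = [r[col] for r in records if r[col] is not None]
    missing = sum(1 for r in records if r[col] is None)
    print(col, "missing:", missing,
          "min:", min(values), "max:", max(values),
          "mean:", round(statistics.mean(values), 2))
```

Even a crude pass like this surfaces the two problems EDA is meant to catch early: columns with gaps and values outside a plausible range.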

After…


Online decision-making involves a fundamental choice:

Exploitation: make the best decision given current information.

Exploration: gather more information. The best long-term strategy may involve short-term sacrifices: gather enough information to make the best overall decisions.

Two potential definitions of exploration problem:

  • How can an agent discover high-reward strategies that require a temporally extended sequence of complex behaviours that, individually, are not rewarding?
  • How can an agent decide whether to attempt new behaviours (to discover ones with higher reward) or continue to do the best thing it knows so far? (from keynote ERL Workshop @ ICML 2019)
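The second question above — keep doing the best-known thing or try something new — is captured by the classic ε-greedy rule. A minimal sketch (the value estimates here are made-up numbers, not learned):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a random arm (explore);
    otherwise pick the arm with the highest current estimate (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit

# With epsilon=0 the agent purely exploits: it always picks arm 1 here.
q = [0.2, 0.9, 0.5]
print(epsilon_greedy(q, epsilon=0.0))  # -> 1
```

Larger ε means more exploration; annealing ε toward zero over time is a common way to shift from gathering information to exploiting it.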

Techniques:

There are a…


Reinforcement learning is about learning to make good decisions under uncertainty. It is based on the reward hypothesis, which says:

That all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward).

This can be seen as an optimization problem where our aim is to accumulate as much reward as possible (while learning how the world works), i.e. the goal is to find an optimal way to make decisions.

Mathematically, the goal is to maximize the expected sum of discounted rewards, and everything happens in a…
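That objective — the sum of rewards weighted by a discount factor γ in [0, 1) — can be computed for a single trajectory as follows (the reward sequence here is arbitrary, just for illustration):

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of gamma^t * r_t over a trajectory — the quantity an RL agent
    maximizes in expectation. Smaller gamma weights near-term rewards more."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# With gamma = 0.5: 1 + 0.5 + 0.25 = 1.75
print(discounted_return([1, 1, 1], gamma=0.5))
```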


Data availability still remains a challenge, especially in domains like healthcare. Generally speaking, patient confidentiality presents a barrier to the sharing and analysis of such data, meaning that only small, fragmented and sequestered datasets are available for research.

  • Today it is possible to generate statistically identical (image) data using a number of tools and techniques, e.g. Transformer models: https://www.physionet.org/content/transformer-synthetic-note/1.0.0/
  • Moreover, the trend of sophisticated synthetic data engines like MDClone [1] becoming available suggests that this will become the norm. They provide a good alternative: synthetic datasets statistically identical to the original protected health information, but without the privacy concerns.
Figure 1: Key challenges include exponential data growth, data silos and privacy, making analysis expensive and time consuming (source: taken from [1])
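As a toy illustration of the idea (not how engines like MDClone actually work — they model joint distributions and add privacy guarantees), one can generate a synthetic column that matches the mean and spread of an original column by sampling from a fitted Gaussian:

```python
import random
import statistics

random.seed(0)

# A small "protected" column of values — illustrative numbers only.
original = [2.1, 4.3, 3.0, 2.8, 3.9, 2.5]
mu, sigma = statistics.mean(original), statistics.stdev(original)

# Draw synthetic values from a Gaussian fitted to the original column.
# The synthetic sample shares the original's marginal statistics but
# contains none of its actual records.
synthetic = [random.gauss(mu, sigma) for _ in range(1000)]

print(round(statistics.mean(synthetic), 2))  # close to mu
```

Matching per-column moments like this preserves only marginal statistics; correlations between columns and re-identification risk are exactly the harder problems real synthetic-data engines address.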

Methodology:

Following papers offer guidelines…


In a healthcare context, the trajectory of a patient from arrival in the emergency ward, through admission to a hospital ward, up to discharge can be seen as a process (sometimes also known as a careflow). Often the execution of such processes is supported by information systems. For example, the hospital may record medical information such as symptoms, the patient's condition upon arrival, and the results of blood tests. Logistical information is also recorded, such as the movement of patients between wards and the different types of discharge.

Process mining has been used by healthcare…


Surprisingly, despite AI's breadth of impact, the types of AI technologies currently being deployed are still limited. Almost all of AI's recent progress is through one type, in which some input data (A) is used to quickly generate some simple response (B) [2]. For example:

https://hbr.org/2016/11/what-artificial-intelligence-can-and-cant-do-right-now

Being able to input A and output B will transform many industries. The technical term for building this A→B software is supervised learning. These A→B systems have been improving rapidly, and the best ones today are built with a technology called deep learning or deep neural networks, which were loosely inspired by…
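The A→B idea can be sketched in a few lines: given labeled (A, B) pairs, a learner answers B for a new A. Here is about the simplest possible instance, a 1-nearest-neighbour classifier on made-up data (real A→B systems are deep networks, but the input→output shape is the same):

```python
# Hypothetical labeled (A, B) pairs: a numeric input mapped to a label.
train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

def predict(a):
    """1-nearest-neighbour: answer B for a new input A by copying the
    label of the closest training example."""
    return min(train, key=lambda pair: abs(pair[0] - a))[1]

print(predict(1.4))  # "low"  — closest example is (1.0, "low")
print(predict(8.7))  # "high" — closest example is (9.0, "high")
```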


Process mining deals with business process discovery, business process monitoring, and identifying bottlenecks, allowing organizations to streamline their existing processes. It uses process mining algorithms, which leverage the event logs generated by an organization's information systems. These algorithms and techniques allow us to go beyond the traditional analysis approaches of management review and simulation. Much like machine learning and data mining, they mine log data, allowing organizations to correlate the event data recorded in event logs with the process models designed by business analysts.
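A sketch of the very first step of such discovery algorithms: counting which activity directly follows which in an event log. The log below is a made-up careflow example, and this directly-follows relation is the raw material that discovery algorithms (e.g. the alpha miner) build a process model from:

```python
from collections import Counter

# Toy event log: each trace is the ordered activities of one case
# (e.g. one patient's journey). Activity names are illustrative.
log = [
    ["register", "triage", "blood_test", "discharge"],
    ["register", "triage", "admit", "discharge"],
    ["register", "blood_test", "triage", "discharge"],
]

# Count directly-follows pairs (a, b): activity b observed right after a.
df = Counter((a, b) for trace in log for a, b in zip(trace, trace[1:]))

print(df[("register", "triage")])  # 2 — seen in two of the three traces
```

From these counts one can already read off bottleneck candidates and deviations: pairs that occur in only some traces hint at variants of the process.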

Process Mining helps organizations to recognize and understand ground…


What is a Business Process?

A business process is a “collection of related and structured activities or tasks” that focus on providing a certain service or achieving certain goals in the context of a business environment. Business process modelling is the process of representing the information flow, decision logic and business activities occurring in a typical business process. A business process can also be seen as an ordering of work activities across time and place, with clearly defined inputs and outputs, representing a structure for a set of actions occurring in a predefined sequence.

Process models Representation:

Process models are widely used in organizations to document internal procedures that…


Monte Carlo Tree Search (MCTS) is a technique that provides an effective way to search game trees with a high branching factor. Instead of performing an exhaustive search, MCTS relies on random simulations that construct the tree while being guided by the statistics of previous simulations. The technique has recently gained popularity due to its role in beating the world Go champion at the ancient game of Go: Go has a high branching factor and demands a huge compute budget, making traditional game-tree search techniques ineffective.

MCTS is effective for such computationally intractable problems. It considers the current state as…
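The "guided by statistics" part is usually the UCB1 score, which MCTS uses at each node to choose the child to simulate next. A minimal sketch with made-up child statistics:

```python
import math

def ucb1(child_value, child_visits, parent_visits, c=1.4):
    """UCB1 score used by MCTS to select a child: average reward so far
    (exploitation) plus a bonus that shrinks as the child is visited more
    (exploration)."""
    if child_visits == 0:
        return float("inf")  # always try unvisited children first
    return (child_value / child_visits
            + c * math.sqrt(math.log(parent_visits) / child_visits))

# Per-child stats (total_reward, visits); parent has been visited 30 times.
children = [(10.0, 20), (4.0, 5), (0.0, 0)]
best = max(range(len(children)),
           key=lambda i: ucb1(children[i][0], children[i][1], 30))
print(best)  # 2 — the unvisited child gets explored first
```

Repeating select → simulate → backpropagate with this rule is what lets MCTS concentrate its random playouts on the most promising branches instead of expanding the whole tree.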

Asjad K.

sharing thoughts on AI, tech. and startups twitter:@asjad_99
