Reinforcement learning applied to business problems

William Infante
Published in Nerd For Tech · 5 min read · May 23, 2021

Key takeaways from the SageMaker Talk in the 2021 AWS Summit Online Australia and New Zealand


Presented here are some of the key takeaways and my thoughts on the talk Using reinforcement learning to solve business problems, first presented at the 2021 AWS Summit Online Australia and New Zealand. Overall, I found the 3 questions and 4 steps for deploying RL to production helpful for thinking about business problems, and these concepts are not limited to the AWS SageMaker service.

Intro

Right at the start, the speaker mentioned two things I couldn’t agree with more:

  1. Businesses recognize the potential value of machine learning and reinforcement learning for growth, but translating a machine learning concept into production is another story.
  2. The biggest obstacle in RL lies in problem formulation, and the next steps are shaped by how that problem is formulated.

Oh, and there’s also a third item. There are definitely other RL applications, though I’m certainly not complaining about the advancements RL has brought to games.

3. Applications for RL are not just in games.

I mean, who wouldn’t be amazed at how AlphaGo, through RL, defeated the then-reigning Go world champion given the many configurations and possibilities in the game? Okay, digressing here.

The speaker did mention that reinforcement learning is a subset of machine learning. At least that’s a cue for a thought I wish the speaker had emphasized more. For me, likely influenced by my colleagues too, although machine learning (and more specifically reinforcement learning) has its own merits, not all business problems should be formulated as machine learning problems, let alone the reinforcement learning subset. Making this delineation at the outset will (a) free up resources and expertise for the ML/RL problems with more impact and (b) prevent expectation problems when conventional dev work would do the job given the time, money, and scope. I still see the large impact of ML/RL, but business problems shouldn’t be forced to fit an ML/RL framing, and if one does fit, utmost care should be given at the formulation stage.

Moving into the main topic, I did like how the speaker laid out the 3 questions and 4 steps for deploying RL based on his experience, and we’ll go through those in the next sections. Even if you’re using, say, Google Cloud Datalab, the concepts still apply. Listing them down here quickly:

Three questions:

  1. (State) What info do I have?
  2. (Actions) How many decisions are needed?
  3. (Measure) How is my algo doing?

Four steps:

  1. Problem Formulation
  2. Build a training and evaluation system
  3. Choosing an RL algorithm
  4. A/B Testing

Three Questions

It was nice to see these three questions consistently used in examples such as district heating, dynamic pricing, and inventory replenishment.

At the start, I really wanted to see how I could apply this 3-question concept, and I even noted down a simple table of the State, Action, and Measure from the use cases. The examples were carefully handpicked to provide a bit of breadth across the different potential RL problems (in a talk of less than 30 minutes).

And did they just use that S-A-M cue for a subtle promotion of the AWS-associated SAM (like the SAM CLI)? It wasn’t just me who noticed that, right?

+---------+-----------------+----------------+------------------+
| | District | Dynamic | Inventory |
| | Heating | Pricing | Replenishment |
+---------+-----------------+----------------+------------------+
| State | Historic | Freight price, | Historic demand, |
| | weather, | week of the | inventory, |
| | current circuit | year for the | items ordered |
| | temperature | season | |
+---------+-----------------+----------------+------------------+
| Action | Return water | Daily ticket | Safety coverage |
| | setpoint | price | weeks |
+---------+-----------------+----------------+------------------+
| Measure | Diff between | Daily revenue | Average |
| | 22C and | | coverage and |
| | room temp | | margin over |
| | | | lifetime |
+---------+-----------------+----------------+------------------+

That S-A-M questioning is also noticeable in the code walkthrough. I just hoped they had also provided direct links to a code repository for what they presented. Since some of the examples are real-world cases, some data may not be shareable, but they could include dummy data with the repo. Overall, still on the plus side, as some talks do not provide a code walkthrough at all.
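To make the S-A-M framing concrete, here is a minimal, Gym-style environment sketch for the inventory replenishment case. It is my own toy illustration (the class name, the dynamics, and every number in it are made up for this post), not the code shown in the talk:

```python
import random

class InventoryReplenishmentEnv:
    """Toy, Gym-style environment sketching the S-A-M formulation.

    State   : recent demand history plus the current inventory level
    Action  : safety coverage in weeks (e.g. 0, 1, 2, or 3)
    Measure : weekly margin minus holding and stockout costs (the reward)
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.inventory = 100.0
        self.demand_history = [self.rng.uniform(80, 120) for _ in range(4)]
        return self._state()

    def _state(self):
        return self.demand_history + [self.inventory]

    def step(self, action):
        # Order enough stock to cover `action` weeks of average demand.
        avg_demand = sum(self.demand_history) / len(self.demand_history)
        order_qty = max(0.0, action * avg_demand - self.inventory)
        self.inventory += order_qty

        # Realize this week's demand and fulfil what we can.
        demand = self.rng.uniform(80, 120)
        sold = min(demand, self.inventory)
        self.inventory -= sold

        # Reward: unit margin on sales, minus holding and stockout penalties.
        reward = 2.0 * sold - 0.1 * self.inventory - 5.0 * (demand - sold)

        self.demand_history = self.demand_history[1:] + [demand]
        return self._state(), reward, False, {}

env = InventoryReplenishmentEnv()
state = env.reset()
state, reward, done, info = env.step(action=2)  # hold two weeks of safety coverage
```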

There was one use case where the speaker said they compared the RL performance with more traditional ML solutions. I think that approach (if resources are not a problem) is good practice. As we said earlier, not all business problems need RL. Some of them could be other forms of machine learning problems, or not machine learning problems at all.

At the end of the use cases, the speaker also recognized that at the start of the RL journey, developers and data scientists may not yet have the intuition to pick an algorithm, and I was glad that he presented an overview. If I were allowed only one screenshot from the whole talk, it would probably be the image below:

Algorithm Choice from the AWS SageMaker Talk

With experience, the developer or data scientist can probably then choose an algorithm without necessarily following the image. And the world of RL is still evolving: new algorithms are still being developed that may beat the ones we have today.
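For what it’s worth, a very rough rule of thumb along those lines (my own simplification of commonly cited guidance, not a reproduction of the slide) might look like this:

```python
def rough_algorithm_hint(action_space, optimize_long_horizon):
    """Very coarse starting-point heuristic for choosing an RL algorithm.

    My own simplification of commonly cited guidance, not the decision
    chart from the talk; experience should quickly override it.
    """
    if not optimize_long_horizon:
        # Only the next reward matters: a contextual bandit is often enough.
        return "contextual bandit"
    if action_space == "discrete":
        return "value-based methods (e.g. the DQN family)"
    return "policy-gradient / actor-critic methods (e.g. PPO, SAC)"

print(rough_algorithm_hint("continuous", optimize_long_horizon=True))
```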

Four Steps

We’ve covered most of the takeaways from 1. Problem Formulation and 3. Choosing an RL Algorithm in the Three Questions section. In this part, I’d like to focus on 2. Building a Training and Evaluation System and 4. A/B Testing.

When building the system, I liked how the speaker emphasized deciding whether we want to optimize for a long-term horizon or only for the immediate reward. Those approaches lead to different outcomes and systems. If we only optimize for the next immediate reward, we can probably use simpler algorithms (contextual bandits, for instance) that do not need a transition function. The speaker also had a use case earlier in the talk that skipped the transition function.
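As a quick illustration of that trade-off (the numbers are toy values of my own, not from the talk): the discount factor is what separates the two framings. With a discount of zero the agent only values the very next reward, while a discount close to one makes delayed payoffs count.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t]; gamma = 0 keeps only the immediate reward."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 0.0, 10.0]          # a big payoff arrives three steps later
print(discounted_return(rewards, 0.0))   # 1.0    -> the immediate-reward view ignores it
print(discounted_return(rewards, 0.9))   # ~8.29  -> the long-horizon view values it
```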

The last part involves A/B testing. I usually only see this kind of testing in email marketing campaigns or website changes, but seeing it in RL (and probably ML in general) deployments also makes sense. It can help refine the reward system used in RL. In addition, applying it to a smaller population reduces the risk, and we can also measure the RL policy’s performance against an it-really-matters baseline.
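As a rough sketch of what that could look like in a deployment (the 10% split, the hashing scheme, and the toy policies below are my own assumptions, not anything shown in the talk):

```python
import random

def ab_route(user_id, rl_policy, baseline_policy, rl_fraction=0.1, seed=42):
    """Deterministically send a small fraction of users to the RL policy.

    Seeding on the user id (rather than flipping a coin per request) keeps each
    user in the same bucket, so the two groups stay comparable over time.
    """
    bucket = random.Random(f"{seed}:{user_id}").random()
    if bucket < rl_fraction:
        return "rl", rl_policy
    return "baseline", baseline_policy

# Toy policies: each maps a state to an action (here, a ticket price).
rl_policy = lambda state: 12.0
baseline_policy = lambda state: 10.0

group, policy = ab_route("customer-123", rl_policy, baseline_policy)
print(group, policy(state=None))
```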

Parting Notes

The talk was overall informative and makes me want to check out other related talks. For example, there’s an A/B testing machine learning models with Amazon SageMaker MLOps talk also in this Summit.
