Multi-Armed Bandits: Epsilon-Greedy Algorithm with Python Code

Learn how Epsilon-Greedy works. Full python code provided for all experiments.

Artemis N.
Analytics Vidhya

--

Photo by Obi Onyeador on Unsplash

It’s Saturday night and you are getting ready for a movie night. But first, you want to order food! There’s a selection of restaurants in your area and you’ve only tried a few. The question therefore arises, should you order your favourite dish from the ones you’ve tried so far (exploitation) or should you try something new (exploration)?

This dilemma between exploration and exploitation is what multi-armed bandits algorithms try to solve. The aim is to maximise the reward from a sequence of actions (in the restaurant case we want to maximise satisfaction from ordering food) over the long run. One such algorithm is the Epsilon-Greedy Algorithm.

The Algorithm

The idea behind it is pretty simple. You want to exploit your best option most of the time but you also want to explore a bit in case there’s something even better.

In our example, every Saturday you would randomly sample a number between 0 and 1. If that number is less than 𝛆 then you pick a restaurant at random, if it is not you pick your favourite restaurant so far. Therefore, 𝛆 defines how much time you will spend exploring new stuff.

--

--

No responses yet