Stretched Exponential Decay function for Epsilon Greedy Algorithm
While working on a Reinforcement Learning (RL) project, I was looking for a decay function that would give the RL agent the following characteristics:
- More dwell time for exploration in the initial part of the episodes
- More exploitation, with occasional random exploration, at the end of the episodes (a quasi-deterministic policy)
- A smooth gradient while switching from exploration to exploitation
While there were several resources on the web, I was not able to find a close match to the function I was looking for, so I ended up concocting a decay function on my own. I also learnt that this sort of decay function is called Stretched Exponential Decay.
Expression for Stretched Exponential Decay
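Reading the expression off the code below (with $E$ denoting EPISODES, the total number of training episodes), the function can be written as:

```latex
\varepsilon(t) \;=\; 1.1 \;-\; \left( \frac{1}{\cosh\!\left( e^{-\frac{t - A E}{B E}} \right)} \;+\; \frac{C\,t}{E} \right)
```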
In Python, the code looks like this:

    import math
    import numpy as np

    EPISODES = 10_000  # total number of training episodes (set to suit your project)

    A = 0.5
    B = 0.1
    C = 0.1

    def epsilon(time):
        standardized_time = (time - A * EPISODES) / (B * EPISODES)
        cosh = np.cosh(math.exp(-standardized_time))
        epsilon = 1.1 - (1 / cosh + time * C / EPISODES)
        return epsilon
Here EPISODES is the number of iterations for which we will be training the RL agent. There are also three hyperparameters: A, B and C. We will look into these in a moment.
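As a quick sanity check, and assuming a concrete value of EPISODES = 10000 purely for illustration, the function starts above 1 and ends near zero:

```python
import math
import numpy as np

EPISODES = 10_000  # assumed value, purely for illustration
A, B, C = 0.5, 0.1, 0.1

def epsilon(time):
    standardized_time = (time - A * EPISODES) / (B * EPISODES)
    return 1.1 - (1 / np.cosh(math.exp(-standardized_time)) + time * C / EPISODES)

print(epsilon(0))             # ~1.1: forces pure exploration early on
print(epsilon(EPISODES - 1))  # ~0.00003: near-deterministic exploitation late
```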
For the hyperparameter settings above, the decay function looks like this:
The left tail of the graph has epsilon values above 1, which, when combined with the Epsilon Greedy algorithm, forces the agent to explore more.
The right tail of the graph has epsilon values close to zero, which helps the agent exhibit quasi-deterministic behavior. This means the agent will exploit more in the later part of the episodes, but it can still explore at random. Imagine deploying an RL agent to play against human opponents: if the agent always chose the same best action, its moves could always be predicted. This decay function can be deployed for those situations as well.
There is a transition portion between the left and right tails of the graph that smoothly shifts the agent's behavior from exploration to exploitation.
The code to check the shape of the decay function:

    import matplotlib.pyplot as plt

    new_time = list(range(0, EPISODES))
    y = [epsilon(time) for time in new_time]
    plt.plot(new_time, y)
    plt.ylabel('Epsilon')
    plt.title('Stretched Exponential Decay Function')
    plt.show()
Hyperparameters for decay function
The parameter A decides where we would like the agent to spend more time: on exploration or on exploitation. For values of A below 0.5, the agent spends less time exploring and more time exploiting. For values of A above 0.5, you can expect the agent to explore more.
Decay function for A=0.3: the left tail of the graph has shortened, so the agent will explore for a shorter duration.
Decay function for A=0.7: the left tail of the graph has lengthened, so the agent will explore for a longer duration.
The parameter B decides the slope of the transition region between the exploration and exploitation zones.
With B set to 0.3, the slope becomes close to 45 degrees. Personally, I opt for B=0.1.
The parameter C controls the steepness of the left and right tails of the graph. The higher the value of C, the steeper the left and right tails become. Here as well, I prefer C=0.1.
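To see the effect of A numerically rather than from a plot, one can count how many episodes epsilon stays at or above 1 (the pure-exploration zone). A sketch, again assuming EPISODES = 10000; the exploration_length helper is my own illustration, not part of the original:

```python
import math
import numpy as np

EPISODES = 10_000  # assumed value, purely for illustration
B, C = 0.1, 0.1

def epsilon(time, A):
    standardized_time = (time - A * EPISODES) / (B * EPISODES)
    with np.errstate(over='ignore'):  # cosh overflows to inf for very negative
        sech = 1 / np.cosh(math.exp(-standardized_time))  # inputs; 1/inf == 0
    return 1.1 - (sech + time * C / EPISODES)

def exploration_length(A):
    """Number of episodes during which epsilon stays at or above 1."""
    return sum(1 for t in range(EPISODES) if epsilon(t, A) >= 1.0)

print(exploration_length(0.3))  # shorter left tail: fewer pure-exploration episodes
print(exploration_length(0.7))  # longer left tail: more pure-exploration episodes
```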
Deployment of decay function in Epsilon Greedy Algorithm
The code for the Epsilon Greedy algorithm is as follows:

    def epsilon_greedy(state, time):
        z = np.random.random()  # provides a number less than 1
        state = Q_state(state)  # state is provided by the environment
        if z > epsilon(time):
            # Exploitation: for smaller epsilon values, the agent is forced to
            # choose the best possible action, i.e. the action corresponding
            # to the max Q-value of the current state
            action = <write your code to choose the best possible action>
        else:
            # Exploration: for larger epsilon values (close to 1), the agent
            # is forced to explore by choosing a random action
            action = <write your code to choose a random action>
        return action
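For reference, here is one hypothetical way to fill in the placeholders, with a Q-table kept as a dict of NumPy arrays. Q_state, N_ACTIONS and the Q dict are illustrative assumptions; substitute your environment's own representation:

```python
import math
import numpy as np

EPISODES = 10_000          # assumed number of training episodes
A, B, C = 0.5, 0.1, 0.1
N_ACTIONS = 4              # hypothetical size of the action space

# Hypothetical Q-table: maps a hashable state key to an array of Q-values
Q = {}

def Q_state(state):
    """Hypothetical helper that converts the raw state to a hashable key."""
    return tuple(state)

def epsilon(time):
    standardized_time = (time - A * EPISODES) / (B * EPISODES)
    return 1.1 - (1 / np.cosh(math.exp(-standardized_time)) + time * C / EPISODES)

def epsilon_greedy(state, time):
    z = np.random.random()               # uniform number in [0, 1)
    key = Q_state(state)
    q_values = Q.setdefault(key, np.zeros(N_ACTIONS))
    if z > epsilon(time):
        # Exploitation: action with the max Q-value for the current state
        action = int(np.argmax(q_values))
    else:
        # Exploration: random action
        action = int(np.random.randint(N_ACTIONS))
    return action
```

Early in training epsilon(time) is above 1, so z > epsilon(time) is never true and the agent always explores; late in training epsilon is near zero, so the exploitation branch dominates while random exploration still occurs occasionally.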