Maximizing Microgrid Efficiency through Q-Learning

Sapkota Bikash · Grid Solutions · Jan 18, 2023

Schedule optimization is a crucial problem in microgrid balancing: it determines the optimal schedule for generating and consuming electricity so as to minimize costs, maximize the use of renewable energy sources, and ensure the stability and reliability of the microgrid. It allows the system to make optimal use of the different resources available, such as the renewable energy sources and the energy storage system, and to respond to changes in electricity demand and in the availability of renewable energy.

There are many algorithms that can be used to solve this problem. Some of them are:

  1. Linear Programming
  2. Dynamic Programming
  3. Metaheuristics
  4. Reinforcement Learning
  5. Hybrid Algorithms

In this post we cover reinforcement learning using the Q-learning approach. Q-learning is a popular reinforcement learning algorithm that can be used to optimize the scheduling of electricity. The goal of the algorithm is to learn to create a schedule that maximizes the reward. The system starts with an initial estimate of the Q-values (the expected reward for taking a given action in a given state) and updates these estimates as it learns from experience.
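Concretely, the update that drives the learning is the standard Q-learning rule (the same one used in the code later in this post):

Q(s, a) ← (1 − α) · Q(s, a) + α · (r + γ · max over a' of Q(s', a'))

where s is the current state, a the chosen action, r the reward received, s' the next state, α the learning rate and γ the discount factor.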

Before implementing Q-learning, there are several key concepts and principles that we should understand:

  1. Reinforcement Learning: Q-learning is a type of reinforcement learning algorithm, so it’s important to have a basic understanding of the concepts of reinforcement learning, such as the distinction between states, actions, and rewards.
  2. Markov Decision Process (MDP): Q-learning is based on the idea of a Markov Decision Process, which is a mathematical framework for modeling decision-making in a system. Understanding the principles of MDPs is important for understanding how Q-learning works.
  3. Q-table: The Q-table is the primary data structure used in Q-learning. It stores the expected rewards for taking a particular action in a given state. Understanding how the Q-table is used and updated is crucial for implementing Q-learning.
  4. Exploration vs Exploitation: One of the key challenges in Q-learning is balancing the need to explore different actions in order to learn about the system with the need to exploit the knowledge already acquired in order to achieve good performance (a minimal epsilon-greedy sketch is shown after this list).
  5. Hyperparameters: Q-learning has several hyperparameters such as learning rate, discount factor, and exploration rate that need to be set before training. It’s essential to understand the effect of these hyperparameters on the learning process.
  6. Reward function: The reward function is an important part of Q-learning. It defines the goal of the agent and how the agent gets closer to that goal.
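To make the exploration/exploitation trade-off from point 4 concrete, here is a minimal epsilon-greedy selection sketch. The epsilon value is illustrative, and the q_table, state index and number of actions are assumed to match the variables used later in this post:

import numpy as np

epsilon = 0.1  # exploration rate: probability of trying a random action

def select_action(q_table, state_index, n_actions):
    # explore: with probability epsilon, pick a random action
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    # exploit: otherwise pick the action with the highest Q-value
    return np.argmax(q_table[state_index, :])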

Before starting the optimization process we need to forecast the solar and wind energy generation, forecast the load and the prices from the electricity market, and obtain the dischargeable battery capacity. The optimization process will then determine when to charge or discharge the battery and when to buy or sell electricity from the grid. The important thing to keep in mind is that we need an environment, either simulated or real, in which to run the reinforcement learning. You can use pymgrid to create an environment; however, in this blog we are not going to cover pymgrid.
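As a rough illustration, these inputs could be collected as simple arrays with one value per half-hourly step. The numbers below are dummy placeholders; in practice they would come from your forecasting models and battery management system:

import numpy as np

# Example day-ahead inputs, one value per half-hourly step (48 steps).
solar_forecast = np.zeros(48)      # forecast solar generation, kWh
wind_forecast = np.zeros(48)       # forecast wind generation, kWh
load_forecast = np.zeros(48)       # forecast demand, kWh
price_forecast = np.zeros(48)      # forecast market price
dischargeable_capacity = 10.0      # usable battery capacity, kWh (example value)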

import numpy as np

# states: (battery state of charge in %, half-hourly time step of the day)
states = [(soc, t) for soc in range(100) for t in range(48)]

# options the agent has for the battery and for power exchange with the grid
actions = ['charge', 'discharge', 'buy', 'sell']

# initialise the Q-table with random estimates of the expected rewards
q_table = np.random.rand(len(states), len(actions))

# define the learning rate and discount factor
alpha = 0.1
gamma = 0.9

# loop through each time step in the schedule
# (get_soc and get_reward are environment helpers; get_reward is defined below)
for t in range(48):
    state = (get_soc(), t)

    # select the action with the highest Q-value
    action_index = np.argmax(q_table[states.index(state), :])
    action = actions[action_index]

    # update the Q-value based on the reward received
    reward = get_reward(state, action)
    next_state = (get_soc(), min(t + 1, 47))  # clamp so the last step stays in the state space
    q_table[states.index(state), action_index] = (1 - alpha) * q_table[states.index(state), action_index] \
        + alpha * (reward + gamma * np.max(q_table[states.index(next_state), :]))

The above code snippet shows a generic implementation of Q-learning. The code defines the states as combinations of battery state of charge and time of day, stored in the states variable, and the actions the agent can take, stored in the actions variable. The actions represent the different options the agent has for scheduling the battery and the power exchange with the grid. The learning rate, alpha, controls how much the Q-value is updated in response to new information: a high learning rate makes the Q-value change quickly in response to new information, while a low learning rate makes it change more slowly.

def get_reward(state, action):
    price = get_price(state)  # forecast electricity price for this time step
    soc = get_soc()           # current battery state of charge

    # get the amount of solar energy being used
    re = get_solar()

    # X, Y and Z are the reward values you want to assign
    # according to the decision made by the model
    if action == 'charge':
        reward = X
    elif action == 'discharge':
        reward = Y
    else:  # buy, sell or any other action
        reward = Z

    # favour actions that make use of the available renewable energy
    reward += re * soc
    return reward

The above code shows how we can implement the logic that calculates the reward. If the action chosen by the model is the expected one, we assign a positive reward; otherwise, a negative one.
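Once the Q-table has been trained, the day-ahead schedule itself can be read back out by stepping through the day greedily. A minimal sketch, reusing the states, actions and q_table defined above; the battery update between steps is left as a placeholder for your own battery model:

def extract_schedule(q_table, states, actions, initial_soc=50):
    schedule = []
    soc = initial_soc
    for t in range(48):
        state_index = states.index((soc, t))
        best_action = actions[np.argmax(q_table[state_index, :])]
        schedule.append((t, best_action))
        # update soc here with your own battery model (omitted in this sketch)
    return schedule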

In this way we can implement reinforcement learning for schedule optimization. If you want to consider profit optimization with reinforcement learning, deep reinforcement learning will be more applicable, as it is better able to capture the underlying patterns. When the environment changes, the same model can still be used; we will initially see some imbalance, but with further training the errors will be resolved.
