Using Robust Optimisation and Mixed Integer Programming to manage Singapore’s water resources — Part II

Kai Wei 凯玮
AI Practice GovTech
9 min read · Feb 28, 2020

Get a coffee or bubble tea while we walk you through a toy example.

Data Scientists: Dr Loke Gar Goei, Chin Chii Yeh, Tan Kai Wei
PUB Engineers: Ashley Ng, Daniel John Tan, Hariharan Ramasamy
Visual Designer: Glenn Goh

If you missed Part I, click here.

Image Credit: Upper Seletar Reservoir

Let us dive into how the project team coded the model as a mixed-integer linear program (MILP) using the open-source PuLP package! For this part, we assume that you have some familiarity with the pandas package. For those who want an interactive session, feel free to visit this link using Google Chrome.

Note: Throughout this article, we will be using a mock demo. It is not representative of Singapore’s network and the constraints faced in actual operations. This demo is meant as an illustration of the process of model-building.

The diagram above describes the water loop. Raw water falls as rain and collects in the reservoirs, before being processed at treatment plants. The treated water is then stored in a service reservoir until it is consumed. Desalination plants supplement the treated water by converting sea-water into drinkable water. For this demo, we are using a deterministic model. This means that we know how much rain will fall and how much water will be consumed in advance (in Part III on Robust Optimisation, we will touch on non-deterministic models where we do not need to make this assumption).

Let us illustrate a simple network (refer to the following diagram) to show how declarative-style programming is used to code an MILP with the PuLP library.

In a diagram like this, we call the objects representing places (e.g. the reservoirs and the Treatment Plant) vertices (vertex, singular). The arrows represent the pipes that link these places, and we call them edges.

Again, remember the three parts that we will need to provide to the model:

  • Decisions: Each arrow above represents the amount of water moving through a pipe from one place to another (e.g. from Reservoir A to the Treatment Plant). PUB operators will need to decide the amount of water to pump through each pipe.
  • Objective function: There is a cost to treating water at the Treatment Plant. A possible objective is to minimise such costs. Suppose we incur a cost of c_i to process the water from vertex i (e.g. i could be Reservoir A or Reservoir B). Then the objective is to minimise the total cost of treating water,
minimise Σ_{i ∈ V} c_i · v_i,
where c_i is the treatment cost per volume, v_i is the volume of water treated from vertex i, and V is the set of all vertices.
  • Constraints: There are different operational constraints for each type of vertex, as listed below.

Constraints

  • For a treatment plant, the outflow is constrained by its capacity.
  • For a demand vertex, the inflow must meet the demand.
  • For a desalination plant, there is no inflow, and the outflow is likewise constrained by capacity.
  • For a water pipe, we cannot have more water flowing through the pipe than the pipe can hold.

The network of reservoirs, tanks, treatment plants, desalination plants, and demand can be captured by two pandas dataframes, one representing the vertices and one representing the edges.

In our dataframe, each vertex is represented by a row. We track the attributes of upper and lower capacity, cost, start state (e.g. how much water is initially in the reservoir), rainfall, and demand. Recall that in the constraints, different types of vertices require different constraints. To make this easy, we track the vertex category in the nonbalance column, using 0 for Treatment Plant, 1 for Demand, 2 for Desalination Plant, 3 for Reservoir, and 4 for Tank.

Vertices dataframe
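To make the rest of the walkthrough concrete, here is a minimal sketch of how such a vertices dataframe could be set up in pandas. The column names mirror the attributes described above, but both the names and the numbers are purely illustrative rather than the demo's actual table:

import pandas as pd

# Illustrative vertices dataframe: one row per vertex
# nonbalance: 0 = Treatment Plant, 1 = Demand, 2 = Desalination Plant, 3 = Reservoir, 4 = Tank
V = pd.DataFrame(
    {
        'nonbalance': [3, 3, 0, 2, 1],
        'lower':      [0, 0, 0, 0, 0],            # lower capacity
        'upper':      [200, 200, 50, 150, 1000],  # upper capacity
        'cost':       [0, 0, 5, 10, 0],           # treatment cost per volume
        'start':      [150, 50, 0, 0, 0],         # initial volume of water
        'rainfall':   [100, 0, 0, 0, 0],
        'demand':     [0, 0, 0, 0, 150],
    },
    index=['Reservoir A', 'Reservoir B', 'Treatment Plant', 'Desalination Plant', 'Demand'],
)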

The edges dataframe captures how the different pipes are connected to different vertices.

Edges dataframe
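A matching sketch of the edges dataframe, again purely illustrative: each row records a pipe from an out vertex to an into vertex, with a capacity column standing in for how much water the pipe can carry.

# Illustrative edges dataframe: one row per pipe
E = pd.DataFrame(
    {
        'out':      ['Reservoir A', 'Reservoir A', 'Reservoir B',
                     'Treatment Plant', 'Desalination Plant'],
        'into':     ['Treatment Plant', 'Reservoir B', 'Treatment Plant',
                     'Demand', 'Demand'],
        'capacity': [100, 100, 100, 100, 100],
    }
)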

As such, with the two dataframes above we can model the network.

Next, we import the pulp package to initialise the model,

import pulp

# Initiates model
m = pulp.LpProblem(name = 'inner', sense = pulp.LpMinimize)

and construct the decision variables — the amount of water to be pumped to the various reservoirs, treatment plants, etc. We first index our edges dataframe by the origin and destination of each pipe, which helps the readability of the code later:

edge_ind = E.set_index(['out','into'])

We then create one continuous, non-negative flow variable per pipe.

# Variable construction
flow = pulp.LpVariable.dicts("flow", ((out, into) for out, into in edge_ind.index), lowBound = 0, cat = "Continuous")

Declarative-style programming

Now, we shall introduce declarative-style programming, which, loosely speaking, means expressing the problem in terms of what we want rather than how to compute it, similar to Structured Query Language (SQL). The declarative style makes the code:

  • Readable
  • Easy to debug

Again, recall that the example objective is to minimise the total cost of treating water,
minimise Σ_{i ∈ V} c_i · v_i,
where c_i is the treatment cost per volume, v_i is the volume of water treated from vertex i, and V is the set of all vertices.

There are two types of plants that treat water — the treatment plants and the desalination plants. We add the contributions from these two types of plants separately, using a hold variable to store the objective function. The nonbalance indicator helps us to track which type of plant it is. Notice that these are not numbers we are adding up at this point, but mathematical expressions!

Note: In this example, the desalination plant is more costly than the treatment plant by 5 units. As such, intuitively, we should expect the model to maximise the use of the treatment plant as much as possible, which will be helpful for our sanity check later!
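As a rough sketch of what building this objective could look like in code (not necessarily the project's actual implementation), assuming the illustrative dataframes above and taking the volume treated at a plant to be the total flow leaving it:

# Build the objective as the total treatment cost, stored in 'hold'
hold = 0
for name, row in V.iterrows():
    if row['nonbalance'] in (0, 2):  # 0 = Treatment Plant, 2 = Desalination Plant
        # volume of water processed at this plant, as a symbolic expression
        treated = pulp.lpSum(flow[(out, into)]
                             for out, into in edge_ind.index if out == name)
        hold += float(row['cost']) * treated  # c_i * v_i
m += hold  # adding a bare expression (not a constraint) sets the objective in PuLP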

Next, we code the different operational constraints for each vertex. For simplicity, we will only show a few constraints.

For the treatment plant, the outflow has to be constrained by its capacity.
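One possible way to write this, again sketched against the illustrative dataframes (here the upper column stands in for plant capacity):

# Total outflow of each treatment plant must not exceed its capacity
for name, row in V.iterrows():
    if row['nonbalance'] == 0:  # Treatment Plant
        outflow = pulp.lpSum(flow[(out, into)]
                             for out, into in edge_ind.index if out == name)
        m += outflow <= float(row['upper'])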

For the pipe constraint, at any point in time, we cannot have more water flowing through the pipe than what the pipe can hold. As such, for a given pipe from out to into with a given capacity, we can express this as (and add it to the model m):

m += flow[(out, into)] <= capacity
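Here out, into and capacity refer to a single pipe. In practice the constraint is added for every edge; a sketch, assuming the illustrative capacity column of the edges dataframe:

# One capacity constraint per pipe
for (out, into), row in edge_ind.iterrows():
    m += flow[(out, into)] <= float(row['capacity'])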

We also do not want to pump too much water around the network, so we limit the total cost to be at most 2000 units.

eB = 2000
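One way to impose this budget, assuming hold is the total-cost expression built for the objective above, is to add it to the model as an additional constraint:

# Cap the total treatment cost at eB units
m += hold <= eB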

Fast-forwarding, we can now solve the model. What happens here is that the model we have built is passed by PuLP to an optimisation solver (in this case, the COIN Branch-and-Cut solver, or CBC for short), which then tells us what the optimal solution is.

# Solve the model and look up the solver status (e.g. 'Optimal', 'Infeasible')
m.solve()
stat = pulp.LpStatus[m.status]
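Once solved, the optimal flows can be read back from the decision variables, for example:

# Print the non-zero optimal flows and the total cost
for out, into in edge_ind.index:
    value = flow[(out, into)].varValue
    if value:
        print(f"{out} -> {into}: {value} units")
print("Status:", stat, "| Total cost:", pulp.value(m.objective))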

And we can see the results:

Result expressed in a table form

As you can see, the model chose to draw 50 units of water from the treatment plant to meet the 150 units of residential demand. The result aligns with our reasoning above, as using the desalination plant is more costly! Also, notice that Reservoir A pumps 50 units to Reservoir B, because Reservoir A would otherwise overflow when it receives 100 units of rainfall.

Of course, these are not the actual operational constraints, but we hope this has allowed you to catch a glimpse of how our model is built!

Fun Fact: There are 17 reservoirs in Singapore.

Location of 17 reservoirs in Singapore

Design Principles for Deployment

Having the model work is half the battle won; we still need to ensure that the model is deployed successfully for use. From the beginning, we adhered to these simple principles:

  • Making user interfaces consistent
  • Placing users in control of the interface
  • Flexibility of the model

These simple principles are designed with the end-user in mind so that users can comfortably and quickly master the use of the product.

We chose to build the user interface in Microsoft Excel as PUB operators are familiar with the software and can smoothly transition to our model. They can also control or automate the interface to their preference.

It is important to allow users to easily communicate with the model and understand what is happening in the solutions. As such, we built the model in a way that provides timely feedback to the user, such as printing out the errors in the optimisation run. This helps operators to zoom in on the violated constraints and to start the conversation on how these could be mitigated.

To ensure flexibility of the model, we do not hardcode the network (in our code above, the network only requires the input files for the vertices and edges dataframes). As such, operators can easily add new vertices and edges whenever there are changes to the existing network. They can even study “what if” scenarios, such as adding new plants or pipes (see the sketch below).
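For example, a hypothetical second treatment plant and its pipes could simply be appended as new rows to the input dataframes (column order follows the illustrative dataframes above) before re-running the model:

# What-if scenario: a second treatment plant fed from Reservoir B
# V columns: nonbalance, lower, upper, cost, start, rainfall, demand
V.loc['Treatment Plant 2'] = [0, 0, 80, 5, 0, 0, 0]
# E columns: out, into, capacity
E.loc[len(E)] = ['Reservoir B', 'Treatment Plant 2', 80]
E.loc[len(E)] = ['Treatment Plant 2', 'Demand', 80]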

Deployment Architecture

Our deployment architecture consists of 4 modular parts:

  • Inputs is where the operators key in the start states of the vertices in Excel files and specify certain operating constraints in the configuration file (a minimal sketch of this step follows the list).
  • Data Processing is where we put the input data into the right format before feeding it into the optimisation model.
  • Modelling and Optimisation is where we construct the network and feed it to the model, using a Jupyter Notebook.
  • Simulation is where we run the model on historical data to check whether it returns feasible results, which serves as our sanity check.
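In deployment, the vertices and edges dataframes therefore come from the operators’ Excel files rather than being typed into code. A minimal sketch of this input step, with purely illustrative file and sheet names:

import pandas as pd

# Read the network inputs prepared by the operators in Excel
V = pd.read_excel('network_inputs.xlsx', sheet_name='vertices', index_col=0)
E = pd.read_excel('network_inputs.xlsx', sheet_name='edges')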

Iterating Model During Backtesting Phase

We aimed to build a minimum viable product as quickly as possible by pursuing an iterative approach to collecting requirements, moving away from the traditional waterfall approach. As such, there is a need to continually backtest our model with past data to check whether we have missed any other requirements. These include both the simplifications made in the modelling process and the underlying assumptions on which operators based their decisions.

During backtesting, we noticed that our model failed to capture certain operating constraints. Initially, we wanted the operators to change the constraints directly in the edges file. However, adhering to the placing-users-in-control-of-the-interface principle, we decided to build an additional parameter into the configuration file that automatically controls multiple operating constraints. With this easy-to-input interface, the operators only need to change one parameter, eliminating the risk of incorrect entries from editing multiple edges.

Learning Points

Through this process, we found that the challenge is not so much programming the constraints, but rather translating the operators’ procedures into the appropriate constraints without incurring unintended side effects in the algorithm.

Backtesting is not just about checking the sanity of the result by comparing the actual decisions with the model-generated decisions, but also about creating greater ease in the user interface and incorporating users’ feedback so that the operators own the model.

This process of delivery is as important as, if not more important than, the model-building phase. The smooth delivery of the model is heavily dependent on the following factors:

  • Close and direct relationship with the operators on the ground
  • Explainability of the model

Our proximity to operators on the ground allows us to gather their feedback directly and hence iterate more quickly. By explaining how the model derives its results, especially when the results differ from their current operations, operators can see how the model really solves their problems. The transparency of the model strengthens their trust and confidence in using it.

Conclusion

Image credit: MacRitchie Reservoir Park

So the next time you enjoy the beautiful scenery at MacRitchie Reservoir, you can better appreciate the whole planning process that goes on behind the scenes!

This section was written under the assumption that you already know how much rain will fall and how much water will be consumed. If you would like to learn more about how we used Robust Optimisation to obtain a solution when we do not know the rainfall and consumption, please click here.

If you’re passionate about using tech to help the public, do follow us on Medium and join our team. We are hiring Data Scientists, Software Engineers, and DevOps Engineers. Send your resume to us at recruit@dsaid.gov.sg!
