Multidimensional Knapsack Problem — Distributed
In this article I want to present a distributed approach for solving different kinds of optimization problems. I will use Multidimensional Knapsack Problem (MKP) as an example. I hope you enjoy reading this :)
The MKP is a well known NP-Hard combinatorial optimization problem which can be defined by the ILP below:
We have n items and m resources. Each item generates a profit p(j) and each resource has a capacity of c(i). Each item j consumes an amount w(ij) from each resource i. The goal is to select a subset of items that maximizes total profit and doesn’t exceed resource capacities.
- Elixir — the underlying actor-based concurrency model makes Elixir a good choice for the algorithm described in the next section.
- CMPL — great open source mathematical programming language from COIN-OR. It’s used together with their CBC MILP solver.
The algorithm is designed for large scale MKP problems. For this kind of problems it’s impossible to find an optimal solution in reasonable time. A lot of different heuristics can be found and I want to present another one :)
The main idea of the algorithm is to split the original MILP problem into disjoint subproblems and solving them using the CMPL executable. The timeout is applied to each subproblem, so if a solution can’t be found in a reasonable time the subproblem is split again. Here is an illustration of the idea:
Some important parameters for the algorithm:
- initial divider — this number tells us how many subproblems we have at the beginning
- timeout — how much time the algorithm can spend solving the subproblem
- timeout divider — the number of subproblems created as a result of timeout
A few words about the splitting procedure. Let’s imagine a following problem:
- given 4 items [i(0), i(1), i(2), i(3)] and 2 resources [r(0), r(1)] with capacities of [20, 40]
Let’s say we want to split the problem above into 2 subproblems. As a result we have:
- first subproblem with 2 items [i(0), i(1)] and 2 resources [r(0), r(1)] with capacities of [10, 20]
- second subproblem with 2 items [i(2), i(3)] and 2 resources [r(0), r(1)] with capacities of [10, 20]
As you can see the splitting is very simple and intuitive.
The main benefit of having disjoint subproblems is a possibility to run them in parallel. And here Elixir starts shining. A transparency of concurrency model allows us to easily distribute the algorithm among a cluster of nodes.
The benchmark was run against examples taken from http://people.brunel.ac.uk/~mastjjb/jeb/info.html which collects data sets for different Operations Research problems. I’ve chosen the mknapcb9 set (30 examples) containing the hardest cases (500 items, 30 resources). None of examples has best solution found, but the best feasible solutions were given.
Parameters configuration depends on what we want to achieve. We can focus on a speed of the algorithm, on an accuracy or something in the middle. The benchmark was performed at 2 machines (2 cores per machine) with following parameters:
- initial divider — 8
- timeout — 30s
- timeout divider — 2
As you can see the benchmark results are pretty good. Nevertheless some improvements could be applied to make them better:
- applying a heuristic to splitting process — right now splitting process is really naive. It just splits items and resources from left to right.
- float dividers — right now dividers (initial and timeout) are integers. Changing this makes a subproblem’s size manipulating more precise.
- more machines with more cores :)
The idea of this article was not to solve the specific problem (MKP in this case), but to show a way to deal with optimization problems in general. Of course only a subset of optimization problems could be solved with the proposed approach, but I found it interesting enough to describe :)
The source code could be found here — distributed_mkp.