Dynamic Workforce Scheduling (4) — Hardware Requirements

This article is part of a series in which I talk about my happy little journey of using an Artificial Intelligence problem solver for Dynamic Workforce Scheduling.

Hardware requirements are still an open, experimental topic. Here we will discuss some important points about them, but the area remains open for research and creativity.

Experiments and Observations

The method OR-Tools uses to look up the distance was called 3,357,812,293 times (for 40,000 sites and only 1 worker).

— Observation: it is not feasible to skip pre-calculating the distances. The site-to-site distances have to be precalculated and loaded into RAM early, in a two-dimensional array, for fast access, as in the sketch below.
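
For illustration, here is a minimal C# sketch of wiring such a precomputed matrix into the OR-Tools routing model. The variable names, the small demo site count, and the matrix-filling step are placeholders, not the actual code of this project.

```csharp
using Google.OrTools.ConstraintSolver;

int siteCount = 1_000;   // demo value; in our problem this reached 40,000 sites
int workerCount = 1;     // vehicles/workers
int depotIndex = 0;      // hypothetical starting node

// Filled once, before solving (e.g. from a routing engine), and kept in RAM.
long[,] travelTimes = new long[siteCount, siteCount];

var manager = new RoutingIndexManager(siteCount, workerCount, depotIndex);
var routing = new RoutingModel(manager);

// The callback the solver invokes billions of times is reduced to a plain array read.
int transitIndex = routing.RegisterTransitCallback((long fromIndex, long toIndex) =>
{
    int from = manager.IndexToNode(fromIndex);
    int to = manager.IndexToNode(toIndex);
    return travelTimes[from, to];
});
routing.SetArcCostEvaluatorOfAllVehicles(transitIndex);
```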

OR-Tools accepts only `long` for any number, and a `long` needs 8 bytes (64 bits).

— Observation: this requires a lot of memory for a large number of sites/facilities (more on this in the next section). The array size is 8 bytes * (number of facilities)². We tried using `byte` instead of `long` to store the distances at run time and casting to `long` on every read. However, this is not preferable, since casting trillions of times (or more) could affect performance, and OR-Tools still uses `long` internally.
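
A rough sketch of that byte-based attempt, reusing the `routing` and `manager` objects from the sketch above (again, the names are illustrative):

```csharp
// Distances stored as bytes: one eighth of the memory, but every read is widened
// to long, and a byte also caps each stored travel time at 255 of whatever unit is used.
byte[,] travelTimesAsBytes = new byte[siteCount, siteCount];

int byteTransitIndex = routing.RegisterTransitCallback((long fromIndex, long toIndex) =>
{
    int from = manager.IndexToNode(fromIndex);
    int to = manager.IndexToNode(toIndex);
    return (long)travelTimesAsBytes[from, to];  // cast on every call; OR-Tools still works in long
});
```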

Conclusion: for a large number of facilities, OR-Tools needs a lot of RAM for the two-dimensional array that holds the distance between every pair of locations (if stored using `long`, the size in GB is (8 * locations²) / 1024 / 1024 / 1024). Everything else consumed only a few GB (about 5 GB) when experimenting with 40,000 facilities and only 1 inspector, and it could be much more for complex scenarios.
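
As a worked example of that formula, a hypothetical helper (not part of the project) and a few sample values:

```csharp
// RAM needed for an n-by-n matrix of long (8-byte) travel times,
// in GB (using 1024³ bytes per GB, as in the formula above).
static double DistanceMatrixGb(long siteCount) =>
    8.0 * siteCount * siteCount / 1024 / 1024 / 1024;

// DistanceMatrixGb(10_000) ->  ~0.75 GB
// DistanceMatrixGb(20_000) ->  ~2.98 GB
// DistanceMatrixGb(40_000) -> ~11.92 GB
```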

Multithreaded or Multi-Servers

The Google AI Solver runs on a single thread; this is a limitation of Google OR-Tools. Luckily, by the nature of the business, the map was already split into areas (centers). Therefore, I wrote the code so that each time the Solver runs, it schedules the tasks for only one center at a time. So, hypothetically, we can run the Solver on multiple threads or on multiple servers, as sketched below.
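
A minimal sketch of that per-center loop, assuming a hypothetical `Center` type and a `SolveCenter` method that wraps the OR-Tools setup shown earlier:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Each center (a geographically separate land segment) is scheduled independently,
// so the single-threaded solves can run side by side on different threads
// (or, as described below, on different servers).
void ScheduleAllCenters(IReadOnlyList<Center> centers)
{
    Parallel.ForEach(centers, center => SolveCenter(center));
}
```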

But running on multiple servers was more practical, since RAM seems to be the resource the Solver demands most. As mentioned in the previous section, the Solver needs an array of the distances between every pair of sites/facilities. This is an n² array, where n is the number of sites. For example, if the number of sites is 20,000, the RAM needed just for the distance array is 20,000² * 8 bytes (for a long integer) = 3,200,000,000 bytes, roughly 3 GB; at 40,000 sites it grows to 12,800,000,000 bytes, roughly 12 GB. (The number of sites in our problem could actually reach 40,000, and you may check the next section for helpful thoughts.)

So, I managed to run the Solver on multiple servers. This is done using background jobs that are coordinated with the help of the SQL Server transaction isolation level, where each server now runs for a different center (a geographically separated land segment).
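
The exact job table and queries are not shown in this series, so the following is only an assumed illustration of one common pattern for this kind of coordination in SQL Server: each server atomically claims a pending center using UPDLOCK/READPAST lock hints, so two servers never pick the same center. Table and column names are invented.

```csharp
using Microsoft.Data.SqlClient;   // or System.Data.SqlClient

// Claim one pending center job, or return null when nothing is pending.
// The single UPDATE statement runs as its own transaction, and the lock hints
// make concurrent servers skip rows that another server is already claiming.
static int? ClaimNextCenter(string connectionString)
{
    const string claimSql = @"
        WITH nextJob AS (
            SELECT TOP (1) *
            FROM CenterJobs WITH (UPDLOCK, READPAST, ROWLOCK)
            WHERE Status = 'Pending'
            ORDER BY CenterId
        )
        UPDATE nextJob
        SET Status = 'Running', ClaimedAt = SYSUTCDATETIME()
        OUTPUT inserted.CenterId;";

    using var connection = new SqlConnection(connectionString);
    connection.Open();
    using var command = new SqlCommand(claimSql, connection);
    object result = command.ExecuteScalar();   // null if no row was claimed
    return result == null ? (int?)null : (int)result;
}
```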

Use less RAM

How can we reduce the amount of RAM needed? Since we found that RAM is one of the most consumed resources, here are some thoughts on reducing the need for it by shrinking the site-to-site array.

Cut off the unworthy half

The first quick thing we can easily do, and it may be mathematically safe, is to reduce the number of sites to half, or maybe less! This could be done by cutting off the half with the lower importance, i.e. we sort the sites/facilities and then remove the second half of them. After that, the site-to-site travel-time array becomes 4 times smaller (see the sketch below). However, it is most likely not recommended to cut off a lot of sites before processing, since some of them may be part of the optimal solution, for example due to their short distance from a selected site.
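
A tiny sketch of that cut-off, with a hypothetical `Site` type that carries an importance score:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical site type, used only for this sketch.
public record Site(string Name, double Importance);

// Keep only the more important half, so the n-by-n travel-time matrix
// shrinks to (n/2) by (n/2), i.e. a quarter of its original size.
static List<Site> KeepImportantHalf(IReadOnlyList<Site> sites) =>
    sites.OrderByDescending(site => site.Importance)
         .Take(sites.Count / 2)
         .ToList();
```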

Clustering into smaller areas

The centers can also later be split into smaller areas. However, this may not be very helpful if it is done arbitrarily, since it could lead to a `local minimum` problem for some areas, unless those small areas are created wisely based on some data clustering. This is possibly a research point: how to cluster into smaller areas without causing a local minimum.

Sites inside one Building Block Considered as One

One great suggestion came from our GIC guy: why not group, early on, all the facilities that reside inside one building block and handle them as if they all reside at the same site? Of course, this may not be the best for accuracy, but it is very practical for shops that are located right next to each other, where the worker neither needs to change his/her parking spot nor cross a street.

In this way, we need a one-dimensional array that maps every site to a building block, and one large two-dimensional array for the distances from each block to every other block. At runtime, the distance between two sites is then looked up as follows.

travelTime = travelTimes[blockOf[a], blockOf[b]];

Here, a is the index of the first site and b is the index of the second site; blockOf is the one-dimensional array that contains the building block of each site, while travelTimes is the two-dimensional array that holds the travel time between every two blocks.
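
Put together with the earlier callback sketch (again with illustrative names, and with a hypothetical `blockCount` for the number of building blocks), the lookup might look like this:

```csharp
// blockOf[site] gives the building-block index of each site, and travelTimes now
// holds block-to-block travel times, so its size depends on the number of blocks
// rather than the (much larger) number of sites.
int[] blockOf = new int[siteCount];
long[,] travelTimes = new long[blockCount, blockCount];

int blockTransitIndex = routing.RegisterTransitCallback((long fromIndex, long toIndex) =>
{
    int a = manager.IndexToNode(fromIndex);   // first site
    int b = manager.IndexToNode(toIndex);     // second site
    return travelTimes[blockOf[a], blockOf[b]];
});
```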

The only drawback I see in this is that it may lead to a very rough estimate when the sites are inside the same block but on different levels of a tower. However, considering that it is already very challenging to get in-building travel times, the result could still be very satisfactory.

Conclusion

While RAM seems to be one of the most demanded resources, in all cases the CPU speed, the RAM speed, and the motherboard bus speed are very important. However, the actual performance tuning can only truly be discovered after running the Solver on some production data. This is because, while it is easy to determine the requirements for simple scenarios, it is not easy for a complex one like a real-life, large business case for Workforce Scheduling, among others (https://github.com/google/or-tools/issues/1071).
