Using a Market Economy to Provision Compute Resources Across Planet-wide Clusters

Summary and Thoughts

In the paper Using a Market Economy to Provision Compute Resources Across Planet-wide Clusters, written by a team of researchers from Google and Stanford, the authors experiment with the use of a market to provision computational resources. These resources could include things like access to CPU cores, network bandwidth, or disk storage availability. The resources are localized to clusters, so that if storage is scarce in one cluster but plentiful in another, then the authors want users who need a lot of storage to migrate to the second cluster. They argue that allowing users to buy and sell resources in a market will incentivise this. However, the authors do not show that their proposed solution is practical, because the algorithm outlined in the paper does not always succeed.

First, whatever algorithm is used, it needs to scale well with an unbounded number of users. And second, as much of the resources as possible need to be allocated to somebody. For example, if you are a cluster owner and you want to assign compute resources, you want all of your machines to be fully utilized. If you have left over computational power, then it will just go to waste. It is worth noting that there are also constraints on costs being linear in allocation size, costs being clearly signaled, and that outcomes be fair for all users, meaning for example, that users who buy in bulk do not get a discount.

The clear solution is to use some kind of auction, where users can both buy more resources and sell what they have already been allocated. The authors mention that the Vickrey-Clarke-Groves mechanism is not appropriate because it does not scale well for real-time processing and does not produce the fairness conditions required. Instead, the authors argue for a modification on a ‘simultaneous clock auction’ called a ‘clock proxy algorithm’.

The ‘clock proxy algorithm’ meets the constraints for fairness and maximum allocation, but it does necessarily converge. The algorithm works by underpricing compute and collecting bids in successive rounds, increasing the price each time. The price stops increasing when there are more resources available than are being demanded by users. Each user is then allocated the compute resources they had bid for at the price they had bid. This would work well if all users were only bidding to buy for resources, but it is also the case that users can sell their compute back into the market for other users to buy.

It can be shown, although not in the paper, that if all market participants are only buyers or only sellers, then the algorithm will converge. However, there are cases where if some participants are buyers and sellers, i.e. traders, then the algorithm may never converge and the auction may never terminate. In the paper, this is only acknowledged by saying that further research is needed, and it is not discussed how this fact interferes with the authors’ stated goal of scalability. It seems reasonable that the auction operator would ban trading, but in my opinion, this was a missed opportunity to include an appendix about detecting trading behavior. In particular, I do not know if it is possible to detect the differences between trading behavior that converges and diverges, but including a discussion of the problem in this paper would have improved it greatly.

Abstract

We present a practical, market-based solution to the resource provisioning problem in a set of heterogeneous resource clusters. We focus on provisioning rather than immediate scheduling decisions to allow users to change long-term job specifications based on market feedback. Users enter bids to purchase quotas, or bundles of resources for long-term use. These requests are mapped into a simulated clock auction which determines uniform, fair resource prices that balance supply and demand. The reserve prices for resources sold by the operator in this auction are set based on current utilization, thus guiding the users as they set their bids towards under-utilized resources. By running these auctions at regular time intervals, prices fluctuate like those in a real-world economy and provide motivation for users to engineer systems that can best take advantage of available resources.

These ideas were implemented in an experimental resource market at Google. Our preliminary results demonstrate an efficient transition of users from more congested resource pools to less congested resources. The disparate engineering costs for users to reconfigure their jobs to run on less expensive resource pools was evidenced by the large price premiums some users were willing to pay for more expensive resources. The final resource allocations illustrated how this framework can lead to significant, beneficial changes in user behavior, reducing the excessive shortages and surpluses of more traditional allocation methods