Jump in the Pool

Jesse Allen
Jun 8, 2017


Most of the time, we don’t have to think about the cost of new resources. Most of the time, we can grab a new one and trust the garbage collector to pick up after us. But, when resource allocation becomes our bottleneck, the careful application of an appropriate resource pool can make a big difference.

Do I need to pool resources?

No.

Or at least not yet. The techniques used to pool resources introduce complexity that can often make our code less readable, less maintainable, and more error-prone. It can be helpful to ask our usual bottleneck questions before deciding if we really need to pool resources.

Is the cost of this part of the system a significant share of the whole system?

As an example, writing to a logging package that zips and slices up structured logs to send over UDP might take up only a small portion of a request to our service if we are only logging errors, but a larger portion if we are also logging multiple events per request and not doing much else.

If we are logging multiple events per request, that share is probably big enough to matter; if we are only logging errors, it probably is not.

Is the cost of creating new resources a significant share of this part of the system?

With our example again, the cost of allocating new buffers to hold our zipped messages might be a large portion of an operation that is little more than zip, slice, send.

Within that significant share, the allocation is probably big enough to matter; the zipping, slicing, and sending themselves might not be.

If we answer yes to those questions, we should probably consider pooling resources.

How do we pick a strategy?

What are our constraints? We already know that allocation overhead in our operation is high. This alone is not enough to guide us. We could have high overhead because we have a small procedure that is called frequently enough to make normally negligible allocations into bottlenecks. We could have a procedure that uses a relatively expensive resource like a network connection or a temporary file that is called in spikes at irregular intervals. These cases represent the extremes of what we’re likely to face, and they require distinct approaches to pooling resources.

We can break these strategies down along two axes: the amount of idle time between uses for a resource, and the amount of fluctuation in the size of the pool. Is a resource likely to sit idle for longer than the time between garbage collections? Is it better to wait for a new resource to become available or to allocate a new resource when there are no resources left in the pool? Let’s take a look at four solutions that cover this space.
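
Those two axes give us a two-by-two grid, and each of the four solutions below fills one cell: quick reuse in a static pool (a worker pool), eventual reuse in a static pool (a free list), eventual reuse in a dynamic pool (a dynamic free list), and quick reuse in a dynamic pool (sync.Pool).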

Quick Reuse in a Static Pool

If our operation sees heavy use, and the cost of a new resource is too high to justify allocating a new one when the pool is empty, one strategy we can employ is a worker pool. The following worker pool manages access to a bufio.Writer. While the example is completely useless as it stands, writing to a bufio.Writer is a useful stand-in for a host of operations (encoding, zipping, etc.). We’ll try to use this same basic idea for all of our examples.

This approach allows us to allocate a buffer for the lifetime of each worker instead of only the lifetime of the Write call. All of the expense of allocation (in this case the byte slice in the bufio.Writer) happens before the worker begins accepting work. We’re also reusing the same writeResults variable. As we explore some of the other approaches to pooling resources, we’ll see the same basic pattern we have in the worker: prepare a new resource, get a resource for use, use the resource, prepare the resource for reuse.

// worker owns one bufio.Writer for its entire lifetime; the buffer is
// allocated once, before any work is accepted.
func worker(uw io.Writer, ch <-chan writeArgs) {
	// Prepare a new resource & get a resource for use
	buf := bufio.NewWriter(uw)
	var res writeResults
	for arg := range ch {
		// Use the resource
		res.n, res.err = buf.Write(arg.p)
		if err := buf.Flush(); res.err == nil {
			res.err = err
		}
		arg.resCh <- res
		// Prepare the resource for reuse
		buf.Reset(uw)
		res.n, res.err = 0, nil
	}
}
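
Here is a minimal sketch of the wiring around that loop, with writeArgs and writeResults as hypothetical supporting types (imports bufio and io assumed). Every worker shares the one underlying writer, which is only safe if that writer tolerates concurrent use; the example remains as useless as promised.

type writeArgs struct {
	p     []byte
	resCh chan writeResults
}

type writeResults struct {
	n   int
	err error
}

// startWorkers launches n workers, each owning one bufio.Writer.
func startWorkers(uw io.Writer, n int) chan<- writeArgs {
	ch := make(chan writeArgs)
	for i := 0; i < n; i++ {
		go worker(uw, ch)
	}
	return ch
}

// write hands p to whichever worker picks it up and waits for the result.
func write(ch chan<- writeArgs, p []byte) (int, error) {
	resCh := make(chan writeResults)
	ch <- writeArgs{p: p, resCh: resCh}
	res := <-resCh
	return res.n, res.err
}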

One important thing to note here is that, as this is implemented, the overall resource use is not static. Once all of the workers are occupied, new Write calls will begin to stack up, all blocked while waiting for an available worker. Play around with some ideas for solving that problem. What other issues come up as a result?

Eventual Reuse in a Static Pool

If our operation sees more sporadic use, where a resource is likely to sit idle for quite a while between uses, worker pools might be less appealing. Instead, we can keep our pre-allocated resources on hand in a free list.
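
Here is a minimal sketch of such a free list, wrapping the same bufio.Writer resource (imports bufio and io assumed): a buffered channel holds the pre-allocated writers, and callers block until one is free.

type freeList struct {
	free chan *bufio.Writer
	uw   io.Writer
}

// newFreeList pre-allocates n writers up front; Write itself never allocates.
func newFreeList(uw io.Writer, n int) *freeList {
	fl := &freeList{free: make(chan *bufio.Writer, n), uw: uw}
	for i := 0; i < n; i++ {
		fl.free <- bufio.NewWriter(uw)
	}
	return fl
}

func (fl *freeList) Write(p []byte) (n int, err error) {
	// Get a resource for use; block until one becomes available
	buf := <-fl.free
	// Use the resource
	n, err = buf.Write(p)
	if ferr := buf.Flush(); err == nil {
		err = ferr
	}
	// Prepare the resource for reuse and return it to the pool
	buf.Reset(fl.uw)
	fl.free <- buf
	return n, err
}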

This implementation and the worker pool implementation could probably be used interchangeably for most cases. Worker pools are more useful when used in concurrent pipelines, while free lists are more useful for managing expensive resources used sporadically. How would you solve the capacity problem from the worker pool example here?

Eventual Reuse in a Dynamic Pool

So far, we’ve been dealing exclusively with static pools of resources. The static resource pools make an important assumption: the cost of allocating a new resource is higher than the cost of waiting for one to become available. As a result, we have to pay special attention to what happens when we reach capacity. If we flip that assumption, and assert that the cost of waiting for a resource to become available (or holding on to enough resources to handle peak capacity) is higher than the cost of allocating a new resource, we arrive at a dynamic pool. Let’s turn that fixed-size free list into a dynamic one.
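
Here is a minimal sketch, assuming the same bufio.Writer resource: a buffered channel again, but with select/default cases so that get allocates when the pool is empty and put discards when the pool is full.

type dynamicFreeList struct {
	free chan *bufio.Writer
	uw   io.Writer
}

func newDynamicFreeList(uw io.Writer, capacity int) *dynamicFreeList {
	return &dynamicFreeList{free: make(chan *bufio.Writer, capacity), uw: uw}
}

// get never blocks: an empty pool allocates a fresh resource instead.
func (fl *dynamicFreeList) get() *bufio.Writer {
	select {
	case buf := <-fl.free:
		return buf
	default:
		return bufio.NewWriter(fl.uw)
	}
}

// put never blocks: a full pool lets the excess go to the garbage collector.
func (fl *dynamicFreeList) put(buf *bufio.Writer) {
	buf.Reset(fl.uw)
	select {
	case fl.free <- buf:
	default:
	}
}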

This is not much different from the fixed-size free list. It could even be primed in the constructor instead of allocating on first use. With this solution, we can have a much smaller pool. Instead of needing a pool capacity greater than the expected peak use, we only need a pool capacity greater than an expected jump in use: the effective size of the pool is the number of resources in use plus the capacity, rather than just the capacity. This solves the problem of blocking when we reach capacity, but it loses the predictable resource use we had with the fixed-size free list.

This approach is especially helpful if the resource we’re using requires more than just garbage collection to destroy. A pool of open network connections might need to be closed when they are destroyed. A pool of temporary files might need to be closed and then deleted when destroyed. This kind of free list shows up in a lot of packages, and is described in Effective Go. Play around with this implementation.

Quick Reuse in a Dynamic Pool

In a small sliver of cases, we need a truly dynamic pool of resources that are likely to be reused before they would be collected as garbage. These resources are always temporary, but benefit from being reused a few times before they are destroyed. This is what sync.Pool was made for.
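
Here is a minimal sketch, swapping in a bytes.Buffer as the pooled resource (imports bytes, io, and sync assumed):

var bufPool = sync.Pool{
	// New is called only when the pool has nothing to hand out
	New: func() interface{} {
		return new(bytes.Buffer)
	},
}

func writeBuffered(w io.Writer, p []byte) (int64, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	// Prepare the resource: it may still hold a previous use's contents
	buf.Reset()
	buf.Write(p)
	n, err := buf.WriteTo(w)
	bufPool.Put(buf)
	return n, err
}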

This is not drastically different from the dynamic free list. The major differences are that the dynamic free list can handle resources other than memory allocations (files, network connections, etc.), and that sync.Pool limits excess resources by time instead of by count. When the garbage collector comes along, the excess resources can be reduced to nothing, but until that happens, there can be a massive number waiting to be reused or destroyed. Play around with this one too.

Potential Pitfalls

The techniques used to pool resources introduce complexity that can often make our code less readable, less maintainable, and more error-prone.

We noted that earlier, and while these examples don’t appear much more complex than a solution without pools, there are a couple of common problems that will come up.

Not Quite Clean

This one has bitten me a few times. Before returning a resource to the pool (or finishing your work for the worker pool), you must be sure that the resource is ready to be used exactly like a brand new one.

// reslice a byte slice to zero length (the capacity remains)
b = b[:0]
// reset a bytes.Buffer (the capacity remains)
buf.Reset()
// clear the contents of a file and seek back to the start
err := f.Truncate(0)
...
pos, err := f.Seek(0, 0)

I once forgot to Seek on the temporary files in a pool: files that were reused were uploaded to S3 with a run of leading zeros as long as the previous use’s contents.

Not Quite Done

The most common use for these pools is to reuse byte slices, either directly or indirectly through a bytes.Buffer or another similar type backed by a byte slice. Unless we make a copy of our byte slice, we cannot return it to the pool until we know that every use of that slice, and of the array that backs it, is complete. If we return it too early, we are likely to start overwriting the contents of the backing array. It’s best to avoid adding any additional layers of concurrency within the work here.
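
Here is a contrived sketch of that failure. It assumes sync.Pool hands back the same buffer we just returned, which is common within a single goroutine but not guaranteed:

package main

import (
	"bytes"
	"fmt"
	"sync"
)

var pool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

func main() {
	buf := pool.Get().(*bytes.Buffer)
	buf.WriteString("first")
	p := buf.Bytes() // p aliases buf's backing array
	pool.Put(buf)    // returned too early: p is still in use

	buf2 := pool.Get().(*bytes.Buffer)
	buf2.Reset()
	buf2.WriteString("SECOND")
	fmt.Printf("%s\n", p) // may print "SECON": the backing array was overwritten
}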

Not Quite There Anymore

One more pitfall to look out for is most likely to occur when pooling network connections or other resources that can be lost. A fixed-size free list or a worker pool could be severely impacted by such an event. We’ll have to check that the resource is still viable before we use it, and if it’s not, we may have to replace it.
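
Here is a minimal sketch of that check, with connPool, ping, and dial as hypothetical pieces (import net assumed):

type connPool struct {
	free chan net.Conn
	dial func() (net.Conn, error)
	ping func(net.Conn) error
}

// get returns a healthy connection, replacing any pooled one that fails
// the liveness check.
func (p *connPool) get() (net.Conn, error) {
	select {
	case c := <-p.free:
		if err := p.ping(c); err != nil {
			// The pooled resource was lost; destroy and replace it
			c.Close()
			return p.dial()
		}
		return c, nil
	default:
		return p.dial()
	}
}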

Conclusions

There are several ways we can reuse resources. They all have distinct tradeoffs. They all share one common tradeoff: adding performance adds complexity. We are going to be tempted to throw pools at our problems, but we should avoid doing so. Instead, by familiarizing ourselves with the basic concepts behind these resource pools, we can build our simple solutions with a bit more forethought. Expensive resources almost always share the same basic lifecycle with or without pooling: prepare, use, cleanup. Keeping that lifecycle in mind as we build our simple solution helps us avoid common pitfalls and makes implementing a resource pool much easier and less error-prone.
