On the Origin of smartref: Proxy Pattern
Last year at CppCon 2017, I was awarded the jury’s first prize for my poster about my smartref library, because of the high quality and innovation of the presented approach B-). A question I often got, both during the conference and on the Internet, was when such a library would be useful.
In this series of blog posts, I will try to explain where this innovation might be useful. To start with, I will discuss a design problem we faced at work, namely how to cope with very large data sets.
Disclaimer: all the examples are written in a mixture of C++11/14/17, but they should also work with plain C++11 if properly rewritten. A pre-C++11 version would probably be possible as well, although the smartref library itself is written using C++11/14/17 features.
TL;DR
Imagine we have one million vectors of 1 GB each. Keeping everything in memory is not possible. However, using a Proxy class, we can virtually keep them in memory:
By inheriting from using_<T>, the Proxy class ‘uses’ exactly the same interface as the underlying type. Because of this, we can reuse existing code without an increase in complexity. Furthermore, this interface is generated automatically, without having to write any boilerplate code.
Now, as soon as any member function is accessed, the data will be lazily loaded through the user-defined conversion function:
The problem
To illustrate both the problem and the solution, I’ve simplified things a bit and use a CarDealership in the following code fragments.
Consider the following data model:
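A minimal sketch of such a data model could look as follows. The Car fields and the constructor are assumptions made purely for illustration; only CarDealership, showroom() and showroom_ are names taken from the text.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical Car type; the fields are illustrative only.
struct Car {
    std::string brand;
    int price;
};

class CarDealership {
public:
    explicit CarDealership(std::vector<Car> showroom)
        : showroom_(std::move(showroom)) {}

    // Cars currently on display and therefore available for purchase.
    const std::vector<Car> &showroom() const { return showroom_; }

private:
    std::vector<Car> showroom_;  // small, always kept in memory
};
```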
In the above example, CarDealership::showroom() can be used to determine which cars are currently present in the showroom and therefore available for purchase. When a customer visits the car dealership's website, it helps that showroom_ is already in memory, so that the customer can quickly get an overview of the available cars. In practice, the number of cars is quite small, about 50, so it easily fits in memory:
However, now consider that the CarDealership wants to keep track of all the cars that have ever been sold:
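As a sketch, the extended data model might simply gain a second vector. The Car fields and the constructor are again assumptions; soldCars() and soldCars_ follow the names used in the text.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical Car type; the fields are illustrative only.
struct Car {
    std::string brand;
    int price;
};

class CarDealership {
public:
    CarDealership(std::vector<Car> showroom, std::vector<Car> soldCars)
        : showroom_(std::move(showroom)), soldCars_(std::move(soldCars)) {}

    const std::vector<Car> &showroom() const { return showroom_; }
    const std::vector<Car> &soldCars() const { return soldCars_; }

private:
    std::vector<Car> showroom_;  // about 50 cars
    std::vector<Car> soldCars_;  // keeps growing, yet is held in memory as a whole
};
```

Both members are plain vectors here, so constructing a CarDealership means having every sold car in memory up front.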
Where previously we had only small showrooms, we are now dealing with something that will only grow bigger. This has a major impact on the customer’s experience: they now have to wait until all the sold cars, which they are not interested in at all, have been loaded into memory. Furthermore, it is a waste of memory and, depending on the number of cars sold, the data might not even fit in memory.
The Proxy Pattern
One way to deal with this issue is to move the large data out of the CarDealership data model and handle it separately. The drawback of this approach is that it decreases reusability and increases complexity, since the developer now always has to think about where the data comes from and use the corresponding functions to process it.
Instead, we adopted a variant of the Proxy Pattern, more specifically one that allows for lazily loading the data from disk. This way, the small and large data sets are treated in exactly the same way, which results in much simpler and reusable code (compared to the previous approach).
Also, fully in line with C++’s motto “Don’t pay for what you don’t use”, we wanted a solution that would not require us to change the types of the small data sets that we were already using, which would not be possible with an inheritance-based implementation of the Proxy Pattern that you often see.
Leaving out many details, the Proxy class we came up with looked something like this:
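A minimal sketch of such a class, under a few assumptions of mine: the data source is modelled as a std::function loader, the loaded value is cached in a std::unique_ptr, and loaded() exists only to make the laziness observable. None of these details are confirmed by the text; only the conversion operator is.

```cpp
#include <functional>
#include <memory>
#include <utility>

template<typename T>
class Proxy {
public:
    // 'loader' stands in for whatever reads the data from disk (assumption).
    explicit Proxy(std::function<T()> loader) : loader_(std::move(loader)) {}

    // User-defined conversion: loads the data on first use, then reuses it.
    operator T &() {
        if (!data_) {
            data_ = std::make_unique<T>(loader_());
        }
        return *data_;
    }

    // Illustrative helper: reports whether the data has been loaded yet.
    bool loaded() const { return data_ != nullptr; }

private:
    std::function<T()> loader_;
    std::unique_ptr<T> data_;  // empty until the first conversion
};
```

Nothing is loaded when the Proxy is constructed; the loader runs only once, the first time the object is converted to T &.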
Simply by changing the type of the soldCars_ data member, we have added support for lazily loading the sold cars:
Now we can use the same code to visit the showroom, as well as check the cars that have been sold:
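The following sketch puts the pieces together. The loader-based Proxy, the Car fields, the constructor, and the countCars function are all assumptions for illustration; the point is only that one function accepts both the plain vector and the Proxy.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Car { std::string brand; int price; };  // hypothetical fields

// Lazy-loading proxy sketch; the std::function loader is an assumption.
template<typename T>
class Proxy {
public:
    explicit Proxy(std::function<T()> loader) : loader_(std::move(loader)) {}
    operator T &() {
        if (!data_) data_ = std::make_unique<T>(loader_());
        return *data_;
    }
private:
    std::function<T()> loader_;
    std::unique_ptr<T> data_;
};

class CarDealership {
public:
    CarDealership(std::vector<Car> showroom,
                  std::function<std::vector<Car>()> soldCarsLoader)
        : showroom_(std::move(showroom)), soldCars_(std::move(soldCarsLoader)) {}

    std::vector<Car> &showroom() { return showroom_; }
    Proxy<std::vector<Car>> &soldCars() { return soldCars_; }

private:
    std::vector<Car> showroom_;         // small: plain vector, unchanged
    Proxy<std::vector<Car>> soldCars_;  // large: loaded on first access
};

// Existing code that only knows about vector<Car>.
std::size_t countCars(const std::vector<Car> &cars) { return cars.size(); }
```

Calling countCars(dealership.showroom()) works as before, while countCars(dealership.soldCars()) goes through the conversion operator and triggers the load at that moment.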
As soon as we pass the result from CarDealership::soldCars() to some function that expects a vector<Car> &, the implicit conversion operator is invoked, and the data is lazily loaded from disk.
Supporting generic code
The above example works because we are passing the Proxy object to a function that already expects the underlying type.
Now consider the following piece of code:
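A sketch of the kind of loop meant here, using a stripped-down Proxy that only has the conversion operator (the lazy loading is left out for brevity). The loop itself is commented out because it does not compile; the has_member_begin trait is a helper I added to make the cause checkable at compile time. It requires C++17 for std::void_t.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Stripped-down Proxy: conversion operator only, no begin()/end().
template<typename T>
class Proxy {
public:
    operator T &() { return data_; }
private:
    T data_;
};

// Illustrative trait: true if an lvalue of type T has a member begin().
template<typename T, typename = void>
struct has_member_begin : std::false_type {};

template<typename T>
struct has_member_begin<T, std::void_t<decltype(std::declval<T &>().begin())>>
    : std::true_type {};

static_assert(has_member_begin<std::vector<int>>::value,
              "a vector can be iterated directly");
static_assert(!has_member_begin<Proxy<std::vector<int>>>::value,
              "the bare Proxy exposes no begin()");

// void visit(Proxy<std::vector<int>> &soldCars) {
//     for (int car : soldCars) { /* ... */ }  // error: no begin()/end()
// }
```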
This will not work, because a range-based for loop expects the dealership.soldCars() expression to expose begin() and end() functions, which the Proxy does not provide.
Luckily, this is very easy to solve by adding two forwarding functions to the interface of the Proxy class:
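A sketch of what these two functions could look like, again assuming a loader-based Proxy. Only the non-const overloads are shown; const overloads, cbegin() and friends are left out. The totalPrice function is a made-up example of client code.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

template<typename T>
class Proxy {
public:
    explicit Proxy(std::function<T()> loader) : loader_(std::move(loader)) {}

    operator T &() {
        if (!data_) data_ = std::make_unique<T>(loader_());
        return *data_;
    }

    // Forwarding functions: each goes through the conversion operator,
    // so the first call triggers the lazy load.
    auto begin() { return static_cast<T &>(*this).begin(); }
    auto end() { return static_cast<T &>(*this).end(); }

private:
    std::function<T()> loader_;
    std::unique_ptr<T> data_;
};

// The range-based for loop now compiles for the Proxy itself.
int totalPrice(Proxy<std::vector<int>> &prices) {
    int total = 0;
    for (int price : prices) {
        total += price;
    }
    return total;
}
```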
This not only fixes the range-based for loop, it also adds support for a major part of the STL's algorithms library.
We can now do for example:
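One possible example, sketched with std::count_if. The Car fields, the loader-based Proxy, and the countBrand function are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Car { std::string brand; int price; };  // hypothetical fields

template<typename T>
class Proxy {
public:
    explicit Proxy(std::function<T()> loader) : loader_(std::move(loader)) {}
    operator T &() {
        if (!data_) data_ = std::make_unique<T>(loader_());
        return *data_;
    }
    auto begin() { return static_cast<T &>(*this).begin(); }
    auto end() { return static_cast<T &>(*this).end(); }
private:
    std::function<T()> loader_;
    std::unique_ptr<T> data_;
};

// Counts how many sold cars are of the given brand, straight on the Proxy.
std::ptrdiff_t countBrand(Proxy<std::vector<Car>> &soldCars,
                          const std::string &brand) {
    return std::count_if(soldCars.begin(), soldCars.end(),
                         [&brand](const Car &car) { return car.brand == brand; });
}
```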
Getting rid of the forwarding boilerplate
In the example above, the amount of work we had to do to implement the begin() and end() functions seems manageable. However, when you look at the full interface of vector, you'll see that there's quite some work left before we can say that we support all the member functions, member types, operators, and free functions.
On top of that, implementing them correctly is non-trivial and error-prone, due to the const-correctness issues, SFINAE-friendliness requirements, etc.
One of the reasons that drove me to create the smartref library was to remove the need to write them by hand. This way, developers can focus on the problem that the class is supposed to solve, instead of on providing the boilerplate.
Using the smartref library, the Proxy class can now be written without all the forwarding functions:
The Proxy class 'uses' the public interface of the underlying class, by inheriting from the using_<T> class, which in turn obtains the underlying object through the user-defined conversion operator.
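To make that mechanism a bit more tangible without depending on the library, here is a hand-written stand-in. ForwardTo is a hypothetical CRTP base that I made up for this sketch; it is not smartref's using_<T>, forwards only four members, and says nothing about how the library is actually implemented. It only shows how a base class can reach the underlying object through the derived class's conversion operator.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in, NOT smartref's using_<T>: forwards a handful of
// members of T via the conversion operator of the derived class.
template<typename Derived, typename T>
class ForwardTo {
    T &target() { return static_cast<T &>(static_cast<Derived &>(*this)); }

public:
    auto begin() { return target().begin(); }
    auto end() { return target().end(); }
    auto size() { return target().size(); }

    template<typename U>
    void push_back(U &&value) { target().push_back(std::forward<U>(value)); }
};

// The Proxy itself contains no forwarding functions, only the conversion.
template<typename T>
class Proxy : public ForwardTo<Proxy<T>, T> {
public:
    explicit Proxy(std::function<T()> loader) : loader_(std::move(loader)) {}

    operator T &() {
        if (!data_) data_ = std::make_unique<T>(loader_());
        return *data_;
    }

private:
    std::function<T()> loader_;  // assumed data source
    std::unique_ptr<T> data_;
};
```

With this stand-in, every forwarded member still has to be listed by hand in the base class, which is exactly the boilerplate the library is meant to take away.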
Now, we truly get the full interface of vector:
The Proxy class is only one of the many use cases for the smartref library. Stay tuned for more blog posts in which I will give more compelling examples, go into the technical details, and present some interesting and unexpected use cases.