Resource Tags

Over the past month, I have been working on an API layer that deals with a large volume of small resources. To make it easy for my frontend developer to use, we developed a few conventions (and opinions). The first one I would like to present is the concept of Resource Tags.

A Resource Tag, or ReTag, is really just an extension of the ETag, but it solves a very different problem. If you think about how ETags are used, they are effectively a cache key for a view. In most cases, your data and your views map closely onto each other: if you change your data, your view updates to reflect this.

ETag, Without Invalidation

This concept starts to fall apart on paginated collections. Since the ETag is tied to the view, any insertion or deletion is going to invalidate every view that follows it. When this happens mid-scan, you end up with a subsection of your data being out of date. That could be pretty bad on its own, but you also now have an off-by-N error on all the following views. The reality is that you have two different incomplete views of the data.

ETag, With Insertion

On its own this shouldn’t be much of an issue. Since each page returns metadata that includes the total number of pages and/or items, you can simply check whether the item count has changed. Or at least that would work in testing. If something is added and something else is removed at about the same time, the count stays the same and you might not notice.

And when you update a record? It will invalidate only the page that the record is found on, and your item count remains unchanged. Which means if you're not looking at that page, you have no idea that something changed.

ETag, With Updates

This is where the ReTag comes to save the day. If you ask for a collection and anything in that collection changes, the ReTag changes. If you ask for an item, it changes with the item and not the view. The view and the underlying data can change independently of each other. ETag lets you invalidate a cache based on views, and ReTag lets you invalidate a cache based on resources.
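To make the difference concrete, here is a minimal Ruby sketch. It is an illustration under assumptions, not the code from the project: `Record`, `page_etag`, and `retag` are hypothetical names, `version` stands in for something like an `updated_at` timestamp, and the real tags would be computed by the API rather than from an in-memory array.

```ruby
require "digest"

# Hypothetical record type; `version` stands in for an updated_at timestamp.
Record = Struct.new(:id, :version)

# Fingerprint of a list of records, used by both tags below.
def fingerprint(records)
  Digest::MD5.hexdigest(records.map { |r| "#{r.id}-#{r.version}" }.join("/"))
end

# ETag: identifies one view, here a single page of the collection.
def page_etag(records, page:, per_page:)
  fingerprint(records.each_slice(per_page).to_a[page] || [])
end

# ReTag: identifies the resource itself, regardless of pagination.
def retag(records)
  fingerprint(records)
end

records = (1..6).map { |i| Record.new(i, 1) }
before_page0 = page_etag(records, page: 0, per_page: 3)
before_retag = retag(records)

# Update a record that lives on the second page.
records[4] = Record.new(5, 2)

page_etag(records, page: 0, per_page: 3) == before_page0 # => true, page 0 looks unchanged
retag(records) == before_retag                            # => false, the collection changed
```

A client holding only the first page sees no ETag change, but the ReTag tells it that the collection is no longer what it scanned.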

ReTag, With Insertion

So not only do you get to detect and prevent glitches in the Matrix, you can now sync that data pretty cheaply. Even with a basic go-back-to-the-start-and-rescan approach, you still get to leverage the ETag. Every page that has not changed comes back as a cheap 304, and you move on with your day.
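The rescan can be sketched roughly as follows. This is a simulation under assumptions: `fetch_page` is a hypothetical in-memory stand-in for a conditional GET with `If-None-Match`, and `resync` is the loop a client might run once it notices that the ReTag has changed.

```ruby
require "digest"

# Hypothetical stand-in for a paginated endpoint.
# Returns [status, etag, body] and answers 304 when the client's ETag still matches.
def fetch_page(server_pages, page, if_none_match)
  body = server_pages.fetch(page)
  etag = Digest::MD5.hexdigest(body.join(","))
  etag == if_none_match ? [304, etag, nil] : [200, etag, body]
end

# Rescan from the first page, keeping cached pages whenever the server says 304.
def resync(server_pages, cache)
  statuses = []
  server_pages.each_index do |page|
    cached = cache[page] || {}
    status, etag, body = fetch_page(server_pages, page, cached[:etag])
    cache[page] = { etag: etag, body: body } if status == 200
    statuses << status
  end
  statuses
end

server = [%w[a b], %w[c d], %w[e f]]
cache = {}
resync(server, cache)   # first scan: every page is downloaded
server[1] = %w[c D]     # an update lands on the middle page
resync(server, cache)   # => [304, 200, 304]
```

Only the changed page is downloaded again. The sketch does not drop cached pages that no longer exist on the server, which a real client would also need to handle.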

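Things get a lot more fun if you can guarantee that the change was an insertion or deletion. If you have a sorted list of immutable objects, every page before the change is still valid and every page from the change onward is not, so you can binary search for the first invalid page and rebuild from there.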
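A sketch of that binary search, assuming a single insertion or deletion into a sorted list of unique, immutable items. `first_invalid_page`, `etag_for`, and `paginate` are hypothetical helpers, and the block stands in for a conditional request that reports whether a cached ETag still matches.

```ruby
require "digest"

def etag_for(items)
  Digest::MD5.hexdigest(items.join(","))
end

def paginate(items, per_page)
  items.each_slice(per_page).to_a
end

# Returns the index of the first page whose cached ETag no longer matches,
# or nil when every page is still valid. The block receives (page, cached_etag)
# and must return true while the page is still valid.
def first_invalid_page(cached_etags)
  (0...cached_etags.length).bsearch { |page| !yield(page, cached_etags[page]) }
end

old_items = (1..20).map { |n| n * 10 }          # 10, 20, ..., 200
cached    = paginate(old_items, 4).map { |p| etag_for(p) }

new_pages = paginate((old_items + [95]).sort, 4) # 95 is inserted into the third page

first_bad = first_invalid_page(cached) do |page, etag|
  new_pages[page] && etag_for(new_pages[page]) == etag
end
first_bad # => 2
```

The search only works because validity is monotonic here: once one page has shifted, every later page has shifted too. With updates in the mix that assumption no longer holds.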

ReTag, With Update

When dealing with updates you don’t get to be as clever, but knowing is half the battle. Knowing that part of your data is no longer valid lets you go back and fix it. If you really wanted to make fetching the updated records faster, you could always build an endpoint that returns the records changed since a given date or ReTag.

Depending on how your records are sorted, you may be able to make certain assumptions. If you have a dataset that is interminable, you can check the last page to see if there is a new record. If no new record is found, then you know to start a search for a changed item.

Implementation is pretty straightforward. Since I am using Rails, I copied the cache_key method that's used to generate the ETag for each item view. Then I made sure that I could pass in a selection, to handle cases where the collection returned is not equivalent to all the records.
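The original code is not shown here, so what follows is only a plain-Ruby sketch of the idea, assuming a tag built from the record count and the latest `updated_at` of whatever selection is passed in. `Item` and `retag_for` are hypothetical names, not Rails APIs, and the scheme relies on every update advancing `updated_at`.

```ruby
require "digest"

# Hypothetical stand-in for a model row.
Item = Struct.new(:id, :updated_at)

# Builds a ReTag for a selection of records (assumed helper, not a Rails API).
# `scope_name` identifies which selection the tag describes.
def retag_for(scope_name, records)
  count  = records.size
  latest = records.map(&:updated_at).max
  stamp  = latest ? latest.utc.strftime("%Y%m%d%H%M%S%6N") : "empty"
  Digest::MD5.hexdigest("#{scope_name}/#{count}-#{stamp}")
end

items = [
  Item.new(1, Time.utc(2020, 1, 1)),
  Item.new(2, Time.utc(2020, 1, 2)),
  Item.new(3, Time.utc(2020, 1, 3)),
]

all_tag = retag_for("items/all", items)
odd_tag = retag_for("items/odd", items.select { |i| i.id.odd? })

# Touching item 2 changes the ReTag of the full collection,
# but not the ReTag of the selection that excludes it.
items[1].updated_at = Time.utc(2020, 1, 5)

retag_for("items/all", items) == all_tag                          # => false
retag_for("items/odd", items.select { |i| i.id.odd? }) == odd_tag # => true
```

Passing the selection in explicitly is what keeps a filtered collection from being invalidated by changes to records it does not contain.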