Time Compression, a 7-Year Plan
As a software engineer I often face challenges regarding consideration for balancing speed, accuracy and cost. One of the perpetual pain-points in my development is the consideration of time — often because traditional methods of timekeeping are too large/bulky or too fine-grained for some of my needs. Sometimes I just want to store some really rudimentary time information that’s useful but doesn’t require high accuracy or justify having all of those bits the data takes up.
I’m going to approach and discuss this from a database perspective since that’s the particular pain-point I’m trying to overcome right now. Full DATETIME fields are pretty darn accurate, future-proof and very stable but they’re also quite bloated coming in at 8 bytes of storage and requiring a fair bit of math to process. Unix timestamps are a reasonable compromise if you’re not too worried about future-proofing as they come in at half the size, only taking up 4 bytes of storage, and they’re much easier to process. But I want to really push it further, what can I get into just 2 bytes of storage for a timestamp and can I make it future-proof?!
Enter my two-byte solution for low-granularity timestamps: Short-Term Time. Here’s how it works: 7 years * 377 days * 24 hours = 61,656. We can store 7 years of data, up to 377 days per year (which covers leap years and long leap years) and 24 hours per day within a single 2-byte unsigned integer.
But why only hour level granularity, isn’t that a limitation?
Absolutely, it sure is a bottleneck. But there are all kinds of timestamps we’d like to store for metrics collecting purposes and down to the exact minute and second isn’t really terribly useful when we’re generating large reports and long-term projections. Down to the day and hour is fine for many metrics gathering needs. Further, oftentimes when we’re STORING metrics in the database, the table is just the user ID (4 bytes) and the thing they looked at (4 bytes) combined with the DATETIME (8 bytes) that they looked at it. Holy cow, the timestamp is taking up HALF of the storage and isn’t even the most important point of record! With the Short-Term Time field you’re reducing the burden on disk and in-memory to only 2 bytes of data per record — in a table with 1 million records you’re coming down from roughly 16MB of memory/disk usage to only 10MB — a savings of 37.5%! That’s pretty good data compression when you don’t need ultra-fine-grained timestamps.
Wait, so I can only store 7 years of information?
Yeah, this is the more ugly part. What makes it even uglier is that after 7 years you need to start ‘rolling’ the dates when the year flips in your database. This is a downside and cost of all this compression. So you need to condense all of your records for the 7th year into some other historical system (if necessary for your history), then delete all records in the 7th year, and increment the remaining records by a year (by adding 9,048 to every Short-Term Time field in your database). It incurs an annual cost, to be sure, and is part of the price of using this format. But it makes future-proofing fairly simple as the year count is backwards from the current year rather than a fixed point in time as its basis. If the data is pretty throw-away after 7 years, which is the case in my needed use, then you can just delete all of the 7th year data and roll down what’s left.
What about timezones and the user’s time and etcetera?
This scheme would have to be implemented within the local time of the database, which is already currently a limitation with many RDBMS’ or at least that’s typically how it’s approached as timezones and user time is most often better calculated on the client-side and not by the database. It’s standard and best practice for a database to use its own time as the master time of record, and convenient as it leads to a single source of truth.
Is it sequential, can I sort using the field?
Absolutely, it’s just represented as an unsigned integer field in the database and the time counts upwards from 7 years ago, and then upwards from there in day of year and hour of day. So a value of 0 (zero) would be Jan 1st, in the hour of Midnight, 7 years ago.