Tracking Temporal Data at Robinhood

Jamshed Vesuna
Robinhood
Published in
4 min readApr 12, 2017

At Robinhood, accounting is a central part of our business. Properly tracking an account’s value and historical changes is vital for trading, reporting, compliance, taxes, and other auditing tasks. Specifically, we want the ability to make corrections to an account without losing information about its state prior to the correction.

Many data warehouse designs, such as slowly changing dimensions and certain change data capture schemas, track changes to real world facts and allow us to ask “What was the state of an account at time t?” However, we need visibility into any corrections or historical changes when auditing our systems. That is, we must be able to ask “At time r (in the past), what did we think the state of an account was at time t?”

A Bitemporal Database

We can describe an account’s balance as temporal data, or data that changes over time. To efficiently store and query this data, we make use of a temporal database. Here, we capture the time the data was believed to be true in the real world (let’s call this t) as well as the time the data was recorded in the database (let’s call this r). We require two axes of time in order to track corrections without mutating history, hence the name bitemporal.

Example

Say you deposit $100 in your account on 3/14. We can track the cash balance of your account using a bitemporal database:

This indicates that on 3/14 (record time), our database recorded that the cash balance on or after 3/14 (real world time) is $100. In other words, Q(t≥3/14, r≥3/14) is $100, where Q defines a database query within the range of t and r. Similarly, Q(t≤3/13, r≥3/14) is null, meaning our database currently (on or after 3/14) thinks your cash balance on or before 3/13 is null, which accurately depicts reality.

On 3/20, you purchase 1 share of ABC stock at $25. We add a row to our database:

However, now Q(t≥3/20, r≥3/20) is ambiguous as the database has two possible values. To resolve this, we cap the previous transaction time to 3/20 (exclusive):

Now Q(t≥3/20, r≥3/20) is correctly $75. Likewise, Q(t≥3/14, 3/14≤r<3/20) is still $100. This indicates that between 3/14 and 3/20 (record time), our database thought the cash balance was $100 anytime on or after 3/14 in the real world, which is accurate. Great!

Filling in the Gaps

But wait! What does our database currently (r≥3/20) think the cash balance was in the real world between 3/14 and 3/20? Q(3/14≤t<3/20, r≥3/20) is incorrectly null, creating a gap in our history. We know that between 3/14 and 3/20 in the real world, our cash balance is $100. So we fill this gap with the initial value of $100. Since we created this row on 3/20, we set the record time to 3/20:

Now, we have an accurate representation of reality:

Q(t<3/14) is null.

Q(r<3/14) is null.

Q(3/14≤t<3/20, r≥3/14) is $100.

Q(t≥3/14, 3/14≤r<3/20) is $100.

Q(t≥3/20, r≥3/20) is $75.

Corrections

On 3/21, Robinhood received a price improvement, indicating the execution for your 1 share of ABC was actually $10. We could just alter our last row to $90, but we would lose the fact that between 3/20 and 3/21, our database thought the cash balance was $75. Instead, we add a new row with record time 3/21 and real world time 3/20. Then, we cap the red row’s record time to 3/21 (exclusive):

Now, Q(t≥3/20, r≥3/21) is $90. Although we’ve created a historical correction to your account’s balance, we still store what our database believed to be true between 3/20 and 3/21. That is, Q(t≥3/20, 3/20≤r< 3/21) is still $75.

It is worth noting that a mere transaction log can provide the same immutable data integrity benefits of a bitemporal database, and in some instances, can be more compact and write efficient. A bitemporal database essentially captures a snapshot at each update, improving read performance.

A bitemporal database is a great tool for reasoning about changes to a value over time and even allows us to queue updates in anticipation of future changes. Many other data warehouse systems, such as payroll and inventory tracking, have similar auditing requirements and may find using a temporal database beneficial. If you are interested in designing and architecting systems, come join us!

Acknowledgments

Special thanks to Mayank Agarwal, Jason Clavelli, and Hongxia Zhong for developing these ideas and contributing to this blog post.

--

--