Retrofitting Position IDs to Addepar Non-Intrusively

Calvin Wu
Nov 8, 2017

Every day, Addepar’s data pipeline consumes portfolio data from hundreds of different custodians and imports it into millions of nodes and edges in our Financial Graph. To ensure that the data can be trusted and used to correctly calculate performance, it’s crucial that we not only import it in a timely fashion, but also run verification checks against what already exists on our platform. The data is complex, and data integrity check failures can be caused by many different issues: missing data, incorrect raw data from a custodian, incorrect transaction mapping, problematic validation logic, and incorrect security mapping.

As our platform grows, we’re continually improving and extending our methods. The following is an example of how we addressed one type of verification problem — node matching — by non-intrusively introducing a core change to a critical pipeline.

Node Matching

When importing data, we need to do a few things:

  1. Identify the account node.
  2. Identify the security node via node matching. We search for the security node in our system using the security attribute information the custodian provided. The search criteria could be the name of the security or security identifiers. Each security type may have different preferred strong and weak identifiers that drive the matching. Our matching logic may evolve over time as we learn more about the data. This is similar to the entity resolution problem, where we have to identify the same node across different data sources. A sketch of this matching step follows the list.
  3. Find the edge. Once both the account node and security node are identified, the edge between them is where daily transactions and position snapshots are stored.
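
To make the matching step concrete, here is a minimal sketch of identifier-driven matching, with illustrative attribute names (cusip, isin, ticker, name) and a simplified SecurityNode shape:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SecurityNode:
        node_id: int
        cusip: Optional[str] = None
        isin: Optional[str] = None
        ticker: Optional[str] = None
        name: Optional[str] = None

    # Strong identifiers are tried before weak ones; the preferred order
    # can differ by security type.
    STRONG_IDENTIFIERS = ("cusip", "isin")
    WEAK_IDENTIFIERS = ("ticker", "name")

    def match_security_node(feed_attrs: dict, candidates: list) -> Optional[SecurityNode]:
        """Return the first candidate whose identifier matches the feed data."""
        for attr in STRONG_IDENTIFIERS + WEAK_IDENTIFIERS:
            value = feed_attrs.get(attr)
            if not value:
                continue
            for node in candidates:
                if getattr(node, attr) == value:
                    return node
        return None  # no match: the caller would create a new security node

    # Example: a strong identifier wins even when the feed's name differs.
    fb = SecurityNode(node_id=1, cusip="123456789", name="Facebook Inc")
    assert match_security_node({"cusip": "123456789", "name": "FACEBOOK"}, [fb]) is fb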

Ideally, the security nodes are unique and shared between the account nodes. After all, the edge is what references the client’s specific holding.

Financial Graph representation of a portfolio.

The Node Jumping Problem

Accounts referencing duplicate security nodes.

Having duplicate nodes isn’t great, but it’s not the end of the world. It has no impact on the accurate representation of the client’s portfolio. What’s important here is the edge, where the position snapshots and transactions are stored. We need to be able to import data to the right edge every time. The real problem with duplicate nodes is that they increase the probability of the system matching the account’s holding to the wrong security node, thereby breaking the time series. This in turn causes unit verification failures and performance calculation errors.

A Node Jump for Account I on Day 4.

Above is an example showing that Account I’s Facebook holding has time series data for three days. However, after the import on Day 4, it has only one day of position data because the wrong security node was identified, which created a new edge. This causes the time series to drop off, in what we refer to internally as the Node Jumping problem.
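
Framed as a continuity check on which edge a holding imports to each day, the drop-off is straightforward to detect. Here is a small, self-contained sketch; the data shapes are simplified for illustration:

    def find_node_jumps(edge_id_by_day: dict) -> list:
        """Return (previous_day, day) pairs where the holding changed edges.

        edge_id_by_day maps an import date to the edge the holding's units
        were written to on that date.
        """
        jumps = []
        days = sorted(edge_id_by_day)
        for prev_day, day in zip(days, days[1:]):
            if edge_id_by_day[prev_day] != edge_id_by_day[day]:
                # The old edge's time series stops here and a new, one-day
                # series starts on another edge, as in the Day 4 example.
                jumps.append((prev_day, day))
        return jumps

    # Example: Account I's Facebook holding jumps to a new edge on Day 4.
    history = {"day1": "edge-1", "day2": "edge-1", "day3": "edge-1", "day4": "edge-2"}
    assert find_node_jumps(history) == [("day3", "day4")]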

Finding the Right Edge

At a high level, the new workflow, which leverages the new Position ID, goes like this (a code sketch follows the list):

  1. Every position from each feed has a unique Position ID. Ideally, this ID is the same over time.
  2. The system uses this ID to find the corresponding edge.
  3. If found, that’s where the data is imported.
  4. If not found, the system falls back to the existing workflow of node matching and updates the edge with a reference to this Position ID, so future imports can leverage it.
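
Here is a minimal, self-contained sketch of that lookup-with-fallback flow. The in-memory registry dict and the fallback callable are simplifications standing in for real storage and node-matching logic, which this post doesn’t detail:

    # Maps (feed, position_id) -> edge_id once an edge has been tagged.
    position_id_registry: dict = {}

    def resolve_edge(feed: str, position_id: str, fallback_node_matching) -> str:
        """Return the edge a position's snapshots and transactions import to."""
        key = (feed, position_id)

        # Steps 1-3: use the Position ID if we have seen it before.
        edge_id = position_id_registry.get(key)
        if edge_id is not None:
            return edge_id

        # Step 4: fall back to node matching, then remember the Position ID
        # on the resulting edge so future imports can skip the fallback.
        edge_id = fallback_node_matching(feed, position_id)
        position_id_registry[key] = edge_id
        return edge_id

    # Example: the first import falls back; the second hits the registry.
    assert resolve_edge("custodian-a", "ACCT1|FB", lambda f, p: "edge-42") == "edge-42"
    assert resolve_edge("custodian-a", "ACCT1|FB", lambda f, p: "edge-99") == "edge-42"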

Position ID
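
The exact composition of a Position ID isn’t the focus here. As a purely illustrative sketch, it can be thought of as a stable key derived from feed-provided attributes, for example an account number combined with the custodian’s own identifier for the security:

    def build_position_id(account_number: str, custodian_security_id: str) -> str:
        """One hypothetical way to form a per-feed Position ID (illustrative only)."""
        return f"{account_number}|{custodian_security_id}"

    # Ideally this stays the same every day the custodian reports the position.
    assert build_position_id("12345", "FB-US") == "12345|FB-US"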

Temporal Design

Position ID assignments are stored temporally, so the model provides a complete history of the Position ID (PID) assignments for each edge. This is particularly useful when debugging any mapping abnormalities.

Time series of a Position ID.
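
A minimal sketch of what such a temporal record can look like, assuming each edge keeps a list of effective-dated PID assignments rather than a single overwritable value (field names are illustrative):

    from dataclasses import dataclass, field
    from datetime import date
    from typing import Optional

    @dataclass
    class PidAssignment:
        position_id: str
        effective_from: date
        effective_to: Optional[date] = None  # None means "still current"

    @dataclass
    class Edge:
        pid_history: list = field(default_factory=list)

        def assign_pid(self, position_id: str, as_of: date) -> None:
            """Close out the current assignment and start a new one."""
            if self.pid_history and self.pid_history[-1].effective_to is None:
                self.pid_history[-1].effective_to = as_of
            self.pid_history.append(PidAssignment(position_id, as_of))

        def pid_as_of(self, day: date) -> Optional[str]:
            """Answer: which Position ID pointed at this edge on that day?"""
            for a in self.pid_history:
                if a.effective_from <= day and (a.effective_to is None or day < a.effective_to):
                    return a.position_id
            return None

    # Example: the history shows exactly when the PID mapping changed,
    # which is what makes debugging mapping abnormalities easier.
    edge = Edge()
    edge.assign_pid("ACCT1|FB-OLD", date(2017, 1, 1))
    edge.assign_pid("ACCT1|FB", date(2017, 6, 1))
    assert edge.pid_as_of(date(2017, 3, 1)) == "ACCT1|FB-OLD"
    assert edge.pid_as_of(date(2017, 7, 1)) == "ACCT1|FB"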

Position ID Release

For the Position ID release, both the known and unknown risks were very high, because the change could directly impact the integrity of a client’s portfolio. Before we could roll out this feature, we first needed to understand the worst-case scenarios:

  • Unexpected exceptions in the new flow causing imports to fail.
  • Weak Position IDs causing node jumping to appear again. If a Position ID changes frequently, matching can’t depend on the Position ID reliably, so it falls back to the existing node matching. The worst case scenario here is node jumping, which we know how to deal with already.
  • Buggy Position ID matching, which could cause massive data integrity check failures and a significant post-release data cleanup effort. This is a high-risk unknown. We would want a way to revert the change quickly and keep the impact to a minimum.
  • Other unknowns. We need a way to revert quickly in case of an unknown failure that blocks data import.

Thorough unit test coverage and feature flagging go a long way toward ensuring the quality of a release. Another important aspect is test data coverage: how do we ensure the test data set is meaningful and covers all the edge cases? The following captures the steps we took for this feature rollout:

  • Instead of a binary feature flag, we used multifaceted feature flags to give us fine-grained control over when and how widely we release, and to allow us to roll back with minimal impact (a sketch of such a flag follows this list).
  • We upgraded our staging environment to mirror production. In addition, we could replay production traffic in staging. This allowed us to test at full scale and have production-quality test coverage.
  • We rolled out persisting Position IDs without using them for matching. This gave us a critical view into the quality of the Position IDs chosen and allowed us to learn from the data. We learned which Position ID combinations worked well and which needed adjustment.
  • We rolled out Position ID matching by client. We started out with smaller clients or clients with less volatile instrument attributes. This greatly reduced the risk of the unknowns.
  • We gradually increased the rollout size with each release and learned from any bugs that surfaced. This allowed us to keep the impact to a minimum.
  • We worked closely with our Data Operations team throughout the process. They work with this data day in and day out. Their knowledge of the data and feedback throughout this process drove the quality of the migration.
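
As a sketch of what a multifaceted flag can look like, consider a per-client mode with three settings instead of a single on/off switch. The mode names and config shape below are illustrative:

    from enum import Enum

    class PidMode(Enum):
        OFF = "off"                # legacy node matching only (quick revert path)
        PERSIST_ONLY = "persist"   # write Position IDs, never match on them
        MATCH = "match"            # use Position IDs to find the edge

    # Per-client overrides; any client not listed gets the default.
    DEFAULT_MODE = PidMode.PERSIST_ONLY
    CLIENT_OVERRIDES = {
        "small-client-a": PidMode.MATCH,  # early, lower-risk adopter
        "large-client-z": PidMode.OFF,    # example of a quick, targeted rollback
    }

    def pid_mode_for(client_id: str) -> PidMode:
        return CLIENT_OVERRIDES.get(client_id, DEFAULT_MODE)

    assert pid_mode_for("small-client-a") is PidMode.MATCH
    assert pid_mode_for("unlisted-client") is PidMode.PERSIST_ONLY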

Releasing this feature was a lengthy process and took a few iterations, but with these careful measures, the release went out without any issues, and we have been using Position IDs successfully to improve our data quality.

What’s Next
