On July 22nd, a routine maintenance check of the platform resulted in faulty configuration changes to a data store that exists as a shared layer between multiple services. This faulty config change rendered some services unable to perform asynchronous jobs. As a result, the daily price feed contract received a delayed update. The errant change was discovered 3 hours and 42 minutes later and the price feed was subsequently updated.
How did this happen?
To minimize the attack surface on the system, some details, including service names, will be purposely obfuscated, but the principles hold true. While the contracts are intended to be open and permissionless, they require manual intervention to stay up to date, and as the deployers and primary consumers, we accept that responsibility until there are additional parties interested in helping us maintain our oracles. Knowing this, we designed a system with redundancies in place to update the oracle without requiring us to manually send transactions at the appropriate times. We will detail each of the redundancies below and how they failed:
- Horizontal deployments of this oracle updater service were all connected to the same faulty data store which caused the services to not spin up properly
- Additional jobs meant to inform us when updates were slow due to network congestion or missed altogether ran on the same service as the updater. Since the services could not spin up properly, these jobs never ran
- Our services use application performance monitoring (APM) to alert us when there are excessive exceptions being thrown or latency increases. To profile the data store’s performance, the APM attempted to establish a connection with it but failed, so the tool captured no data. This failure prevented the services from spinning up altogether, so the jobs above never ran
- Our services have error tracking integrations to capture and triage unexpected behavior. This integration only loads after the APM successfully initializes, so because the APM failed, no aberrations were reported by the error tracker
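The failure chain above can be sketched in a few lines. This is an illustrative model, not our actual service code: the class and function names are hypothetical, and the real boot sequence is more involved. The point it demonstrates is that when each safeguard only initializes after the previous one succeeds, a single faulty shared dependency silently disables every safeguard downstream, across every replica.

```python
class ServiceBootError(Exception):
    """Raised when a service cannot complete its boot sequence."""


def boot_service(data_store_ok: bool) -> dict:
    """Hypothetical boot sequence mirroring the failure chain described above."""
    status = {"apm": False, "error_tracker": False, "jobs_scheduled": False}

    # 1. The APM profiles the data store at startup; if the connection
    #    fails, the whole boot aborts before anything else loads.
    if not data_store_ok:
        raise ServiceBootError("APM could not connect to the data store")
    status["apm"] = True

    # 2. Error tracking only loads after the APM initializes, so it never
    #    got the chance to report the aberration.
    status["error_tracker"] = True

    # 3. The updater and its watchdog jobs are scheduled last.
    status["jobs_scheduled"] = True
    return status


# Every horizontal replica shared the same faulty data store,
# so all of them failed in exactly the same way.
replicas = []
for _ in range(3):
    try:
        replicas.append(boot_service(data_store_ok=False))
    except ServiceBootError:
        replicas.append(None)

# No replica booted far enough to run jobs or report an error.
assert all(replica is None for replica in replicas)
```

The lesson is that the redundancy was horizontal (more copies of the same service) rather than independent (a checker with no shared dependencies), so all copies shared a common failure mode.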
What was the result?
The price recorded on the feed was $215.63 vs. the $215.93 that would have been recorded at the scheduled time. As a result, the moving average price is $259.52 vs. $259.53. The feed only allows an update once 24 hours have elapsed from the time of the last update, so the last updated time, and every subsequent update window, has been pushed back by 3 hours and 42 minutes.
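The drift works out as follows. Because the 24-hour window is anchored to the *actual* time of the last update rather than to a fixed schedule, a single delayed update shifts every future update window by the full delay. A minimal sketch (the timestamps are illustrative, not the feed's real update times):

```python
from datetime import datetime, timedelta

# The feed accepts a new update only once 24 hours have elapsed
# since the previous update.
MIN_UPDATE_INTERVAL = timedelta(hours=24)


def next_allowed_update(last_update: datetime) -> datetime:
    """Earliest time the feed will accept the next update."""
    return last_update + MIN_UPDATE_INTERVAL


# Hypothetical scheduled update time on July 22nd.
scheduled = datetime(2020, 7, 22, 12, 0)
delay = timedelta(hours=3, minutes=42)
actual = scheduled + delay

# The window is anchored to the actual update, so the drift
# carries forward indefinitely rather than self-correcting.
drift = next_allowed_update(actual) - next_allowed_update(scheduled)
assert drift == delay
```

This is why a one-off operational incident produces a permanent shift in update timing, and why we are redesigning the feed to eliminate this class of drift.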
The price was nowhere near the rebalance trigger price, and this would not have affected the rebalance's ability to complete at fair value.
Still, we have a responsibility to inform you every time our system behaves outside of expectations. Details on the oracles are currently scattered across our Set pages, GitHub repos, and articles. Over the next few product iteration cycles, we will bring these front and center.
What have we done to ensure this doesn’t happen again?
- Eliminated shared layers of data stores
- Created independent services for verifying expected state
- Updated our monitors and trackers to identify the absence of activity instead of relying on the services to report unexpected behavior
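The monitoring change in the last bullet inverts the alerting model: instead of trusting the service to report its own failures, an independent checker alerts when expected activity *stops*. A minimal sketch of that idea, with illustrative names and a hypothetical timeout, assuming the checker runs outside the updater's process and infrastructure:

```python
from datetime import datetime, timedelta

# Illustrative threshold; the real value depends on the job's cadence.
HEARTBEAT_TIMEOUT = timedelta(minutes=30)


def is_stalled(last_heartbeat: datetime, now: datetime) -> bool:
    """Alert on the absence of activity, not on reported errors.

    A service that never boots sends no heartbeat, so the checker
    still fires even when the service's own error reporting is dead.
    """
    return now - last_heartbeat > HEARTBEAT_TIMEOUT


now = datetime(2020, 7, 22, 12, 0)

# Healthy service: heartbeat 5 minutes ago.
assert not is_stalled(now - timedelta(minutes=5), now)

# Service that silently failed to spin up: last heartbeat hours old.
assert is_stalled(now - timedelta(hours=4), now)
```

Had a checker like this been independent of the updater's data store, the July 22nd failure would have been caught within the timeout window rather than after 3 hours and 42 minutes.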
We are iterating on our price feed design to eliminate these sources of time drift. The price feed update can be called by anyone through Etherscan, but we will also be open sourcing additional tools for anyone to trigger scheduled updates for further redundancy. If you have any questions, please reach out to us on Telegram.