A Technical Post-Mortem of the Vesta Arbitrum Launch

An analysis of our first product outage

PrimeDAO
Feb 14, 2022


In this note, we provide a clear rundown of the issues that led to the Jan 31st incident on Prime Launch during the Vesta Finance liquidity bootstrapping pool (LBP) event. We want users and our community to understand why this event happened and what we are doing to prevent such events in the future. PrimeDAO aims to continuously improve, and we cannot evolve without a strong commitment to openness, honesty, and accountability. Embodying these standards is a minimum requirement for any positive and significant contributor in the Web3 space.

WHAT HAPPENED

On Jan 31, 2022, at 8 PM UTC, we initiated the treasury bootstrapping event for Vesta Finance using PrimeDAO’s Liquid Launch, our LBP interface.

Ten days before the launch, we took it upon ourselves to adapt the interface for Arbitrum One, a popular Ethereum L2. We incorrectly presumed that our prior knowledge of L2s was sufficient when accepting this initial timeline. A series of technical issues then followed, one after another, once the LBP was initiated:

  1. When the LBP started, we received notice that Vesta was intermittently unable to access the LBP administration dashboard page because the wrong wallet provider was being used. The issue was quickly resolved in an interactive support meeting and access to the admin page was restored; it was a direct result of our miscommunication with Vesta.
  2. The LBP was erroneously displayed as paused in the interface even though the contract wasn’t actually paused on-chain, leaving Vesta users unable to purchase tokens through our UI. While the issue was quickly resolved, it was a direct result of our team’s over-hasty migration to Arbitrum One.
  3. We didn’t anticipate the need to adjust some critical parameters: the Prime Launch app was originally calibrated and coded for a much slower rate of block formation and a far lower block count. This led to 1) intermittent inability to access the LBP sale interface and 2) trivial but disconcerting error messages, some of which were only partly resolved. All of these issues stemmed from our inexperience with L2 (see the first sketch after this list).
  4. After we restored the LBP administration dashboard page, Vesta used it to pause the LBP. However, we failed to communicate to Vesta in time that, due to the architecture of Balancer’s LBP smart contract, pausing a sale only pauses swaps, while the algorithmically adjusting swap rates continue their downward trajectory, no longer counterbalanced by the purchases that would otherwise support the token price. Balancer’s LBP smart contract does provide a function that allows modification of the LBP time window and weights. Because it is a dangerously powerful function that could let a malicious admin adjust key parameters mid-sale, we designed our wrapper contract to block all such price-affecting functions during the bootstrapping event. This design improves the integrity and fairness of a launch, but it does not provide the flexibility needed in extreme scenarios such as the one we encountered here. In light of this event, it is clear that we need to rework our wrapper contract so that an LBP can be fully paused without negatively impacting the tokenomics of the bootstrapping event (see the second sketch after this list).
  5. Crucially, we realized that PrimeDAO’s technical support rotation was not geared to keep enough people on call during an event of this kind.
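
To make the third point concrete, here is a minimal sketch (not our actual code; the network names, RPC URLs, and values are placeholders) of the kind of adjustment involved: deriving sale timing from on-chain timestamps rather than block counts, since Arbitrum One produces blocks at a very different and more variable rate than Ethereum mainnet.

```typescript
import { ethers } from "ethers";

// Illustrative per-network settings; names, URLs, and numbers are placeholders.
interface NetworkProfile {
  rpcUrl: string;
  // Rough block interval, useful only as a UI polling cadence,
  // never for computing when a sale starts or ends.
  approxBlockTimeMs: number;
}

const NETWORKS: Record<string, NetworkProfile> = {
  mainnet: { rpcUrl: "https://example-mainnet-rpc", approxBlockTimeMs: 13_000 },
  arbitrumOne: { rpcUrl: "https://example-arbitrum-rpc", approxBlockTimeMs: 1_000 },
};

// Remaining sale time derived from block timestamps (seconds since epoch),
// which are comparable across L1 and L2, instead of block-number arithmetic.
async function secondsUntilSaleEnd(
  provider: ethers.providers.JsonRpcProvider,
  saleEndTimestamp: number
): Promise<number> {
  const latest = await provider.getBlock("latest");
  return Math.max(0, saleEndTimestamp - latest.timestamp);
}

// Example usage against the Arbitrum profile; in a real integration the end
// timestamp would come from the pool's own schedule, not a hard-coded value.
async function example(): Promise<void> {
  const provider = new ethers.providers.JsonRpcProvider(NETWORKS.arbitrumOne.rpcUrl);
  const remaining = await secondsUntilSaleEnd(provider, 1_643_918_400);
  console.log(`Sale ends in ${remaining} seconds`);
}
```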

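The fourth point is easier to follow with the underlying Balancer controls in view. The sketch below, written with ethers.js against an abridged interface of Balancer’s LiquidityBootstrappingPool, illustrates why disabling swaps alone does not hold the price: the gradual weight update scheduled at pool creation keeps running, so the weight schedule itself would also have to be re-anchored at the current weights. Addresses and parameters are placeholders, and on Prime Launch the second call is precisely what our wrapper contract blocks today.

```typescript
import { ethers } from "ethers";

// Abridged ABI of Balancer's LiquidityBootstrappingPool; only the calls
// relevant to pausing a sale are listed here.
const LBP_ABI = [
  "function setSwapEnabled(bool swapEnabled)",
  "function updateWeightsGradually(uint256 startTime, uint256 endTime, uint256[] endWeights)",
  "function getNormalizedWeights() view returns (uint256[])",
];

// Connects to the pool at a given address with a signer (placeholder helper).
function connectToPool(address: string, signer: ethers.Signer): ethers.Contract {
  return new ethers.Contract(address, LBP_ABI, signer);
}

// Sketch of a "full" pause, callable only by the pool owner (in our
// architecture, the wrapper contract).
async function pauseSaleAndHoldPrice(
  pool: ethers.Contract,
  resumeTimestamp: number
): Promise<void> {
  // 1. Disabling swaps stops purchases, but it does not stop the weight
  //    schedule, so the quoted token price keeps drifting downward.
  await pool.setSwapEnabled(false);

  // 2. Holding the price requires re-anchoring the weight schedule at the
  //    current weights for the duration of the pause. Our wrapper contract
  //    blocks this call by design, which is why the pause could not hold
  //    the price during this launch.
  const currentWeights: ethers.BigNumber[] = await pool.getNormalizedWeights();
  const now = Math.floor(Date.now() / 1000);
  await pool.updateWeightsGradually(now, resumeTimestamp, currentWeights);
}
```

Blocking price-affecting calls in the wrapper was a deliberate trade-off against a malicious admin manipulating parameters mid-sale; the rework described above aims to allow a symmetric pause and resume without reopening that attack surface.
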
In the early morning, Alchemist’s Copper Launch team offered to help move the launch to their interface. We are grateful to Alchemist and the Copper team for their support in concluding the bootstrapping event for Vesta Finance.

WHAT WE LEARNED

PrimeDAO is a whole made of individuals, squads, companies, human and non-human agents. As a whole, we aim to deliver work at the highest standards, taking joint operational responsibility for the systems upon which our partners build their products.

In the last 20 years, the world has seen increasingly complex software. A great part of this complexity stems from the inherent composability of open software, which over time laid the groundwork for increasing fragmentation and componentization of systems. Today’s average product is made of many atomized, interdependent components. While this character of present-day software provides unmatched advantages, it has also created specific technical and operational challenges for teams and organizations of all sizes.

Whilst this is even more true in web3 and more generally in open ecosystems, it is not an excuse to condemn the complexity of our stack or to quibble about the inherent challenges of L2. When we build ecosystemic software, we need to own its quality in an ecosystemic way. DAOs, together with their partners, need to evolve towards a more shared and continuously integrated approach to reliability engineering, and more specifically to capacity planning, without being evasive about the organizational shortcomings that are often the root cause of outages like this one. We are not exempt from this, and we are already taking the following steps to course-correct:

  • As a DAO, we will adopt applied capacity-planning protocols that rely as little as possible, or not at all, on uncertain estimation processes. If we cannot provide an estimate with confidence, we are far from ready to roll out. This is particularly true when working on a new product in a new technical environment. Keeping our projects value-driven rather than scope-driven lets us optimize for quality instead of trying to control time.
  • A strong communication protocol is being formalized internally and across PrimeDAO’s edges with clients, partners, and end-users. It was tempting to welcome requests that seemed trivial (e.g. multi-layer support) but that took precious (and unforeseen) time and testing capacity from PrimeDAO’s development team. Strong protocols for boundaries protect everyone and everything in the process. If a launch ever needs to be paused again, we will prioritize excellent crisis-response protocols to promptly inform all affected parties.
  • We will be more rigorous in testing our application logic in every technical environment we support. Especially when porting across chains or layers, it is essential to check that contracts and components built for L1 still behave as expected on L2 (see the sketch after this list).
  • As our software gets increasingly interconnected, we need to improve our ability to foresee and mitigate risks related to third-party services.
  • In any current and future product, we will ship interfaces and protocols that maximize the autonomy and agency of our partners, especially in executing damage control operations (e.g. pausing the sale).
  • We have formed a cross-functional team to dig deeper into this event and identify architectural and operational risks to the emerging family of Prime products. Over the coming weeks we will work with our engineering teams to focus on preventing and mitigating this kind of disruption.
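
As one concrete illustration of what testing “in all technical environments” can look like, the sketch below runs the same interface checks against every supported network. It is illustrative only: the network list and RPC URLs are placeholders, and the checks are deliberately minimal.

```typescript
import { describe, expect, it } from "@jest/globals";
import { ethers } from "ethers";

// Placeholder network matrix; every check below runs once per entry.
const NETWORKS = [
  { name: "mainnet", rpcUrl: "https://example-mainnet-rpc", expectedChainId: 1 },
  { name: "arbitrumOne", rpcUrl: "https://example-arbitrum-rpc", expectedChainId: 42161 },
];

describe.each(NETWORKS)("Prime Launch interface on $name", ({ rpcUrl, expectedChainId }) => {
  const provider = new ethers.providers.JsonRpcProvider(rpcUrl);

  it("talks to the intended chain", async () => {
    const network = await provider.getNetwork();
    expect(network.chainId).toBe(expectedChainId);
  });

  it("derives timing from block timestamps rather than hard-coded block intervals", async () => {
    const latest = await provider.getBlock("latest");
    // Timestamps are seconds since the epoch on both L1 and L2.
    expect(latest.timestamp).toBeGreaterThan(0);
  });
});
```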

We strive to keep our Prime products outstanding, available, and reliable, and in this case we failed. This is our first outage, we fully acknowledge its severity, and we apologize to all of our clients, partners, and our entire ecosystem for allowing it to occur. As of this writing, the root causes of the outage are fully understood and tangible steps are being taken to prevent a recurrence. Ultimately, the shared learning from this event will help make our organizational and development processes more antifragile and dependable, resulting in a more vibrant, capable PrimeDAO ecosystem.

Finally, we are glad to emphasize that no users lost money and that funds were never at risk. It is our steadfast hope to rebuild and maintain the deep trust our ecosystem places in us, and we look forward to continuing to build production-ready products with a sharper awareness of exactly what it takes to do so.
