Token Swap Service Outage Incident Report
Dear COSers,
Over the past few weeks, our token swap service (https://swapcos.contentos.io) experienced several temporary outages. We manually completed the affected fund transfers after the incident was reported. We apologize for the inconvenience and have improved the service's stability to reduce the chance of similar malfunctions.
What happened?
To swap tokens automatically from Binance Chain (BEP2) to the Contentos mainnet, a monitor process constantly queries Binance Chain for incoming transactions to our swap recipient address. We used the Go SDK provided by Binance (https://github.com/binance-chain/go-sdk) and implemented the query as its documentation suggests (https://github.com/binance-chain/go-sdk/wiki/HTTP-API).
However, the query response contained something we did not expect: multiple transactions with an identical TxHash but different values.
To keep funds safe, our swap program suspended all subsequent swap requests. This is why the token swap service was briefly down.
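To illustrate the condition described above, here is a minimal sketch of how a batch of queried transactions could be checked for records that share a hash but disagree on value. It is written in Python for brevity and is not our production code; the field names `tx_hash` and `value` and both function names are hypothetical.

```python
from collections import defaultdict


def find_conflicting_hashes(transactions):
    """Return the set of hashes that appear with more than one distinct value."""
    values_by_hash = defaultdict(set)
    for tx in transactions:
        values_by_hash[tx["tx_hash"]].add(tx["value"])
    return {h for h, values in values_by_hash.items() if len(values) > 1}


def should_suspend(transactions):
    """Suspend swapping as soon as any hash carries conflicting values."""
    return bool(find_conflicting_hashes(transactions))


# Hypothetical sample batch: "A1" shows up twice with different values.
batch = [
    {"tx_hash": "A1", "value": "100"},
    {"tx_hash": "A1", "value": "250"},
    {"tx_hash": "B2", "value": "40"},
]

print(find_conflicting_hashes(batch))  # {'A1'}
print(should_suspend(batch))           # True
```

Suspending on the first conflict favors fund safety over availability, which matches the behavior that caused the outage.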
Will it happen again?
After this incident, we cannot be sure that transaction data from Binance Chain will always arrive in the expected format. We have therefore not only reported the issue to Binance representatives but also made our service more robust in the following three ways.
Fault tolerance. Duplicated transaction records are merged when it makes sense to do so, and records with erroneous field values are discarded. This new error handling improves service availability.
Monitoring. We've added another monitoring program that checks whether swap requests are fulfilled within 30 minutes. If a transaction cannot be verified in time, our developers are notified so they can inspect the potential issue.
Panic button. Should the error occur, the monitoring program not only alerts our developers but also switches the token swap website into maintenance mode, so users know to wait until we fix the issue.
But we hope you will never have to see this error message.
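The fault tolerance and monitoring measures described above can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual implementation: it assumes that "merging" means collapsing exact duplicates, that records with conflicting values are held back for manual review, and that a record is erroneous when its hash is empty or its value is not a positive number. All names and fields are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal, InvalidOperation

SWAP_DEADLINE = timedelta(minutes=30)  # the 30-minute window from the report


def is_valid(tx):
    """Assumed validity rule: non-empty hash and a positive numeric value."""
    try:
        return bool(tx["tx_hash"]) and Decimal(tx["value"]) > 0
    except (KeyError, InvalidOperation, TypeError):
        return False


def clean_batch(transactions):
    """Discard invalid records, merge exact duplicates, hold back conflicts."""
    merged, conflicts = {}, set()
    for tx in transactions:
        if not is_valid(tx):
            continue  # erroneous field values are discarded
        key = tx["tx_hash"]
        seen = merged.get(key)
        if seen is None:
            merged[key] = tx
        elif Decimal(seen["value"]) != Decimal(tx["value"]):
            conflicts.add(key)  # same hash, different value: do not guess
    for key in conflicts:
        del merged[key]
    return list(merged.values()), conflicts


def overdue_swaps(pending, now):
    """Return hashes of pending swaps that were received before the deadline window."""
    return [h for h, received_at in pending.items()
            if now - received_at > SWAP_DEADLINE]


# Hypothetical usage.
batch = [
    {"tx_hash": "A1", "value": "100"},
    {"tx_hash": "A1", "value": "100"},   # exact duplicate, merged
    {"tx_hash": "B2", "value": "oops"},  # erroneous value, discarded
    {"tx_hash": "C3", "value": "7"},
    {"tx_hash": "C3", "value": "9"},     # conflicting value, held back
]
swappable, conflicts = clean_batch(batch)

now = datetime.now(timezone.utc)
pending = {"A1": now - timedelta(minutes=45), "D4": now - timedelta(minutes=5)}
late = overdue_swaps(pending, now)

# Panic button: any overdue swap would alert developers and enable maintenance mode.
maintenance_mode = bool(late)
```

In this sketch only conflicting records block a swap, so one malformed response no longer halts the whole service, while the overdue check catches anything that slips through.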