Instapaper Outage Cause & Recovery
Brian Donohue

Been there: several years ago at a customer we hit the 2TB limit in ext3, although not in RDS.

Downtime wasn’t an option, so while we prepared a long-term solution (migration to XFS), the quick workaround was to:
* identify old data
* dump it
* delete it
Freeing 0.01% of the space was enough to bring the site back up after just a few minutes of downtime.
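
To make those three steps concrete, here is a minimal sketch in Python. It is not what we ran at the time: the `bookmarks` table, its columns, and the `archive_and_delete` helper are hypothetical names chosen for illustration, and `sqlite3` stands in for the real database only so the example can run on its own.

```python
import csv
import sqlite3


def archive_and_delete(conn, cutoff, dump_path):
    """Dump rows older than `cutoff` to a CSV file, then delete them.

    Assumes a hypothetical table bookmarks(id, url, created_at) where
    created_at is an ISO-8601 date string, so string comparison matches
    chronological order. Returns the number of rows archived.
    """
    # Step 1: identify old data.
    rows = conn.execute(
        "SELECT id, url, created_at FROM bookmarks "
        "WHERE created_at < ? ORDER BY id",
        (cutoff,),
    ).fetchall()

    # Step 2: dump it, so it can be restored later.
    with open(dump_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "url", "created_at"])
        writer.writerows(rows)

    # Step 3: delete it, only after the dump has been written.
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM bookmarks WHERE created_at < ?", (cutoff,))

    return len(rows)
```

The ordering is the point: the delete runs only once the dump is safely on disk, so the affected customers' data can be loaded back after the long-term fix is in place.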

It is easier to tell a small subset of your customers that their data is _temporarily_ unavailable than to have such a long outage affecting all your customers.

I also think another option you had was to drop one of the secondary indexes. Performance would degrade, but reads are easy to scale out.
