At Mixpanel, we heavily utilize Google Cloud Platform(GCP)’s SSD provisioned persistent disk (PD-SSD) to store the event data that underlies our service. We chose it to support our low-latency analytics database, which serves customer queries.
We found one major problem with PD-SSD: the cost. PD-SSD is 5x the cost per GB stored than Google Cloud Storage (GCS). Our infra team designed Mixpanel’s database, Arb, utilizing both PD-SSD and GCS; it stores recent, frequently changing data in PD-SSD for low-latency reads and much older, immutable data in GCS for a lower cost and acceptable latency.
As all engineers do, we first engineered for correctness/performance and later optimized for cost. PD-SSD initially held a large proportion of data for performance reasons, accounting for significant portion of our infrastructure cost. However, given that the vast majority of our data is immutable, we could shift the balance to GCS to drastically reduce our storage cost. …