Amazon’s Serverless Iron Fist

AWS re:Invent is the “Victoria’s Secret Fashion Show” of cloud innovation. Every year, AWS takes the stage and parades feature after feature, each one promising to disrupt the industry and crush the competition.

It’s easy to lose track with so many Big Data announcements. In the past year alone, AWS has released Athena, Spectrum, Aurora Serverless, S3 Select, and Glacier Select — and that’s on top of EMR and the mighty Redshift, which has been rolling since 2013.

What makes each service so unique and disruptive?

  • Athena — perfect for analyzing petabyte-scale data on S3, with sub-second response times (a querying sketch follows this list)
  • Spectrum — serverless features and billing on top of Redshift; perfect for petabyte-scale data on S3 when you’re already working in Redshift, with sub-second response times
  • Aurora Serverless — good for scaling relational databases, with mid-range ~100 ms response times
  • S3 Select — perfect for binary data; fast extraction of partial objects from S3
  • Glacier Select — querying binary data; fast extraction of partial objects from Glacier
  • DynamoDB — the veteran key/value store, with sub-millisecond response times
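To make the serverless querying model concrete, here is a minimal sketch of running an Athena query over S3 data from Python with boto3: there is no cluster to provision, and you pay per query. The bucket, database, table, and column names are placeholders made up for illustration; only the boto3 Athena calls themselves are real.

```python
# A minimal sketch, assuming a table named "clickstream" already exists in an
# Athena/Glue database called "analytics" and that "s3://my-athena-results/"
# is a bucket you own for query output. All of these names are placeholders.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Submit the query; Athena runs it with no cluster to manage.
query = athena.start_query_execution(
    QueryString="SELECT user_id, COUNT(*) AS events "
                "FROM clickstream GROUP BY user_id ORDER BY events DESC LIMIT 10",
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

# Print the result rows (the first row holds the column headers).
if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```

The same pay-per-query pattern carries over to the other services in the list; what changes is the client call (for example, S3 Select and Glacier Select go through select-style object APIs rather than a SQL engine endpoint).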

Why does AWS offer so many serverless Big Data services?

Initially, I thought AWS’s goal was to shoot in every direction, find its ace against Google BigQuery, and establish supremacy in the industry before Oracle 18c and Microsoft’s serverless database (still TBD) catch on with data scientists and analysts. However, there’s another important reason for AWS’s Big Data frenzy: eluding the danger of “containers” for serverless Big Data.

Cloud providers love containers about as much as they love their competition. Just as containers prevent provider lock-in for compute workloads, a comparable abstraction could prevent provider lock-in for serverless Big Data services. AWS foresaw this threat and is making its leading Big Data solutions available as serverless offerings, hoping that users won’t consider shifting from its non-serverless solutions to the competition’s serverless Big Data solutions.

I believe the future of serverless Big Data is entangled with the future of the cloud. Companies will use as many providers as makes sense, both business-wise and technically, and with the growing demand for real-time data querying (AI is just getting started), there will be a need for multi-provider serverless database support: in other words, “containers” for serverless Big Data.

The truth is that AWS really wants companies to use more than one of its serverless Big Data solutions, and my guess is that it will happen.

For now, AWS’s serverless products aren’t being taken seriously by other vendors (see my comments on Oracle CEO Larry Ellison’s latest keynote). Google BigQuery has a strong base of followers, a six-year head start, and superior technology (see my comments on the latest BigQuery Alpha release), but who knows what AWS will reveal next to bridge the gap. These are exciting times for data analysts, and with all the new and soon-to-come announcements, 2018 will be a big year for serverless Big Data.