EMR With Graviton: Unlock Your Savings

Matt Weingarten
3 min read · Jun 7, 2022

Money!

Introduction

There are many different ways to explore cost optimizations when it comes to EMR usage. A relatively easy one, for starters, is using the latest instance types. AWS’s Graviton instances are worth checking out. Let’s dive in.

What is Graviton?

Sadly, we’re not talking particle physics when it comes to Graviton instances. According to AWS, Graviton processors are designed to deliver the best price performance for EC2 workloads. Now in their second generation (third for some instance families), these Arm-based processors are available across the major AWS services that run on EC2. EMR is no exception.

Switching to Graviton

Switching to Graviton instances is not rocket science. The Graviton instance types match their legacy EC2 counterparts in vCPU and memory (an r6g.xlarge has the same 4 vCPUs and 32 GiB as an r5.xlarge), so switching is mostly a matter of changing the instance types in your cluster configs. The processors are Arm-based rather than x86, though, so it’s important to test in a development environment and confirm the jobs succeed (with no significant increase in runtime) before deploying such a change to production.

We use a lot of R (memory-optimized) instances in our processes, so we switched from the R5 series to the R6g series when we made these changes to our workloads.
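As a rough illustration of what "changing your cluster configs" can look like, here is a minimal sketch that rewrites the instance types in a list of instance group definitions. It assumes the configs are plain dictionaries with an `InstanceType` key (the shape boto3's `run_job_flow` accepts under `Instances.InstanceGroups`); the helper names and the mapping table are hypothetical, not something from our actual codebase.

```python
import copy

# Assumed mapping: only the R5 -> R6g switch described above.
# Extend it yourself after checking that every size you use exists in the
# target family (not every size is offered in both families).
GRAVITON_FAMILY = {"r5": "r6g"}


def to_graviton(instance_type: str) -> str:
    """Return the Graviton counterpart of an instance type, if one is mapped."""
    family, sep, size = instance_type.partition(".")
    if not sep:
        # Not in the usual "family.size" form; leave it untouched.
        return instance_type
    return f"{GRAVITON_FAMILY.get(family, family)}.{size}"


def switch_instance_groups(instance_groups: list) -> list:
    """Return a copy of the instance groups with mapped families swapped."""
    updated = copy.deepcopy(instance_groups)  # do not mutate the caller's config
    for group in updated:
        group["InstanceType"] = to_graviton(group["InstanceType"])
    return updated


if __name__ == "__main__":
    # Hypothetical cluster layout, for illustration only.
    groups = [
        {"Name": "Primary", "InstanceRole": "MASTER",
         "InstanceType": "r5.xlarge", "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": "r5.4xlarge", "InstanceCount": 4},
    ]
    for group in switch_instance_groups(groups):
        print(group["Name"], group["InstanceType"])
```

Returning a copy keeps the original config around, which makes it easy to run the same job on both instance families in a development environment and compare the results.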

Testimonial

As someone who wants to exhaust every avenue when it comes to cloud savings, I wanted to try out Graviton instances as soon as I read more about them. After testing and seeing that all our jobs succeeded without any issues, I deployed these configuration changes to production. We’re seeing cost savings across the board for all of our jobs, on the order of close to $80k per year if all else holds constant. AWS also mentions potential time savings, but I’ve only really seen this with one of our jobs; that is probably more code-dependent than anything.
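For anyone who wants to run a similar back-of-the-envelope estimate, here is a minimal sketch of the arithmetic. The function name and all of the input numbers are hypothetical placeholders, not our actual prices or usage; plug in the per-instance-hour rates and yearly instance-hours from your own bill.

```python
def estimated_annual_savings(old_hourly_price: float,
                             new_hourly_price: float,
                             instance_hours_per_year: float,
                             runtime_ratio: float = 1.0) -> float:
    """Estimate yearly savings from moving to a cheaper instance type.

    runtime_ratio is new runtime divided by old runtime: 1.0 means jobs take
    the same time ("all else constant"), below 1.0 means they finish faster.
    """
    if min(old_hourly_price, new_hourly_price, instance_hours_per_year) < 0:
        raise ValueError("prices and hours must be non-negative")
    if runtime_ratio <= 0:
        raise ValueError("runtime_ratio must be positive")
    old_cost = old_hourly_price * instance_hours_per_year
    new_cost = new_hourly_price * instance_hours_per_year * runtime_ratio
    return old_cost - new_cost


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    savings = estimated_annual_savings(
        old_hourly_price=1.00,
        new_hourly_price=0.80,
        instance_hours_per_year=100_000,
    )
    print(f"Estimated annual savings: ${savings:,.0f}")
```

The `runtime_ratio` parameter is there because any change in job duration moves the total as well: a job that runs faster on the new instances saves more than the price difference alone suggests, and a slower one saves less.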

On top of earlier optimizations we had made with Sync Computing’s autotuner (which netted roughly $100k in additional yearly savings), we have greatly reduced the EMR-related spend for our workloads. These changes were easy to make and are definitely worth taking advantage of if you have the time to look into this kind of tech debt.

Conclusion

Since the switch involves almost no hassle, it’s definitely worth exploring whether Graviton instances can work for your EMR-based workloads (or anything that uses EC2, for that matter). Considering they’re cheaper across the board than the legacy instances they replace, you should have no problem finding savings by using them.

Next up in my EMR cost optimization exploration is EMR Serverless. It became generally available just last week, and it’s definitely worth seeing whether the serverless experience removes the overhead of managing cluster configurations ourselves. I hope to report back with the results of a POC in the next few weeks.

Matt Weingarten

Currently a Data Engineer at Samsara. Previously at Disney, Meta, and Nielsen. Bridge player and sports fan. Thoughts are my own.