AWS re:Invent 2021 Experience: A Data Engineer’s Perspective

First of all, I am fortunate that I was able to attend another re:Invent, especially after Covid. re:Invent always gives me a broader perspective on IT overall and on how rapidly things are progressing in the industry, especially in cloud computing.

The people I get to meet at this conference come from various walks of life, experience levels and technical backgrounds. It is fascinating to see how all of us are so different yet the same, working towards a common goal of improving business processes efficiently. Enough talk; here are my 2 cents on a few of the sessions that I attended and focused on.

Overall:

Unlike previous re:Invents, this year was predominantly focused on upgrades to existing services, and no groundbreaking, innovative new services were introduced. That was a little disappointing and was common feedback from attendees this year.

There were two main trends that I could see overall, especially surrounding data.

1. Most of the services are going serverless (Redshift and EMR, for instance).

2. Using AWS S3 itself as the data lake and trying to eliminate the need for a data warehouse altogether (Databricks Lakehouse and Dremio). More about this to follow.

Redshift Serverless:

1. If I remember right, one of the main deal breakers when we were doing a POC to choose between Redshift and Snowflake was the fact that Snowflake provides true decoupling of storage and compute and Redshift didn’t.

2. Well, this year they finally introduced Redshift Serverless, which no longer requires a cluster to be provisioned in order to have a Redshift endpoint and data warehouse.

3. I believe this will be a big deal for those who are currently deciding whether to move from Redshift to Snowflake and the like.

4. Along with Redshift Serverless, they also introduced several AWS-managed features within the Redshift platform that were the customer’s responsibility in previous iterations.

5. Cost is now driven by RPUs (Redshift Processing Units), where we pay only for the time during which queries are running.

6. RA3 instance types for running Redshift workloads, aimed at improving performance and cost and reducing latency.

7. The ability to run queries joining Spectrum, Redshift and Aurora is a plus.

8. One thing that I did not like is the fact that the Redshift endpoint WILL CHANGE and applications have to be updated to point to the new one. Even though this is a simple change, it still involves going through the complete testing cycle for dashboards and reports, as is true of any change in the IT cycle.

9. I had a conversation with the Snowflake folks at their booth to understand how they see this offering from Redshift; needless to say, it is direct and painful competition for Snowflake. Here are their responses:

a. People come to Snowflake not just for storage/compute decoupling; it is more about operational ease, customer support and the innovations that they continue to focus on.

b. They do appreciate the innovations by Redshift, which they feel are healthy for the industry overall, and they will continue to move forward alongside other competitors no matter what.

c. Even when AWS introduces a service, it takes time for it to become fully operational, as we have seen with their other offerings in the past. Until then, customers will at least try to stick with Snowflake and the like, and by the time the AWS offering has matured, Snowflake might have added more features that help customers stick around longer.
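To make the RPU pricing in point 5 a bit more concrete, here is a rough back-of-the-envelope sketch of pay-per-use billing versus an always-on cluster. The function names, prices and workload numbers are all made up for illustration and are not AWS prices; the sketch also ignores any minimum-charge or rounding rules that real billing may apply.

```python
def serverless_cost(rpus, busy_seconds, price_per_rpu_hour):
    """Cost when we are billed only while queries are running.

    rpus: capacity used while busy (hypothetical value)
    busy_seconds: total seconds in which queries were actually running
    price_per_rpu_hour: assumed price, NOT an actual AWS rate
    """
    return rpus * (busy_seconds / 3600) * price_per_rpu_hour


def always_on_cost(nodes, hours, price_per_node_hour):
    """Cost of a provisioned cluster that is billed whether busy or idle."""
    return nodes * hours * price_per_node_hour


if __name__ == "__main__":
    # Hypothetical day: dashboards keep the warehouse busy for 2 of 24 hours.
    busy = 2 * 3600
    print("serverless:", serverless_cost(rpus=32, busy_seconds=busy, price_per_rpu_hour=0.50))
    print("always on: ", always_on_cost(nodes=2, hours=24, price_per_node_hour=1.00))
```

The takeaway is only the shape of the formula: with serverless the bill follows busy time, so the comparison depends heavily on how idle the warehouse usually is.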

EMR Serverless:

1. Like Redshift, EMR also got a serverless option. Same concept: the customer no longer needs to maintain any cluster; it is managed by AWS instead.

2. It provides a simpler experience, as we don’t have to configure, optimize, operate or secure clusters.

3. We don’t have to bother about resizing clusters for different workloads. Like Redshift Serverless, we pay only while workloads are running and not for idle time.

4. Fine-grained scaling, meaning resources are increased and decreased in smaller increments as required, enables even more cost savings.
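A tiny sketch of why the finer-grained scaling in point 4 can save money: capacity has to be rounded up to whatever step size the platform scales in, so smaller steps leave less unused headroom. The demand figures and step sizes below are hypothetical and are not EMR Serverless specifics.

```python
import math


def provisioned_units(demand, step):
    """Round demand up to the nearest multiple of the scaling step."""
    return math.ceil(demand / step) * step


if __name__ == "__main__":
    # Hypothetical resource demand (in arbitrary units) over four intervals.
    demand = [3, 9, 17, 5]

    # Coarse scaling: capacity can only be added in blocks of 16 units.
    coarse = sum(provisioned_units(d, step=16) for d in demand)

    # Fine-grained scaling: capacity can be added one unit at a time.
    fine = sum(provisioned_units(d, step=1) for d in demand)

    print("coarse-grained units paid for:", coarse)
    print("fine-grained units paid for:  ", fine)
```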

Databricks and Dremio:

1. As I mentioned in the introduction, I noticed a trend where several companies were trying to use AWS S3 directly as the data lake. In other words, they were trying to avoid moving the data over to a data warehouse at all and instead run high-speed, low-latency queries directly on top of S3.

2. I know it sounds a lot like Athena, but when I asked about the difference (at least this is what they said), Athena is more of an ad hoc querying tool and not a full-blown enterprise-wide data lake/warehouse tool.

3. Databricks came up with the concept of a lakehouse, as opposed to a data warehouse, built around Delta Lake, which they describe as an open source project that delivers reliability, security and performance directly on top of your data lake.

4. Dremio is a similar offering that enables high-performance BI directly on data lake storage and eliminates the need for data warehouses.

Apart from these, from the data perspective:

· AWS Lake Formation: made cell-level security GA, along with governed tables with automatic compaction.

· Amazon Kinesis Data Streams On-Demand: Kinesis streams can now automatically scale capacity in response to varying data traffic.

· DynamoDB: the new DynamoDB table class, Standard-IA, can save up to 60% on cost.

Mainframe Modernization:

Apart from the above, as I come from a Mainframe background from my initial days in IT, Mainframe Modernization as a managed AWS service certainly caught my eye. There is so much appetite among Mainframe platform groups to find an efficient way to migrate workloads to the cloud. I hope this new AWS service will pave the way, or at least be a good starting point, for that kind of journey.

Cloudfix:

I bumped into the creators and marketing team of Cloudfix and liked their simple yet powerful product, which mainly focuses on low-hanging fruit that can give you cost savings right away in as few as 5 clicks. Well, we have heard of many such services year after year at re:Invent, but the thing that stood out for me is that they don’t just identify potential cost-saving areas; they also fix them for you, automagically. The company’s marketing team did a stupendous job of grabbing everyone’s attention all over Vegas and making sure most attendees went to their demo at least once. For a fresh startup that started just 6 months ago, reaching so many people so efficiently is a huge accomplishment. Good luck.

Gaming:

I attended sessions pertaining to the gaming industry. One of them was from Sony, where they walked through the process, pain points and challenges they faced in releasing the PlayStation 5. It was their first major release in 7 years, and the real testing could be done only in production, because that is where millions of users would be hitting their servers. That is quite an interesting challenge to face. They explained how they handled it by splitting their teams into heroes and villains: the villains would purposefully try to break the system, so that the heroes had real cases of the failures and discrepancies that they might have to face in production.

One thing I was hoping to see, but for which I couldn’t find any dedicated sessions, was AR/VR/XR and metaverse-related topics. With the introduction of the Metaverse by Meta (Facebook), I was thinking that AWS would be ahead of the game by quickly introducing a lot of services and functionality around this topic. Either there are no offerings yet or I missed them. But given that they came up with Blockchain services in 2019, hopefully they will come up with Metaverse-related feature sets in the years to come.

Parties, Food and Fun:

Last but not least, what’s the point of re:Invent without countless parties, delicious, tummy-filling food and lots of quirky events and fun activities? I have always found http://conferenceparties.com/reinvent2021/ super reliable and super helpful when it comes to planning my evenings and, of course, nights. I am not even kidding: I had to build a separate spreadsheet to plan it all.

Harikrishnan Umapathy
