Data Engineer | Cloud

How to work with the Iceberg format in AWS Glue

Cesar Cordoba
10 min read · Sep 6, 2023

As the official guide can be overwhelming at times, this post covers the main operations you would want to perform in Iceberg: creating, reading, writing, updating, and altering a table. It also explains time travel and optimization in a simpler way.

We will go through several PySpark and SQL examples, so this “guide” is extremely practical.

I hope you find it useful!

Side note: if you want to learn more about table formats, and Iceberg in particular, check this post. Also, if you want to discuss anything, you can connect with me at www.linkedin.com/in/cesar-antonio-restrepo-cordoba.

Setting up Iceberg in Spark to use the Glue catalog

Important! If you have worked with EMR or Glue and S3, you are probably used to paths that start with “s3a://”. With Iceberg you can forget about that (and you actually shouldn’t be using it anymore). When we initialize the cluster we will use Iceberg's newer S3FileIO implementation, which works with “s3://”. Don’t believe me? Check AWS (apache.org)

To work with iceberg in glue we need two things:

  1. Set the --datalake-formats job parameter to iceberg.
  2. Initialize the iceberg…
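
The second step is cut off in the list above, so here is only a minimal sketch of what a typical initialization looks like, not necessarily the exact setup used in this post. It builds the Spark settings that are commonly used to point an Iceberg catalog at Glue and to read and write through S3FileIO. The catalog name `glue_catalog`, the bucket `my-example-bucket`, and the helper names `iceberg_glue_conf` and `build_session` are assumptions made for illustration.

```python
from typing import Dict


def iceberg_glue_conf(catalog_name: str, warehouse: str) -> Dict[str, str]:
    """Return Spark settings for an Iceberg catalog backed by the Glue Data Catalog.

    catalog_name and warehouse are illustrative inputs, not values from the post.
    """
    # S3FileIO works with plain "s3://" paths, so reject "s3a://" and friends early.
    if not warehouse.startswith("s3://"):
        raise ValueError("warehouse must be an s3:// path, got: " + warehouse)

    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions (MERGE INTO, CALL procedures, ...).
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers a Spark catalog under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Root location for table data and metadata.
        f"{prefix}.warehouse": warehouse,
        # Uses Glue as the catalog implementation.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Uses S3FileIO instead of the Hadoop s3a filesystem.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


def build_session(conf: Dict[str, str]):
    """Create a SparkSession with the given settings (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily; only needed on a cluster

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Hypothetical catalog name and bucket, for illustration only.
    conf = iceberg_glue_conf("glue_catalog", "s3://my-example-bucket/warehouse/")
    for key, value in conf.items():
        print(f"{key}={value}")
```

In a Glue job you would call `build_session(conf)` at the top of the script, or pass the same key/value pairs through the job's `--conf` parameter. Tables are then addressed as `glue_catalog.<database>.<table>`.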
