AWS Glue Local Setup: Test ETL Scripts Locally

Muhammed Musthafa Shahal V
Published in Impelsys
5 min read · Dec 11, 2023

Introduction

If you’re new to AWS Glue and don’t want to pay for every ETL script you run, or if you’re a developer who wants to verify an ETL script before pushing it to AWS Glue, you can set up Glue locally. Scripts that run on your own machine do not consume the DPUs (Data Processing Units) that AWS Glue bills for.

Steps for Installing AWS Glue Locally

Step 1: Download Docker

To run AWS Glue locally, first make sure Docker is installed on your system by executing the command below in your terminal.

$ docker ps
Fig 1: Terminal Output

If Docker is already installed on your system, you will see output like the image above. If you don’t, head over to the Docker Desktop page on the Docker website and follow the normal installation procedure.
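The same check can be scripted. The sketch below is only an illustration and assumes Python 3 is available on the host; the helper name docker_status is hypothetical and not part of any Docker tooling. It looks for the docker executable on the PATH and, if it is found, runs docker ps to see whether the Docker daemon responds.

```python
import shutil
import subprocess


def docker_status(executable="docker"):
    """Return 'missing', 'not running', or 'ok' for the given Docker CLI name."""
    # shutil.which returns None when the executable is not on the PATH.
    if shutil.which(executable) is None:
        return "missing"
    try:
        # `docker ps` exits with a non-zero status when the daemon cannot be reached.
        result = subprocess.run(
            [executable, "ps"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return "not running"
    return "ok" if result.returncode == 0 else "not running"


if __name__ == "__main__":
    print(docker_status())
```

A result of "missing" means Docker still has to be installed; "not running" usually means Docker is installed but has not been started.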

Step 2: Pull AWS Glue Docker Image

Once Docker is installed properly, open a new terminal and execute the following command to pull the AWS Glue Docker image:

$ docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01

Depending on the speed of your internet, it may take a while to pull the image.

Fig 2: Glue Docker Image

After the download is completed, the above response is displayed.

Step 3: Configure AWS

Follow this step only if you want to connect to AWS services from the local setup. Keep in mind that services not included in the free tier, including AWS Glue (for example, accessing the AWS Glue Data Catalog), are charged according to use. If you don’t want to connect to or use AWS services, skip this step and continue.

  • Head over to the IAM (Identity and Access Management) section of the AWS Management Console and select ‘Users’. Select an existing user profile or create a new IAM user profile.
Fig 3: Identity Access Management (IAM)
  • In this case, I have created a new IAM user. Click on Add permissions and select the option Create an inline policy.
Fig 4: Add Permissions
  • Select the option “Attach policies directly” as the Permissions option.
Fig 5: Attach Policies directly
  • Select “Glue” as the service and select the “All Glue actions” check box.
Fig 6: All Glue actions
  • Provide a name for the policy created.
Fig 7: Policy Name
  • Head over to Security Credentials and create a new Access key.
Fig 8: Create access key
  • Select Command Line Interface (CLI) as the key type in the next window.
Fig 9: Command Line Interface (CLI)
  • In the Description tag value section, provide a name for the Access key.
Fig 10: Description tag value
  • Make a note of the Access key and Secret access key, or download them as a CSV file.
Fig 11: Access key and Secret access Key
Fig 12: AWS Credentials
  • Run aws configure in the terminal and paste the Access key and Secret access key when prompted. Enter the default region, keep the output format as it is, and press Enter.
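For reference, the inline policy that results from choosing “Glue” with “All Glue actions” should look roughly like the JSON printed by the sketch below. This is an illustration, not a copy of what the console generates: the function name glue_full_access_policy is hypothetical, and the "Resource": "*" entry is an assumption that the policy is not restricted to specific resources.

```python
import json


def glue_full_access_policy():
    """Build a policy document that allows every AWS Glue action."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # "glue:*" corresponds to the "All Glue actions" check box.
                "Action": "glue:*",
                # Assumption: the policy applies to all resources.
                "Resource": "*",
            }
        ],
    }


print(json.dumps(glue_full_access_policy(), indent=2))
```

Granting every Glue action is convenient for a local test setup; for anything shared or long-lived, a narrower list of actions is the safer choice.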

Step 4: Run the AWS Glue Image

Choose one of the options below:

  1. If you do not wish to use AWS services, execute the command below.

docker run -itd -p 8888:8888 -p 4040:4040 --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh

  2. If you are running from a Windows terminal and want to integrate AWS services, run the following command on a single line.

docker run -itd -p 8888:8888 -p 4040:4040 -v %UserProfile%\.aws:/root/.aws:rw --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh

  3. If you are running from a Linux/Mac machine and want to integrate AWS services, run the following command on a single line.

docker run -itd -p 8888:8888 -p 4040:4040 -v ~/.aws:/root/.aws:ro --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh
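The three commands above differ only in whether, and how, the local .aws directory is mounted into the container. The sketch below makes that explicit by assembling the command from its parts. It is an illustration only: glue_run_command is a hypothetical helper, it prints the command rather than running it, and it assumes the credentials should be mounted at /root/.aws, as in the Windows command.

```python
import os
import shlex

IMAGE = "amazon/aws-glue-libs:glue_libs_1.0.0_image_01"


def glue_run_command(aws_dir=None, mode="ro", name="glue_jupyter"):
    """Return the docker run argument list, optionally mounting AWS credentials."""
    # Publish the two ports used in the commands above (8888 and 4040).
    cmd = ["docker", "run", "-itd", "-p", "8888:8888", "-p", "4040:4040"]
    if aws_dir is not None:
        # Mount the host credentials directory into the container.
        cmd += ["-v", "{}:/root/.aws:{}".format(aws_dir, mode)]
    cmd += ["--name", name, IMAGE, "/home/jupyter/jupyter_start.sh"]
    return cmd


# Option 1: no AWS integration.
print(" ".join(shlex.quote(part) for part in glue_run_command()))

# Option 3: Linux/Mac with a read-only credentials mount.
home_aws = os.path.expanduser("~/.aws")
print(" ".join(shlex.quote(part) for part in glue_run_command(home_aws)))
```

Mounting the directory read-only (ro) is enough when the container only needs to read the credentials.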

Confirm that the container is running by executing “docker ps”.

Fig 13: Running containers

You will now see the container details in the terminal.
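docker ps confirms that the container is up, but not that the published ports answer. As a rough additional check, the sketch below tries to open a TCP connection to port 8888, which the run command publishes for Jupyter. The helper port_is_open is hypothetical, and the host name localhost assumes that Docker publishes the port on the local machine.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    state = "open" if port_is_open("localhost", 8888) else "closed"
    print("port 8888 is", state)
```

Port 4040 can be checked the same way, although it is the default port of the Spark UI and will likely answer only while a Spark session is active.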

How to Check the Setup?

  1. Once the docker ps command lists the AWS Glue container as running, open http://localhost:8888/ in a browser to reach the Jupyter UI.
Fig 14: Jupyter UI

  2. Click on the New drop-down and select Python3 to create a Jupyter notebook.

  3. Paste and run the code below in the newly opened notebook to make sure that the setup is working as expected.

from pyspark import SparkContext
from awsglue.context import GlueContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

  4. Run the code by pressing Shift + Enter. If everything is working as expected, the Jupyter notebook runs the code without throwing any errors.

Fig 15: Python3 notebook
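Creating the contexts only proves that the libraries import. A slightly stronger check is to push a few rows through Spark and Glue. The sketch below is one way to do that, under the assumption that it runs in a notebook inside the Glue container; the helper names and the sample rows are made up for illustration. It builds a two-row Spark DataFrame, converts it to a Glue DynamicFrame, and returns the row count. Outside the container it only prints a hint instead of failing on the imports.

```python
import importlib.util


def glue_libs_available():
    """Return True if both pyspark and awsglue can be imported here."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("pyspark", "awsglue")
    )


def run_smoke_test():
    """Build a tiny DataFrame, wrap it in a DynamicFrame, and count its rows."""
    from pyspark import SparkContext
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame

    sc = SparkContext.getOrCreate()
    glue_context = GlueContext(sc)
    spark = glue_context.spark_session

    # Illustrative sample data; any small set of rows would do.
    df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "name"])
    dyf = DynamicFrame.fromDF(df, glue_context, "sample")
    return dyf.count()


if glue_libs_available():
    print("rows in DynamicFrame:", run_smoke_test())
else:
    print("pyspark or awsglue not found; run this inside the Glue container")
```

If the row count prints without an error, Spark and the Glue libraries are both working in the local container.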

Conclusion

Setting up AWS Glue locally can be a great way to develop and test your data processing jobs before deploying them to the cloud. However, it is important to keep in mind the limitations of local development and to adjust your expectations accordingly.
