Accessing S3 and Redshift via API Gateway (using AWS Lambda)

Ankita Sinha · Published in Nerd For Tech · Jul 18, 2020

Today I am going to talk about how to get data from your website and store it in Redshift using AWS serverless architecture. This is probably the easiest and fastest way to start processing your data and build your backend quickly.

Let’s start with building a simple GET endpoint.

  1. Open the AWS console and go to API Gateway.
  2. Select the HTTP API and select BUILD. The HTTP API supports Lambda and gives CORS support out of the box.
  3. Now give your API a name and we are good to go. For this tutorial, I will name it “test”.
  4. It will give you an option to have an auto-deploy stage that deploys all your changes automatically. You can give it your own name or leave it as $default.
  5. Now just click create and an API will be created for you (a boto3 sketch of the same step follows this list).
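
If you prefer to script this step instead of clicking through the console, a minimal boto3 sketch would look something like the following (the API name “test” simply mirrors the console steps above):

import boto3

apigw = boto3.client('apigatewayv2')

# Create an HTTP API named "test"; HTTP APIs integrate with Lambda
# and support CORS configuration out of the box.
api = apigw.create_api(Name='test', ProtocolType='HTTP')
print(api['ApiId'], api['ApiEndpoint'])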

Now let's create the Lambda function which we will integrate with our API Gateway.

  1. Create a function from scratch.
  2. I chose Python as the language, but you can choose any language you are comfortable with.
  3. Create an IAM role and give read and write permissions for API Gateway, S3, Redshift, and CloudWatch (a scripted sketch of this role follows the list).
  4. Lambda will automatically create a log group for you and upload all logs to CloudWatch.
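
If you would rather create the role with boto3 than in the console, a rough sketch could look like this. The role name is an assumption, and the managed policies below are broader than strictly necessary, so tighten them for production:

import json
import boto3

iam = boto3.client('iam')

# Trust policy so the Lambda service can assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

iam.create_role(RoleName='test-medium-lambda-role',
                AssumeRolePolicyDocument=json.dumps(trust_policy))

# Attach managed policies for S3, Redshift, CloudWatch Logs and API Gateway invocation.
for policy_arn in [
        'arn:aws:iam::aws:policy/AmazonS3FullAccess',
        'arn:aws:iam::aws:policy/AmazonRedshiftFullAccess',
        'arn:aws:iam::aws:policy/CloudWatchLogsFullAccess',
        'arn:aws:iam::aws:policy/AmazonAPIGatewayInvokeFullAccess']:
    iam.attach_role_policy(RoleName='test-medium-lambda-role', PolicyArn=policy_arn)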

Now let's create a route in your API.

  1. Click on routes on the left pane.
  2. Click create and choose the “GET” method.

Now that we have an endpoint, let’s integrate it with our backend (Lambda).

3. Go to your routes now and click “GET”.

4. Click “Attach integration”, select Lambda function, and paste the ARN of your Lambda function here.

5. Click “Grant API Gateway permission to invoke your Lambda”.

AAANNNDDD!!!.. you are good to go.

6. Lambda by default returns “Hello from Lambda!”. So now let's go to the code, change that to “testing testing 1 2 3”, and save it.

Your Lambda function should look like this:
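
It is essentially the default Python template with only the return message changed:

import json

def lambda_handler(event, context):
    # Return the test message so we can verify the API Gateway integration.
    return {
        'statusCode': 200,
        'body': json.dumps('testing testing 1 2 3')
    }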

Your API should be automatically deployed to the $default stage.

7. Now let's configure CORS, so we can hit our API from any web browser.

Your CORS configuration should look like this once you are done. We will be allowing all headers to hit our API for now.
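
As a rough boto3 equivalent of that console setting (the ApiId is a placeholder), allowing all origins and headers would look something like:

import boto3

apigw = boto3.client('apigatewayv2')

# Allow any origin and any header to call the GET and POST routes.
apigw.update_api(
    ApiId='your-api-id',
    CorsConfiguration={
        'AllowOrigins': ['*'],
        'AllowHeaders': ['*'],
        'AllowMethods': ['GET', 'POST']
    })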

Now that we have everything configured let’s try to hit the API.

You will find the URL for your stage under STAGES in the left panel.

Let’s go to this URL and you will see your “testing” message 🥳🥳🥳🥳.
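
You can also hit it from Python, for example with requests (the URL below is a placeholder; use the invoke URL from your own stage):

import requests

# Call the GET route of the deployed HTTP API.
url = "https://your-api-id.execute-api.us-east-1.amazonaws.com"
resp = requests.get(url)
print(resp.status_code, resp.text)  # expect 200 and "testing testing 1 2 3"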

Now that we have set up the connection between Lambda and API Gateway, let's start saving our data in S3 and Redshift.

Create a “POST” route similar to how we created our “GET” route and integrate it with the same Lambda. The code later in this post checks for the path “/postnames”, so use “POST /postnames” as the route key (a scripted sketch follows below).
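
Scripted with boto3, creating the route and integration might look roughly like this (the API id, Lambda ARN, region, and account id are placeholders):

import boto3

apigw = boto3.client('apigatewayv2')

# Point the route at the Lambda function through an AWS_PROXY integration.
integration = apigw.create_integration(
    ApiId='your-api-id',
    IntegrationType='AWS_PROXY',
    IntegrationUri='arn:aws:lambda:us-east-1:123456789012:function:test',
    PayloadFormatVersion='2.0')

# The Lambda code later in this post checks for rawPath == "/postnames".
apigw.create_route(
    ApiId='your-api-id',
    RouteKey='POST /postnames',
    Target='integrations/' + integration['IntegrationId'])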

Let's also create an S3 bucket and a Redshift table to store our data.

S3 bucket —

  1. Just give your bucket a unique name and you should be good to go. You don't need any more configuration. I named my bucket ‘test-medium-ankita’.
  2. Now, in your Lambda, add this S3 bucket as a trigger, so that whenever any data is added to the bucket, it flows to Redshift (a scripted sketch of the trigger follows this list).
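
Adding the trigger from the console is a couple of clicks; if you want to do it with boto3 instead, a rough sketch is below (the function name, account id, and ARNs are placeholders):

import boto3

lambda_client = boto3.client('lambda')
s3 = boto3.client('s3')

# Let S3 invoke the Lambda function for events from this bucket.
lambda_client.add_permission(
    FunctionName='test',
    StatementId='s3-invoke-test-medium-ankita',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::test-medium-ankita')

# Notify the Lambda function whenever an object is created in the bucket.
s3.put_bucket_notification_configuration(
    Bucket='test-medium-ankita',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:test',
            'Events': ['s3:ObjectCreated:*']
        }]
    })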

Redshift table

  1. Let our schema be: name, age, and fav_cartoon, all strings.
  2. Now open Redshift from the AWS console and let's create our table. (I am assuming you already have Redshift set up. If not, you can follow this tutorial.)
  3. Run the following command to create a table:

CREATE TABLE public.medium (
    name VARCHAR(MAX),
    age VARCHAR(MAX),
    fav_cartoon VARCHAR(MAX)
);

4. Now when you run:

select * from public.medium; 

You should see your (still empty) table. For this tutorial, whenever data is inserted into the S3 bucket “test-medium-ankita”, it should automatically populate your Redshift table.

Once we have all our configurations ready, let's start creating our Lambda function.

You will need the following packages:

  1. psycopg2
  2. boto3

You will need to configure AWS on your local machine (create a user for this with all the required permissions mentioned earlier).

  1. Now create a virtual env. (We will need this to upload our code as a zip file to Lambda. You can also follow this tutorial.)

Code:

import json
import boto3
import psycopg2


def redshiftputdata(loc):
    # Connect to your Redshift cluster (fill in your own connection details).
    con = psycopg2.connect(dbname='***',
                           host='***',
                           port='***', user='***', password='***')
    cur = con.cursor()
    # Begin your transaction
    cur.execute("begin;")
    # COPY the JSON file sitting at `loc` in S3 into the Redshift table.
    cur.execute(
        "COPY public.medium FROM " + loc +
        " CREDENTIALS 'aws_iam_role=arn:aws:iam::***' json 'auto';")
    # Commit your transaction
    cur.execute("commit;")
    print("medium copy executed fine!")
    return "copied " + loc


def put_data_into_s3(data, bucket):
    putbucket = bucket
    # The ".json" suffix matters: the S3 branch below only copies keys ending in .json.
    key = "medium.json"
    client = boto3.client('s3')
    client.put_object(Body=data, Bucket=putbucket, Key=key)
    return data


def lambda_handler(event, context):
    print(event)
    # For API Gateway POST.
    response = "default"
    if "rawPath" in event and event.get("rawPath") == "/postnames":
        body = event.get("body")
        payload = json.loads(body)      # validate the incoming JSON
        payload = json.dumps(payload)   # re-serialize before writing to S3
        print(payload)
        response = put_data_into_s3(payload, "test-medium-ankita")

    # Handle data coming from S3.
    if "Records" in event:
        records = event["Records"]
        for record in records:
            s3_name = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            output = ""
            if s3_name == "test-medium-ankita" and key.find(".json") != -1:
                loc = "'s3://test-medium-ankita/%s'" % (key)
                output = redshiftputdata(loc)
            print("output: ", output)

    return {
        'statusCode': 200,
        'body': json.dumps(response)
    }

The function “redshiftputdata” is used to COPY the JSON file received from S3 into our Redshift table, and the function “put_data_into_s3” is used to put your data into S3 in JSON format.

Now let’s do a POST call and we should directly see data populated in our Redshift table:

Body for the call:

{"name":"Ankita Sinha","age": "15","fav_cartoon": "tom and jerry"}

Let’s now run the earlier select again and see the data in our table.

P.S. You can get a Lambda-compatible build of psycopg2 from here: https://github.com/jkehler/awslambda-psycopg2 . You just need to put this package in your code and zip it.
