Data Transformation Pipelines in AWS (Part 2)

In case you missed it, check out Part 1.

Creating the URL to HTML Lambda

The first Lambda works great, and is actually all we're using for our original use case. However, uploading HTML files to S3 is a little clunky. It would be nice if we could provide a URL and have the HTML automatically created and uploaded to our bucket.

Creating the project and the Lambda is almost exactly the same as for the previous Lambda (same handler and role), but this time we won't add a trigger. We'll start out testing from within the Lambda configuration area, and later set up an API Gateway to call it.

Here’s what our code is going to look like:

Let's test it with:

{ "host": "www.google.com", "path": "/", "fileName": "test-file.html", "bucket": "my-bucket" }

You should see the following output:

{ "Bucket": "my-bucket", "Key": "test-file.html", "Body": "..." }

And the HTML and PDF files should have been created in the S3 bucket.

Putting it all together behind an API Gateway

Select Services > Amazon API Gateway and click Create API. Select the New API radio button and give your API a name.

Next we want to add a named resource (from the Actions dropdown).

Then add a POST method that points at your Lambda's region and function.

Deploy your API.

If you click on the POST method under your develop stage, you'll see the URL you can use to test the API.

We can throw this in Postman and see what happens.

Success! Let’s check our S3 bucket to make sure it made it there as well.

We can also change the path, like so:

Limitations

Since we’re using the http library to request the page from an AWS machine somewhere, there are a few limitations. For this project, the pages we’re interested in are being served on an internal network, which is not accessible publicly. However, even if it were public, it is still behind a login and we would need a way to create the request with a valid session token (essentially an intentional CSRF attack).

For now, we're rendering a partial to string and sending that over to our S3 bucket that is set up to trigger the html-to-pdf Lambda. We then read from the bucket (with a retry mechanism in case the Lambda isn't instant) and stream the PDF bytes to the client.
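That retry could look something like the sketch below (the attempt count, delay, and the `getObject` wrapper are illustrative assumptions, not the production code):

```javascript
// Retry an async S3 read until the html-to-pdf Lambda has produced the file.
// `getObject` is any function returning a promise of the object body;
// the attempts and delayMs defaults are illustrative.
function getWithRetry(getObject, key, attempts = 5, delayMs = 500) {
  return getObject(key).catch((err) => {
    if (attempts <= 1) throw err; // out of retries -- surface the error
    return new Promise((resolve) => setTimeout(resolve, delayMs))
      .then(() => getWithRetry(getObject, key, attempts - 1, delayMs));
  });
}
```

In the real pipeline, the resolved bytes are then streamed back to the client.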


Originally published at www.drivenbycode.com on February 17, 2017.