How to build a serverless clone of Imgur using Amazon Rekognition and DynamoDB

Amazon provides the building blocks to create a fully functioning site on top of both AWS Lambda and S3 in a few simple steps

Published in A Cloud Guru · 6 min read · Feb 25, 2018

In a previous article, we managed to build a very simple and somewhat primitive Imgur clone — using Amazon Cognito for registration and login before uploading images to the site for all to see.

Building a Highly Scalable Imgur Clone with Lambda and S3

hackernoon.com

Now, it had a few issues and these must be addressed before we go on to any funding rounds. We don’t want to scare away any potential investors with a few teething issues.

The issues preventing funding

Let’s go through the issues that need to be resolved prior to a round of Series A funding from any potential investors.

To render the home page, the site would hit the S3 bucket storing all of the images and return them as one big JSON list. No pagination, no smaller images. If this thing is going to scale in any real sense, this will have to be addressed: we need to introduce a database and proper pagination of results.
It doesn’t really do anything “cool”. To address this, I thought I’d play around with Amazon Rekognition and see if we could add some machine learning image recognition to the site. We can then browse images by type, should we so wish!
There were a couple of frontend things that could have been improved. For instance, you couldn’t click on an image to view just that one image by itself. We need to add a single-image page that fetches the image location and its tags from a database. I won’t cover how I fixed this, but feel free to browse the code, which I link to at the bottom of the article!

Once we have addressed these, we should hopefully be in a far better place to attract big-money investors. With our updates done, the finished product should look something like this:

Notice the tags — these were generated using Amazon Rekognition

Introducing DynamoDB

Originally we did not have any form of database backing the image uploads on our site. This may have been fine for an MVP, but going forward we are going to want to do things such as tag an image with the labels that Rekognition’s detect_labels() returns.

Further down the line we are also going to want to implement a Reddit-esque scoring system and comments. So we need to set up a simple DynamoDB table that will let us do this.

I ended up throwing together a DynamoDB table with a quick Python script that I based off this tutorial:

DynamoDB — Boto 3 Docs 1.5.36 documentation

boto3.readthedocs.io

This featured just a `key` to start with. I figured we would need a key that is a randomly generated alphanumeric string, given back to the user in much the same way Imgur generates links, except a little bit longer.
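As a rough illustration of the kind of key described above, a random alphanumeric string can be produced with Python's standard `secrets` module. The helper name `generate_key` and the default length of 10 are assumptions made for this sketch, not something taken from the project's code.

```python
import secrets
import string

# Hypothetical helper: the name and the default length are assumptions for this sketch.
ALPHABET = string.ascii_letters + string.digits


def generate_key(length=10):
    """Return a random alphanumeric string of the given length."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == '__main__':
    print(generate_key())
```

A longer key lowers the chance of two uploads ending up with the same identifier, at the cost of a slightly uglier link.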

Our create_table call will look like this:

import boto3

dynamodb = boto3.resource('dynamodb')

# Create the table with a single string hash key named 'key'
dynamodb.create_table(
    TableName='img-posts',
    KeySchema=[{'AttributeName': 'key', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'key', 'AttributeType': 'S'}],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
)

We can dynamically add and remove attributes as we need them later on. This will include the tags that we are going to generate using some awesome machine learning.
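Because DynamoDB only fixes the key schema, other attributes can be written to any item at any time. The sketch below shows one way tags might be attached to an existing item with `update_item`. The helper names `build_tag_update` and `add_tags` and the attribute name `tags` are assumptions for illustration only.

```python
def build_tag_update(key, tags):
    """Build keyword arguments for Table.update_item that set a 'tags' attribute.

    'tags' is a hypothetical attribute name; it is aliased as #t so the
    expression does not depend on DynamoDB's reserved-word list.
    """
    return {
        'Key': {'key': key},
        'UpdateExpression': 'SET #t = :tags',
        'ExpressionAttributeNames': {'#t': 'tags'},
        'ExpressionAttributeValues': {':tags': list(tags)},
    }


def add_tags(table, key, tags):
    """Apply the update using a boto3 Table resource (or any object with update_item)."""
    return table.update_item(**build_tag_update(key, tags))
```

With boto3 installed and credentials configured, `table` would typically be `boto3.resource('dynamodb').Table('img-posts')`.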

What about GraphQL?

When I was creating this project, I seriously contemplated going down the AppSync route and using AWS’s managed GraphQL service to back my Imgur clone.

In future projects I definitely want to go down this route. To me, the way GraphQL can do things — such as return only what you need for a particular view — just makes sense. It’s definitely a very exciting technology and I am planning on building a search engine for my tutorial site with GraphQL.

The one, the only — Rekognition!

When someone on our site uploads an image, we want to be able to categorize this image based on its content so that people can view related images with minimal fuss.

We are going to create a Lambda function that is triggered whenever a new object is added to the S3 bucket that supports the site. This will generate a UUID that will act as our key. It will then store the location and the generated tags for that image within our newly created DynamoDB table.

Note: I realise that this method of UUID generation may generate collisions, however, seeing as this is a simple tech demo project, I’m not 100% fussed. Seriously though, Dynamo, you need a decent UUID generation method.

With all of our image’s information now stored in a database, we can now easily paginate the results and display the tags within the frontend of our application. We should also be able to extend this further and store a thumbnail location further down the line.
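To make the pagination idea concrete, here is a minimal sketch built on DynamoDB's `scan` with `Limit` and `ExclusiveStartKey`. The function name `fetch_page` and the page size are assumptions, and this is not necessarily how the project's own code does it. A scan also returns items in no particular order, so a real feed would likely need a sort key or an index.

```python
def fetch_page(table, page_size=20, start_key=None):
    """Return one page of items plus the key to pass in for the next page.

    `table` is expected to behave like a boto3 DynamoDB Table resource.
    """
    kwargs = {'Limit': page_size}
    if start_key is not None:
        # Resume where the previous page stopped
        kwargs['ExclusiveStartKey'] = start_key
    response = table.scan(**kwargs)
    # LastEvaluatedKey is absent once the scan has reached the end of the table
    return response.get('Items', []), response.get('LastEvaluatedKey')
```

The frontend would keep hold of the returned key and send it back with the request for the next page; a value of `None` means there is nothing left to fetch.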

The real magic in this function is the call to detect_labels(). This takes our newly uploaded S3 image and uses Rekognition’s underlying machine learning models to detect what is in it.
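For readers who want to see the moving parts together, the following is a sketch of what such a function could look like. It is not the project's actual code (that lives in the repository linked at the end). The handler name `tagImage` and the `TABLE` environment variable match the serverless.yml configuration in this article, while the helpers `build_item` and `tag_image` and the attribute names `location` and `tags` are assumptions.

```python
import os
import uuid
from urllib.parse import unquote_plus


def build_item(bucket, object_key, labels):
    """Build the DynamoDB item for one uploaded image.

    The attribute names 'location' and 'tags' are assumptions for this sketch.
    """
    return {
        'key': str(uuid.uuid4()),
        'location': f's3://{bucket}/{object_key}',
        'tags': [label['Name'] for label in labels],
    }


def tag_image(event, rekognition, table):
    """Label every object named in an S3 event and store one item per object."""
    items = []
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        object_key = unquote_plus(record['s3']['object']['key'])
        response = rekognition.detect_labels(
            Image={'S3Object': {'Bucket': bucket, 'Name': object_key}},
            MinConfidence=40,
        )
        item = build_item(bucket, object_key, response['Labels'])
        table.put_item(Item=item)
        items.append(item)
    return items


def tagImage(event, context):
    """Lambda entry point, named to match the handler tagNewImage.tagImage."""
    import boto3  # imported here so the helpers above stay usable without AWS

    rekognition = boto3.client('rekognition')
    table = boto3.resource('dynamodb').Table(os.environ['TABLE'])
    return tag_image(event, rekognition, table)
```

Passing the Rekognition client and the table into `tag_image` keeps the logic easy to exercise locally with stand-in objects before deploying.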

If we try and run this in the REPL against one of the objects already in a bucket, the output will look something like this:

This responds with a list of labels, each featuring a name and a confidence level. I’ve set the minimum confidence level to a measly 40 so that even the most obscure images still get a tag or two.
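In boto3 terms, the response is a dictionary whose `Labels` entry is a list of dictionaries carrying `Name` and `Confidence` keys. The sketch below shows one way to pull the names out of such a response. The helper names `label_names` and `detect` are assumptions, and no real Rekognition output is reproduced here.

```python
def label_names(response, min_confidence=40.0):
    """Return label names from a detect_labels response, highest confidence first."""
    labels = [
        label for label in response.get('Labels', [])
        if label['Confidence'] >= min_confidence
    ]
    labels.sort(key=lambda label: label['Confidence'], reverse=True)
    return [label['Name'] for label in labels]


def detect(bucket, name, min_confidence=40.0):
    """Call Rekognition for one S3 object; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so label_names can be tried without AWS

    client = boto3.client('rekognition')
    return client.detect_labels(
        Image={'S3Object': {'Bucket': bucket, 'Name': name}},
        MinConfidence=min_confidence,
    )
```

In a REPL this would be used along the lines of `label_names(detect(bucket, name))`, with the bucket and object names swapped for ones that exist in your own account.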

In order to trigger this we have to specify the event trigger within our serverless.yml file like so:

tagNewImage:
  handler: tagNewImage.tagImage
  environment:
    TABLE: ${self:custom.table}
    BUCKET: ${self:custom.bucket}
  events:
    - s3:
        bucket: ${self:custom.bucket}
        event: s3:ObjectCreated:*

How does it work?

Rekognition abstracts away from us, the developer, the complexities of machine learning and image recognition and simply exposes a very simple, beautiful API that we can run against images within an S3 bucket.

The service is constantly learning in the backend, so as our site grows, so too will the way it tags any images uploaded to it.

Future improvements

Whilst this project doesn’t quite replace Imgur, it does serve as a very useful prototype of how you could challenge a site like Imgur using pure cloud services.

In terms of future improvements, I really want to implement a comments section. I feel it wouldn’t be too hard considering I’ve got a user system and a database backing all of my images. I’ve also considered adding a rating system and may play about with this in the future.

Last but not least, it should have a proper domain name. I highly doubt I would be able to find a domain name that matches the succinctness of Imgur, but it would be a nice thing to have in the future. I could also set up a nice Cloudflare-based CDN to help reduce costs.

Thanks for reading

Hopefully, you found this article entertaining and somewhat educational! I’m hoping it shows you just how simple building a fully functioning site on top of both AWS Lambda and S3 can be.

You can find the full source code for this project here: https://github.com/elliotforbes/imgur-clone

You can also find the deployed version of our application here:
http://imgur-serverless-clone.s3-website-eu-west-1.amazonaws.com/#/

Feel free to register and upload your own *appropriate* images to test it out!

I’ll be posting a lot of cloud-related video tutorials in the coming months, so be sure to subscribe to my channel on YouTube!

TutorialEdge

www.youtube.com

Thanks for reading!