Using Lambda, Comprehend, DynamoDB, and S3, we can build an end-to-end news aggregator with sentiment analysis and a statically generated website.
A proof-of-concept site is shown below.
Dev Stack Rationale
Major benefits of a serverless architecture:
- Fully managed: no machines to patch or maintain
- Low cost: you only pay when the functions run
Fetching data, analyzing the text, and passing the results to a UI can be achieved in many ways.
More traditionally, a server would be running continuously. At a set interval, it would fetch data from the data source of interest. It would then call a third-party sentiment API or run a sentiment algorithm locally. Lastly, the server would save the data to a directory available to the web server running on the same machine.
Many problems can arise here. Here are a few:
- Heavy web traffic degrades every other process on the machine
- Sentiment analysis must be handled manually
- The server costs money even when it is not updating
- The machine is prone to bugs and needs regular patching
Starting from the bottom up, we run each of the functions in sequence on a timed interval.
Fetch and analyze
This function pulls all the data, sends it to Comprehend, and gets back results from AWS's state-of-the-art sentiment engine.
It also does the important job of checking for new articles and only processing those it has not seen before. This keeps costs down by never running sentiment analysis (AWS Comprehend) on the same article twice.
PS: we check DynamoDB for the IDs of the articles just pulled from the RSS feed because we only want to run Comprehend on new articles; Comprehend is the most costly service in the architecture.
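Below is a minimal Python sketch of that flow. The feed URL, the articles table name, and the record layout are placeholder assumptions for illustration, not the repo's actual values:

```python
import hashlib

import boto3
import feedparser  # third-party RSS parser, bundled into the deployment package

dynamodb = boto3.resource("dynamodb")
comprehend = boto3.client("comprehend")
table = dynamodb.Table("articles")        # hypothetical table name
FEED_URL = "https://example.com/rss"      # placeholder feed URL


def handler(event, context):
    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries:
        # A stable hash of the link serves as the article ID.
        article_id = hashlib.sha256(entry.link.encode()).hexdigest()

        # Skip articles already in DynamoDB; Comprehend is the most
        # costly service in the architecture, so never pay for it twice.
        if "Item" in table.get_item(Key={"id": article_id}):
            continue

        # Comprehend caps input at 5,000 bytes per request.
        sentiment = comprehend.detect_sentiment(
            Text=entry.summary[:4500], LanguageCode="en"
        )
        table.put_item(Item={
            "id": article_id,
            "title": entry.title,
            "link": entry.link,
            "summary": entry.summary,
            "sentiment": sentiment["Sentiment"],
        })
```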
Build articles
This function reads our database and formats each record into a Markdown file.
PS: articles are built from the data in DynamoDB rather than directly from the RSS feed because we want to be able to rebuild them at any time (and not rely on the only copy of the data being a Markdown file in S3).
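A sketch of that conversion, reusing the hypothetical table from above and writing Hugo-style front matter to a hypothetical input bucket:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")
table = dynamodb.Table("articles")       # hypothetical table name
INPUT_BUCKET = "my-hugo-input-bucket"    # hypothetical bucket name


def handler(event, context):
    # Rebuild every article from DynamoDB, the source of truth,
    # rather than trusting whatever Markdown already sits in S3.
    for item in table.scan()["Items"]:
        front_matter = (
            "---\n"
            f"title: \"{item['title']}\"\n"
            f"sentiment: \"{item['sentiment']}\"\n"
            "---\n\n"
        )
        body = f"{item['summary']}\n\n[Source]({item['link']})\n"
        s3.put_object(
            Bucket=INPUT_BUCKET,
            Key=f"content/blog/{item['id']}.md",
            Body=front_matter + body,
        )
```

(For a large table the scan would need pagination; a single page is enough to show the shape.)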
Build static site
This function packages up Hugo (a fast, easy-to-use static site generator). First, the Lambda function gets all the files it needs from the first S3 bucket and passes them to the Hugo binary. Hugo builds the static site and saves it to the publicly available bucket. This is the final site.
PS: we use Hugo because it lets us build sites quickly and easily. Instead of creating a new HTML file for each article, we just generate a Markdown file that Hugo compiles into a viewable page. This lets us change the CSS for all articles easily and effectively. And since all articles are compiled into static HTML, the site is easier for search engines to index, which is better for SEO.
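A sketch of the build step, assuming the Hugo binary is shipped inside the deployment package and reusing the bucket names from the earlier sketches:

```python
import mimetypes
import os
import subprocess

import boto3

s3 = boto3.client("s3")
INPUT_BUCKET = "my-hugo-input-bucket"     # hypothetical names, as before
OUTPUT_BUCKET = "my-hugo-public-bucket"
SITE_DIR = "/tmp/site"                    # /tmp is Lambda's only writable path


def handler(event, context):
    # Pull the Hugo site source (config, theme, Markdown) into /tmp.
    for obj in s3.list_objects_v2(Bucket=INPUT_BUCKET).get("Contents", []):
        local = os.path.join(SITE_DIR, obj["Key"])
        os.makedirs(os.path.dirname(local), exist_ok=True)
        s3.download_file(INPUT_BUCKET, obj["Key"], local)

    # The hugo binary sits next to this handler in the package.
    subprocess.run(["./hugo", "--source", SITE_DIR], check=True)

    # Upload the generated site, with a best-guess Content-Type so
    # browsers render pages instead of downloading them.
    public_dir = os.path.join(SITE_DIR, "public")
    for root, _, files in os.walk(public_dir):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.relpath(path, public_dir)
            content_type = mimetypes.guess_type(path)[0] or "binary/octet-stream"
            s3.upload_file(path, OUTPUT_BUCKET, key,
                           ExtraArgs={"ContentType": content_type})
```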
Doing it yourself
1. Set up S3 resources
We need two buckets: a private input bucket to hold the Hugo site source and generated Markdown, and a public bucket to serve the finished site.
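For example, with boto3 (the bucket names are hypothetical, and this assumes the us-east-1 region, where create_bucket needs no location constraint):

```python
import json

import boto3

s3 = boto3.client("s3")
INPUT_BUCKET = "my-hugo-input-bucket"    # hypothetical, must be globally unique
OUTPUT_BUCKET = "my-hugo-public-bucket"

s3.create_bucket(Bucket=INPUT_BUCKET)
s3.create_bucket(Bucket=OUTPUT_BUCKET)

# Serve the output bucket as a static website.
s3.put_bucket_website(
    Bucket=OUTPUT_BUCKET,
    WebsiteConfiguration={"IndexDocument": {"Suffix": "index.html"}},
)

# Newer AWS accounts block public bucket policies by default,
# so lift that block before attaching a public-read policy.
s3.delete_public_access_block(Bucket=OUTPUT_BUCKET)
s3.put_bucket_policy(
    Bucket=OUTPUT_BUCKET,
    Policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{OUTPUT_BUCKET}/*",
        }],
    }),
)
```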
2. Set up Lambda functions
Now we need to make three Lambda functions (the code is in the repo):
- hugo builder: pulls the site source from the input bucket, runs Hugo, and publishes to the public bucket
- article builder: reads DynamoDB and writes one Markdown file per article to the input bucket
- article analyzer: fetches the RSS feed, runs Comprehend on new articles, and writes the results to DynamoDB
3. Authorize connections
Lastly, we need to make sure that all the functions can access the services they need.
- Make IAM roles (a sketch follows below):
One with Lambda execution and S3 read/write access
One with Lambda execution, Comprehend access, DynamoDB write access, and permission to put events to CloudWatch
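A rough boto3 sketch of the second role; the role name is hypothetical, and the broad managed policies keep the example short (a real deployment should scope them down to the specific buckets and table):

```python
import json

import boto3

iam = boto3.client("iam")
ROLE_NAME = "news-analyzer-role"  # hypothetical name

# Trust policy that lets Lambda assume the role.
trust_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
})

iam.create_role(RoleName=ROLE_NAME, AssumeRolePolicyDocument=trust_policy)

for arn in [
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    "arn:aws:iam::aws:policy/ComprehendFullAccess",
    "arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess",
    "arn:aws:iam::aws:policy/CloudWatchEventsFullAccess",
]:
    iam.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=arn)
```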
4. Set up CRON triggers
Now that everything is in place, we do some final updates.
- Copy our Hugo website source to the input bucket.
- Set up a scheduled CloudWatch Events trigger for each function so they run in sequence.
Fetch data on the 55th minute, build articles on the 57th minute, and run Hugo at the top of the hour. This way the website updates every hour.
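With boto3, that staggered schedule might look like the sketch below. The function ARNs are placeholders, and each function also needs a lambda add_permission call so CloudWatch Events is allowed to invoke it:

```python
import boto3

events = boto3.client("events")

# Staggered hourly schedule: fetch at :55, build articles at :57,
# rebuild the site with Hugo at the top of the hour.
schedules = {
    "article-analyzer": "cron(55 * * * ? *)",
    "article-builder":  "cron(57 * * * ? *)",
    "hugo-builder":     "cron(0 * * * ? *)",
}

for name, expression in schedules.items():
    events.put_rule(Name=f"{name}-trigger", ScheduleExpression=expression)
    events.put_targets(
        Rule=f"{name}-trigger",
        Targets=[{
            "Id": "1",
            # Placeholder ARN; substitute your account ID and region.
            "Arn": f"arn:aws:lambda:us-east-1:123456789012:function:{name}",
        }],
    )
```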
With the instructions above and the repo below, you should be able to create your own website powered by Lambda.
This project does sentiment analysis on a news feed. It uses no servers and only uses AWS services: Lambda, Dynamo…
POC project website: http://myexamplehugosite.s3-website-us-east-1.amazonaws.com/blog/
Thanks for getting through this adventure with me. Please show support and 👏 (clap) for this article. Remember, you can clap up to 50 times, and clapping for this article really helps it in Medium's algorithm. Any support is greatly appreciated ❤️