Serverless Dynamic Real-Time Dashboard with AWS DynamoDB, S3 and Cognito

As more Internet of Things (IoT) devices and sensors become available, and user-level clickstream analytics tracking is added to mobile applications and websites, there is a growing desire to understand and share what is happening in near real-time. Near real-time data provides instant feedback on the activities being monitored, allowing you to react as events happen rather than the next day when the reporting data warehouse is refreshed. In addition, predictions and forecasts can be more accurate with fresh data.

A traditional pattern would be to build a web dashboard application integrated with a NoSQL database, possibly with an in-memory caching layer, both running on one or more servers. This gives you full control over the process and the data. The downside is that you need to maintain your own infrastructure, servers and code base. In addition, if the volume of data ingested per second is very high, then different scalable patterns need to be implemented.

Some of these issues can be addressed using Amazon Web Services (AWS) managed services, such as Amazon ElastiCache for the in-memory cache (Memcached and Redis are supported) and Amazon DynamoDB as the NoSQL database. These reduce the maintenance and support requirements, but you will still need to build a dashboard application and think about deployment, security, scalability and data integration.

Another option is to make use of different 3rd party vendors that provide web analytics, data science, business intelligence or visualisation as a service. This can be useful to speed up development, but there is an additional cost, you generally have limited flexibility and control over the process, and you potentially face personal data protection, legal and compliance issues. Also, with some vendors it is often very easy to send them your data but difficult to get that raw data back, leading to vendor lock-in.

In this post we show how to create a dynamic dashboard that makes use of the AWS managed NoSQL database DynamoDB, a scalable static website hosted in Amazon Simple Storage Service (S3), and security provided by Amazon Cognito and AWS Identity and Access Management (IAM) roles, removing the need to run any app server or web server, or to use a 3rd party service. For the dashboard itself, we use Chart.js, which is an open source and easy to use JavaScript charting library, but others such as d3.js or dc.js could also be used. The code base is a lightweight, responsive and event-driven web application written in a mix of JavaScript and node.js. It is well suited to querying DynamoDB, and we describe an efficient pattern to store and query time series data for near real-time visualisation.

This post assumes that you have some working knowledge of JavaScript and node.js, have an AWS account set up, and are familiar with DynamoDB and S3.

Here is an overview diagram of the near real-time dashboard used in RAVEN, our data science and analytics platform at JustGiving, which we built on top of the AWS cloud infrastructure.

The near-real time dashboard in the RAVEN (Reporting, Analytics, Visualization, Experimental, Networks) platform on Amazon Web Services.

We now describe how you can create this dashboard in four steps.

1. Website and Application Hosting

We host all the required client-side code on a static website running on S3. The advantages are that S3 is highly resilient, scalable and simple to set up, and requires no application or web server to host a static website.

A detailed AWS guide on how to set up a static website is available here.

Static Website Hosting on AWS S3

You access the static website in your web browser using the endpoint such as myjgbucket.s3-website-eu-west-1.amazonaws.com. To make this more user friendly you can use Route 53 to map this endpoint to a custom domain name of your choice, e.g. http://www.mydashboard.com

We create two web page dashboards populated from data stored in DynamoDB: one will be a traditional table with rows and columns of the most recent data (like an Excel sheet) and the other will be a graphical line chart of the most recent data (like an Excel line chart).

Dependencies for the near-real time table dashboard

Create an HTML document with <HEAD><BODY> tags and with the following external libraries.

[ ... ]
<script src="https://sdk.amazonaws.com/js/aws-sdk-2.2.19.min.js"></script>
<script src="js/raven-table-dashboard.js"></script>
[ ... ]

We shall be adding the dashboard table JavaScript and node.js code to raven-table-dashboard.js

Dependencies for the near-real time line chart dashboard

Create an HTML document with <HEAD><BODY> tags and with the following external libraries.

[ ... ]
<script src="https://sdk.amazonaws.com/js/aws-sdk-2.2.19.min.js"></script>
<script src="js/Chart.js"></script>
<script src="js/raven-chart-dashboard.js"></script>
[ ... ]
<canvas id="canvas" width="400" height="400"></canvas>
[ ... ]

We shall be adding the dashboard chart code to raven-chart-dashboard.js

Upload the HTML and JavaScript files to the bucket and make them public using the AWS Command Line Interface (CLI) or the AWS Console. If you use the Console, set Open/Download rights to “Any Authenticated AWS User”.

AWS S3 Permissions

Note, this process needs to be repeated for any changes to the code so we recommend using the AWS CLI:

aws s3 cp $source_file s3://$target_bucket/$target_path
aws s3api put-object-acl --bucket $target_bucket --key $target_path --grant-read uri=http://acs.amazonaws.com/groups/global/AuthenticatedUsers

Once the dashboard is stable, you can add it to a continuous deployment process which deploys the code changes upon check-in.

For lower latency and resilience, we recommend you deploy your own copy of Chart.js to the S3 static site too.

2. Security Layer

S3 Bucket IP Restrictions in AWS Console

Now that we have the files uploaded to the bucket with correct permissions, we want to lock down the bucket to a specific Internet Protocol address (IP address) range.

Under Bucket Properties > Permissions, click on Edit Bucket Policy and add the following JSON, which only grants access to the IP addresses specified in Classless Inter-Domain Routing (CIDR) notation. Here the only permitted IPs are 1.1.1.1 and 2.2.2.2.

{
  "Version": "2012-10-17",
  "Id": "S3PolicyIPRestrict",
  "Statement": [
    {
      "Sid": "IPAllow",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": [
        "s3:List*",
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::myjgbucket/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "1.1.1.1/32",
            "2.2.2.2/32"
          ]
        }
      }
    }
  ]
}
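
To make the CIDR notation in the policy more concrete, here is a minimal sketch of how an IPv4 address can be checked against a CIDR block. It is not part of the dashboard code, and S3 performs the real check for you; ipToInt and ipInCidr are hypothetical helper names written only for this illustration. A /32 suffix matches exactly one address, while a shorter prefix such as /16 matches a whole range.

```javascript
// Illustration only: how an IPv4 CIDR block such as "1.1.1.1/32" is matched.
// ipToInt and ipInCidr are hypothetical helpers, not part of any AWS SDK.

// Convert a dotted IPv4 string to an unsigned 32-bit integer
function ipToInt(ip) {
    var parts = ip.split('.').map(Number);
    return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// Return true when ip falls inside the block described by cidr
function ipInCidr(ip, cidr) {
    var pieces = cidr.split('/');
    var prefixLength = parseInt(pieces[1], 10);
    // Build a mask with prefixLength leading 1 bits (a /0 block matches everything)
    var mask = prefixLength === 0 ? 0 : (0xFFFFFFFF << (32 - prefixLength)) >>> 0;
    return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(pieces[0]) & mask) >>> 0);
}

console.log(ipInCidr('1.1.1.1', '1.1.1.1/32'));   // true
console.log(ipInCidr('1.1.1.2', '1.1.1.1/32'));   // false
console.log(ipInCidr('10.0.5.9', '10.0.0.0/16')); // true
```

If your corporate network uses a contiguous range, a single wider block in aws:SourceIp is usually easier to maintain than a long list of /32 entries.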

In order to allow the code hosted on the S3 bucket to interact with other domains, we also need to set up cross-origin resource sharing (CORS).

Bucket Properties > Permissions > Edit CORS Configuration

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <CORSRule>
    <AllowedOrigin>https://s3-eu-west-1.amazonaws.com</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <MaxAgeSeconds>3000</MaxAgeSeconds>
    <AllowedHeader>Authorization</AllowedHeader>
  </CORSRule>
</CORSConfiguration>

Cognito Setup in Console

Cognito is designed to be used for mobile authentication but can also be used to limit access to your AWS resources. Here we want to ensure that only authorised users can read from a specific DynamoDB table.

AWS Console > Cognito > Create new identity pool

To simplify the setup process we shall assume your dashboard users do not need to be authenticated using a provider such as Google+ or Amazon, which is reasonable as we have already restricted access by IP. For example, users could be accessing the dashboard using their web browser in the corporate network, which is in a fixed IP range.

In the next step you need to specify the IAM Role for “Your unauthenticated identities would like access to Cognito.”

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "mobileanalytics:PutEvents",
        "cognito-sync:*"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "DynamoDBAccess",
      "Effect": "Allow",
      "Action": [
        "dynamodb:BatchGetItem",
        "dynamodb:DescribeStream",
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:ListStreams",
        "dynamodb:Query",
        "dynamodb:Scan"
      ],
      "Resource": [
        "arn:aws:dynamodb:eu-west-1:111111111111:table/poc-raven-counters-event-minute"
      ]
    }
  ]
}

Replace 111111111111 with your AWS account ID.

Once the Identity Pool has been created, click on Sample Code > Platform > JavaScript

Copy the “Get AWS Credentials” code and add the following code to the files raven-table-dashboard.js and raven-chart-dashboard.js:

// Initialize the Amazon Cognito credentials provider
AWS.config.region = 'eu-west-1'; // Region
AWS.config.credentials = new AWS.CognitoIdentityCredentials({
    IdentityPoolId: 'eu-west-1:11111111-1111-1111-1111-111111111111'
});
[ ...]
AWS.config.credentials.get(function(err) {
    if (err) {
        console.log("Error: " + err);
        return;
    }
    console.log("Cognito Identity Id: " + AWS.config.credentials.identityId);
    var cognitoSyncClient = new AWS.CognitoSync();
    cognitoSyncClient.listDatasets({
        IdentityId: AWS.config.credentials.identityId,
        IdentityPoolId: "eu-west-1:11111111-1111-1111-1111-111111111111"
    }, function(err, data) {
        if (!err) {
            console.log(JSON.stringify(data));
            [...]
        }
    });

    // You can now check that you can describe the DynamoDB table
    var params = { TableName: tableName };
    var dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'});
    dynamodb.describeTable(params, function(err, data) {
        if (err) console.log(err, err.stack); // an error occurred
        else console.log(JSON.stringify(data));
    });
});

Now that we can access the DynamoDB table, we explain how the time series running counter data is stored for efficient retrieval.

3. Data Layer

The data presented in the dashboard is read from DynamoDB, a managed and highly scalable AWS NoSQL database. This post assumes that other services such as Lambda, Elastic MapReduce or EC2 already update DynamoDB in near real-time with the latest data needed by the dashboard. For example, a set of Lambda functions could be recording a set of running counters for each hour of the day. Here is what some sample data between 2015-02-14 16:00 and 2015-02-14 19:00 for a particular “page view” web event would look like:

The advantage of using such a representation for aggregate or running counts is that it can be queried very rapidly to retrieve only the most recent rows, without having to scan the entire DynamoDB table.
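
The range condition works because timestamps written in a fixed-width, most-significant-first format sort alphabetically in the same order as they sort chronologically. The sketch below imitates that comparison in plain JavaScript on a handful of made-up items. The attribute names EventName, DateHour and EventCount are the ones used by the query and presentation code in this post, while the values, the exact DateHour format and the selectRecent helper are assumptions for illustration only. DynamoDB does the equivalent work on the server against the table's key, so only matching rows are read.

```javascript
// Illustration only: made-up items shaped like the rows the dashboard reads.
// The DateHour format "YYYY-MM-DDTHH:00:00" is an assumption.
var sampleItems = [
    { EventName: 'page view',  DateHour: '2015-02-14T16:00:00', EventCount: 120 },
    { EventName: 'page view',  DateHour: '2015-02-14T17:00:00', EventCount: 250 },
    { EventName: 'impression', DateHour: '2015-02-14T18:00:00', EventCount: 900 },
    { EventName: 'page view',  DateHour: '2015-02-14T18:00:00', EventCount: 310 },
    { EventName: 'page view',  DateHour: '2015-02-14T19:00:00', EventCount: 75 }
];

// Hypothetical helper that mimics the key condition
// "EventName = :event_to_find AND DateHour >= :start_date"
function selectRecent(items, eventToFind, startDate) {
    return items.filter(function (item) {
        // Plain string comparison is enough because the format is fixed-width
        return item.EventName === eventToFind && item.DateHour >= startDate;
    });
}

var recent = selectRecent(sampleItems, 'page view', '2015-02-14T18:');
console.log(recent.map(function (item) { return item.DateHour; }));
// [ '2015-02-14T18:00:00', '2015-02-14T19:00:00' ]
```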

// Generate a date string for hoursBack hours ago, e.g. "2015-02-14T16:"
// (uses the browser's local time zone)
var ts = new Date().getTime();
var tsYesterday = ts - (hoursBack * 3600 * 1000);
var d = new Date(tsYesterday);
// getMonth() is zero-based so it needs +1; getDate() and getHours() do not
var yesterdayDateString = d.getFullYear() + '-'
    + ('0' + (d.getMonth() + 1)).slice(-2) + '-'
    + ('0' + d.getDate()).slice(-2) + 'T'
    + ('0' + d.getHours()).slice(-2) + ':';
    //+ ('0' + d.getMinutes()).slice(-2) + ':'
    //+ ('0' + d.getSeconds()).slice(-2);

// Form the DynamoDB query
var params = {
    TableName: tableName,
    Limit: maxItems,
    ConsistentRead: false,
    ScanIndexForward: true, // return rows in ascending DateHour order
    ExpressionAttributeValues: {
        ":start_date": yesterdayDateString,
        ":event_to_find": eventToFind
    },
    KeyConditionExpression:
        "EventName = :event_to_find AND DateHour >= :start_date"
};

This query fetches up to maxItems records from tableName for the event eventToFind over the last hoursBack hours. For example, maxItems=100, eventToFind=impression, hoursBack=10 and tableName=poc-raven-counters-event-minute.

Note that tableName needs to be the same as the table specified earlier in the IAM role policy.

Now that we have the query, we shall run it and present the results as a table or chart.

4. Presentation Layer

Displaying the data as an HTML table:

// Query DynamoDB using the new DocumentClient
var docClient = new AWS.DynamoDB.DocumentClient();
docClient.query(params, function(err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    else {
        document.write("<table style=\"width:100%\" class=\"pure-table\">\n");
        document.write("<tr><th>Date Hour</th><th>Event Name</th><th>Event Count</th></tr>");
        data.Items.forEach(function(item) {
            document.write("<tr><td>", item.DateHour, '</td><td>', item.EventName, '</td><td>', item.EventCount, '</td></tr>');
        });
        document.write("</table>");
    }
});

Here we query a DynamoDB table specified in tableName (e.g. poc-raven-counters-event-minute) and draw out an HTML table row by row.
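
Because the cell values are written straight into the page, it can be worth escaping them first in case an attribute ever contains characters such as < or &. The following is one possible sketch rather than the approach used in the code above: escapeHtml and buildTableHtml are hypothetical helper names, and the function returns the markup as a string so that it can be assigned to an element's innerHTML or passed to document.write.

```javascript
// Hypothetical helpers, shown as a sketch only.

// Replace the characters that have a special meaning in HTML
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Build the same three-column table as above from an array of items
function buildTableHtml(items) {
    var rows = items.map(function (item) {
        return '<tr><td>' + escapeHtml(item.DateHour) + '</td><td>'
            + escapeHtml(item.EventName) + '</td><td>'
            + escapeHtml(item.EventCount) + '</td></tr>';
    });
    return '<table style="width:100%" class="pure-table">\n'
        + '<tr><th>Date Hour</th><th>Event Name</th><th>Event Count</th></tr>\n'
        + rows.join('\n')
        + '\n</table>';
}

// Example with a made-up item whose name contains markup characters
console.log(buildTableHtml([
    { DateHour: '2015-02-14T16:00:00', EventName: 'page <view>', EventCount: 120 }
]));
```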

Drawing the data in Chart.js

var docClient = new AWS.DynamoDB.DocumentClient();
docClient.query(params, function(err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    else {
        var recentEventsDateTime = [];
        var recentEventsCounter = [];
        var dateHour;
        data.Items.forEach(function(item) {
            dateHour = item.DateHour.toString();
            recentEventsDateTime.push(dateHour.slice(0, -6));
            recentEventsCounter.push(item.EventCount.toString());
        });
        // Chart.js code
        var lineChartData = {
            labels : recentEventsDateTime,
            datasets : [
                {
                    label: "JustGiving Pages Views",
                    fillColor : "rgba(151,187,205,0.2)",
                    strokeColor : "rgba(151,187,205,1)",
                    pointColor : "rgba(151,187,205,1)",
                    pointStrokeColor : "#fff",
                    pointHighlightFill : "#fff",
                    pointHighlightStroke : "rgba(151,187,205,1)",
                    data : recentEventsCounter
                }
            ]
        };
        var ctx = document.getElementById("canvas").getContext("2d");
        window.myLine = new Chart(ctx).Line(lineChartData, {
            responsive: true,
            pointDot: false
        });
    }
});

Here we create two arrays to store the time series data: recentEventsCounter stores the y-axis data and recentEventsDateTime stores the x-axis data. These are then added to the chart data object and charted using Chart.js.

Well done, you now have a flexible, scalable and dynamic dashboard, accessible within your corporate network, without a server!

If you then enhance the code to support two queries on DynamoDB and make some enhancements to the Chart.js labelling, you can get something like the following.

Example chart generated from synthetic data stored in DynamoDB. The x-axis is time at hourly resolution, and the y-axis is the running count of page views and impressions.
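
Supporting two queries means waiting for both asynchronous callbacks before drawing the chart. One way to do this, sketched below, is to wrap each callback-style query in a Promise and continue only once all of them have returned. The sketch assumes a browser with Promise support; queryAsPromise and queryBoth are hypothetical helper names, and the only SDK call used is the callback form of docClient.query shown earlier.

```javascript
// Sketch only: queryAsPromise and queryBoth are hypothetical helpers.

// Wrap the callback-style docClient.query call in a Promise
function queryAsPromise(docClient, params) {
    return new Promise(function (resolve, reject) {
        docClient.query(params, function (err, data) {
            if (err) reject(err);
            else resolve(data.Items);
        });
    });
}

// Run one query per parameter set and resolve when all have finished.
// Results come back in the same order as paramsList.
function queryBoth(docClient, paramsList) {
    return Promise.all(paramsList.map(function (params) {
        return queryAsPromise(docClient, params);
    }));
}

// Usage in the dashboard (pageViewParams and impressionParams are assumed to
// be two query parameter objects built as in the Data Layer section):
//
// queryBoth(docClient, [pageViewParams, impressionParams])
//     .then(function (results) {
//         var pageViews = results[0];
//         var impressions = results[1];
//         // build one Chart.js dataset per result here
//     })
//     .catch(function (err) { console.log(err, err.stack); });
```

Because Promise.all preserves the input order, each result can be matched to its dataset regardless of which query finishes first.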

Monitoring

The following can be used to monitor access to your dashboard:

  • S3 Logs: a set of raw access logs that can be set up under S3 > Select bucket Properties > Logging
  • Cognito Dashboard, which shows the different identity metrics
  • DynamoDB > Tables > poc-raven-counters-event-minute > Metrics > Read Capacity will show some spikes of activity
  • CloudWatch > DynamoDB > Table Metrics > Consumed Read Capacity offers a more flexible view of the metrics

What next

  • Increase security, e.g. add an authentication provider for users in Cognito so that they need to log in, and lock down S3 CORS further
  • Reduce latency and increase security using Amazon CloudFront
  • Use CSS to make the table and page look pretty
  • Allow the user to change the parameters via the URL, fields or buttons
  • Run more than one query and add more than one line to the chart (as shown in our example line chart — be aware of the async nature of the queries)
  • In Chart.js, override existing functions to better cope with a large number of data points (e.g. limit the number of x-axis labels)
  • Try other more advanced charting JavaScript frameworks
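
For the x-axis labelling item in the list above, one simple approach that avoids overriding Chart.js functions is to blank out most labels before passing them to the chart, so that only every n-th label is drawn. This is a sketch under the assumption that the chart draws nothing for an empty-string label; thinLabels is a hypothetical helper name.

```javascript
// Hypothetical helper: keep at most maxLabels visible labels, blank the rest.
// maxLabels is assumed to be at least 1.
function thinLabels(labels, maxLabels) {
    if (labels.length <= maxLabels) {
        return labels.slice();
    }
    // Show one label every `step` points
    var step = Math.ceil(labels.length / maxLabels);
    return labels.map(function (label, index) {
        return index % step === 0 ? label : '';
    });
}

console.log(thinLabels(['16', '17', '18', '19', '20', '21'], 3));
// [ '16', '', '18', '', '20', '' ]
```

In the chart code, this would mean passing thinLabels(recentEventsDateTime, 12) as the labels value instead of recentEventsDateTime, where 12 is an arbitrary example limit.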

Summary

We have shown how to quickly build a secure dynamic dashboard view on top of DynamoDB without a server and with a minimum amount of code. Making full use of the managed AWS infrastructure minimises maintenance, support costs and development time. We have also shown how to store and query time series data efficiently for near real-time charting.

We have used the following:

  • DynamoDB provides a persistent store for the data
  • Cognito and IAM roles deal with authentication and authorisation
  • S3 hosts the HTML, JavaScript and node.js code, making it low cost, scalable and secure
  • Chart.js draws the responsive line charts
  • JavaScript and node.js are used for authentication, querying DynamoDB, generating a tabular representation of the data and charting the data

References: