SQS, EMR, KINESIS
To understand SQS, let's consider a scenario.
Suppose you are hosting a TV show where millions of people from all around the globe vote at the same time. Traditionally, you would deploy a web server to handle all the messages (votes) per second, and you would have to pre-provision for the maximum expected workload.
That is not a hard job in itself, but what if the traffic becomes too heavy for your server to handle and it crashes? Or what if you get fewer voters than expected and end up overpaying for infrastructure?
You need a queueing mechanism for such situations, one that can absorb 10 messages/sec or 10 million messages/sec with a flexible pricing model.
SQS, or Simple Queue Service, was the first service made available on AWS. It gives you access to a message queue that can store messages while they wait for a component to process them. The queue is fully managed by AWS and helps you send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be always available. You can work with SQS using tools such as the Amazon Console, the command line interface, and the SDKs.
AWS SQS lets you send an unlimited number of messages, in any region. A message payload can be up to 256 KB of text in any format. If a message is larger than that, you can use the Extended Client Library for Java, which stores the payload in Amazon S3 and sends a reference through the queue.
The following are some of the popular companies that use Amazon Simple Queue Service:
- Capital One
Create and read from SQS queue
Open the Amazon SQS console at https://console.aws.amazon.com/sqs/ and click on Create queue.
You will now see the Create queue page. You can create a standard queue or a First-In/First-Out (FIFO) queue. Standard is the default type, but if you want a FIFO queue you can select it instead, and remember: the name of a FIFO queue must end with .fifo
The two queue types are quite different, and you need to exercise care in choosing one over the other. You cannot change the queue type after you create it.
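The same choice can be made from code. Here is a minimal sketch with boto3 (assuming it is installed and your AWS credentials are configured; the queue name is made up for illustration):

```python
def create_queue_params(name: str, fifo: bool = False) -> dict:
    """Build CreateQueue parameters for a standard or FIFO queue."""
    if fifo and not name.endswith(".fifo"):
        name += ".fifo"  # a FIFO queue's name must end with .fifo
    params = {"QueueName": name}
    if fifo:
        # The queue type is fixed at creation time and cannot be changed later.
        params["Attributes"] = {"FifoQueue": "true"}
    return params


def create_queue(name: str, fifo: bool = False) -> str:
    """Create the queue on AWS and return its URL (needs boto3 + credentials)."""
    import boto3  # imported here so the parameter helper runs without boto3 installed
    sqs = boto3.client("sqs")
    return sqs.create_queue(**create_queue_params(name, fifo))["QueueUrl"]
```

Calling `create_queue("tv-show-votes", fifo=True)` would create a queue named `tv-show-votes.fifo`.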
Next, you need to define some key attributes in order to create the queue.
· Visibility Timeout — the length of time that a message received from the queue stays invisible to other receiving components.
· Message Retention Period — the amount of time that a message remains in an Amazon SQS queue before it is deleted.
· Delivery Delay — a delay you can introduce after adding a message to the queue, during which that message cannot be picked up by any component.
· Maximum message size — the upper limit on the payload, up to 256 KB of text in any format.
· Receive Message Wait Time — the amount of time a component will wait for a message to arrive before returning an empty response (long polling).
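These attributes map directly onto the CreateQueue API, where every value is passed as a string of seconds or bytes. A sketch with boto3 (the numbers below are example values, not recommendations):

```python
# Queue attributes from the list above, as the SQS API expects them
# (all values are strings; durations in seconds, size in bytes).
QUEUE_ATTRIBUTES = {
    "VisibilityTimeout": "30",              # hide a received message for 30 s
    "MessageRetentionPeriod": "345600",     # keep messages for 4 days
    "DelaySeconds": "0",                    # no delivery delay
    "MaximumMessageSize": "262144",         # 256 KB payload limit
    "ReceiveMessageWaitTimeSeconds": "10",  # long polling: wait up to 10 s
}


def create_configured_queue(name: str) -> str:
    """Create a queue with the attributes above (needs boto3 + credentials)."""
    import boto3  # imported here so the attribute dict is usable without boto3
    sqs = boto3.client("sqs")
    return sqs.create_queue(QueueName=name, Attributes=QUEUE_ATTRIBUTES)["QueueUrl"]
```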
You can change the access policy according to your needs, but for this demo we will proceed with the default policy.
Leave everything as is and, finally, click on Create queue.
From the dashboard you can see that the queue has been created.
Now that we've got a queue, click on Send and receive messages at the top right corner and start sending a message.
Type the message into the Message body section. Add a delay and edit other attributes if you want.
Click on Send message
This message will now be sent to the queue that you created, and a message ID will pop up on your screen like this.
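Programmatically, this whole step is a single `send_message` call. A small sketch; the `sqs` argument would be a `boto3.client("sqs")`, and the queue URL is the one shown on your dashboard (the names here are illustrative):

```python
def send_vote(sqs, queue_url: str, body: str, delay_seconds: int = 0) -> str:
    """Send one message to the queue and return the MessageId SQS assigns."""
    resp = sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=body,
        DelaySeconds=delay_seconds,  # same as the Delivery delay field in the console
    )
    return resp["MessageId"]
```

The returned message ID is the same identifier the console pops up after you click Send message.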
To view the message, scroll down a little and click on Poll for messages.
Amazon SQS begins to poll servers to find messages in the queue. The progress bar on the right side of the Receive messages section displays the polling duration.
Once the polling progress reaches 100%, you can view your message under the Receive messages section.
Click on the message ID to view the message body.
You can also view message details by switching to the Details tab.
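The console's polling can be reproduced in code with `receive_message`. A sketch (the `sqs` argument would be a boto3 SQS client); note that you must delete a message yourself once you have processed it, or it becomes visible again after the visibility timeout:

```python
def poll_messages(sqs, queue_url: str, wait_seconds: int = 10) -> list:
    """Long-poll the queue once, delete what was received, return the bodies."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=wait_seconds,  # long polling, like the console's progress bar
    )
    bodies = []
    for msg in resp.get("Messages", []):  # the key is absent when the queue is empty
        bodies.append(msg["Body"])
        # Delete after processing; otherwise the message reappears once the
        # visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return bodies
```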
Running a cluster infrastructure any time you need to analyse big data, no matter what the terms of the analysis are, is a waste of resources. Unless you have Amazon EMR.
Amazon EMR (Amazon Elastic MapReduce) provides a managed Apache Hadoop framework, along with other big data and data analysis frameworks, that distributes computation over multiple Amazon EC2 instances, allowing developers to process massive amounts of unstructured data.
Configuring and provisioning on-site servers for big data computational tasks can be time-consuming and expensive. So what AWS did was encapsulate all the infrastructure of the Hadoop framework into an integrated environment, so you can spin up a large cluster in minutes and get your data processed according to your needs.
Amazon EMR is easy to use: you start by uploading your data to an S3 bucket, and then configure and launch the cluster within minutes.
Amazon EMR can be used to build a variety of applications; for example, it ships with scalable machine learning frameworks such as TensorFlow and Apache Spark MLlib.
With EMR Notebooks, you get an open-source, Jupyter-based, managed analytic environment.
Amazon Kinesis is an Amazon Web Service that provides a real-time, fully managed, scalable platform which makes it easy to collect, process, and stream gigabytes of data per second, such as video, audio, and application logs, from sources such as mobile clients, website clickstreams, social media feeds, and events of all kinds, so that you can get timely insights and react quickly to new information. Netflix, for example, monitors all communication between its applications using Kinesis, enabling it to detect and quickly solve any technical issues.
It has different sub-modules:
· Kinesis Firehose
· Kinesis Analytics
· Kinesis Streams
Amazon Kinesis Firehose enables you to load streaming data into Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. Amazon Kinesis Analytics enables you to write standard SQL queries on streaming data. Amazon Kinesis Streams enables you to build custom applications that process or analyse streaming data for specialized needs.
With Amazon Kinesis, you can pass the data on to any other service on AWS, and you can even use it as a pub-sub system, because it is very easy to get started and push data in without having to provision anything.
Amazon Kinesis is used for various purposes such as fraud detection, live leaderboards, and application monitoring; even streaming data coming from IoT devices such as embedded sensors, consumer appliances, and TV set-top boxes can be processed using Amazon Kinesis.
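Pushing data into a stream really is that simple: one `put_record` call per event. A sketch (the `kinesis` argument would be a `boto3.client("kinesis")`; the stream name and partition key are made up for illustration):

```python
import json


def put_event(kinesis, stream_name: str, event: dict, partition_key: str) -> str:
    """Push one event into a Kinesis stream; return the shard it landed in."""
    resp = kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),  # Kinesis data is raw bytes
        PartitionKey=partition_key,  # records with the same key go to the same shard
    )
    return resp["ShardId"]
```

A consumer built on Kinesis Streams would then read these records shard by shard, which is what makes the pub-sub style usage mentioned above possible.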
WELCOME TO THE WORLD OF BIG DATA ON AWS!!
If you find the article helpful, do give it some applause!