Building A Real-Time Event-Driven Access Log System Using Docker, Python, Amazon SNS & SQS
This post is part II of a series of practical posts that I am writing to help developers and architects understand and build service-oriented architectures and microservices.
I have written other stories on the same topic; here are some links:
- MicroServices From Development To Production Using Docker, Docker Compose & Docker Swarm
- Easy Docker Orchestration With Docker 1.12, AWS EFS And The Swarm Mode
This article is also part of my book that I am “lean-publishing” called Painless Docker: Unlock The Power Of Docker & Its Ecosystem. Painless Docker is a practical guide to master Docker and its ecosystem based on real world examples.
In my last post (Benchmarking Amazon SNS & SQS For Inter-Process Communication In A Microservices Architecture), I tested the messaging mechanism using SNS/SQS, and even though the benchmarks were run from my laptop (and not from an EC2 instance), the results were good.
The last article was featured in many newsletters, so I decided to continue my tests and publish this post.
Event-driven architecture (EDA), or message-driven architecture, is a software architecture pattern that promotes the production and consumption of messages, with each consumed message triggering a specific reaction.
A classic system architecture promotes reading and reacting to data after saving it to a data store (MySQL, PostgreSQL, MongoDB, etc.), but this is not the best approach, especially if you are doing real-time or near-real-time processing. Unless you want to spend time and money building an instantaneous reactive system on top of a database, please don't use databases: STREAM DATA INSTEAD.
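To make the idea concrete, here is a minimal in-process sketch of the pattern, using only the Python standard library. It is an illustration, not the SNS/SQS setup used in this post: the names run_consumer and demo, the "access_log" event type and the sample log lines are all assumptions made for the example.

```python
import queue
import threading


def run_consumer(events, handlers, results):
    """React to each event as soon as it arrives; stop on the None sentinel."""
    while True:
        event = events.get()
        if event is None:
            break
        handler = handlers.get(event["type"])
        if handler is not None:
            results.append(handler(event["payload"]))


def demo():
    events = queue.Queue()  # stands in for the message broker
    results = []
    # Hypothetical handler: one reaction per event type.
    handlers = {"access_log": lambda line: "seen: " + line}

    consumer = threading.Thread(target=run_consumer, args=(events, handlers, results))
    consumer.start()

    # The producer streams events instead of writing them to a database first.
    for line in ["GET / 200", "GET /missing 404"]:
        events.put({"type": "access_log", "payload": line})
    events.put(None)  # tell the consumer to stop
    consumer.join()
    return results
```

The consumer never polls a data store: it blocks on the queue and reacts the moment a message is available, which is the behaviour the rest of this post builds with SNS and SQS.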
I created two machines (you can use a single one for both the publisher and the subscriber, since it doesn’t change anything in the networking).
This is the simplified architecture, in which I played the role of the ab load tester. Both machines and the services are hosted in the eu-west-1 region.
In order to minimize the transfer time, it is recommended to use the publisher and the consumer machines in the same region.
Load Testing
Let’s consider the example of a web server writing access logs to an EC2 disk.
On the first machine, I installed Nginx:
apt-get -y install nginx
For simplicity's sake, I kept the default Nginx page: our test is about networking, it is not an Nginx load test.
From left to right:
- on the first machine, I started the publisher container, which reads the access logs and sends them to SNS.
- on the second machine, I started the consumer container, which reads the data sent from SNS to SQS (it connects directly to the SQS service).
- on the third machine, my localhost, I ran a load test; as you can see, I sent 1000 requests with a concurrency level of 5.
I used ApacheBench (ab) to load test my server:
ab -n 1000 -c 5 http://ec2-34-248-177-221.eu-west-1.compute.amazonaws.com/
Once again, my test is primarily about networking and the path the data follows:
publisher -> SNS -> SQS -> Consumer
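For the SNS -> SQS hop to work, the queue has to allow the topic to send messages to it. The sketch below builds such an access policy as a plain dictionary; it would then be set as the queue's Policy attribute. The function name build_queue_policy, the statement Sid and the 123456789012 account ID are placeholders I chose for illustration, not values from my setup.

```python
import json


def build_queue_policy(queue_arn, topic_arn):
    """Return an SQS access policy letting one SNS topic send to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSnsTopicToSendMessages",  # hypothetical label
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only messages coming from this topic are accepted.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


# Placeholder ARNs: replace the account ID and names with your own.
policy = build_queue_policy(
    "arn:aws:sqs:eu-west-1:123456789012:test",
    "arn:aws:sns:eu-west-1:123456789012:test",
)
policy_json = json.dumps(policy)
```

Restricting the statement with aws:SourceArn keeps other topics from writing to the queue, which matters once several publishers share the same account.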
If I wanted to test Nginx, I would probably set a higher concurrency level.
Here is some other useful information about the request:
curl -I http://ec2-34-248-177-221.eu-west-1.compute.amazonaws.com
HTTP/1.1 200 OK
Server: nginx/1.10.0 (Ubuntu)
Date: Wed, 25 Jan 2017 22:17:27 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Wed, 25 Jan 2017 21:53:57 GMT
Connection: keep-alive
ETag: "58891e75-264"
Accept-Ranges: bytes
And of course my test:
Benchmarking ec2-34-248-177-221.eu-west-1.compute.amazonaws.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software: nginx/1.10.0
Server Hostname: ec2-34-248-177-221.eu-west-1.compute.amazonaws.com
Server Port: 80
Document Path: /
Document Length: 612 bytes
Concurrency Level: 5
Time taken for tests: 14.823 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 854000 bytes
HTML transferred: 612000 bytes
Requests per second: 67.46 [#/sec] (mean)
Time per request: 74.114 [ms] (mean)
Time per request: 14.823 [ms] (mean, across all concurrent requests)
Transfer rate: 56.26 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 31 37 13.3 34 176
Processing: 32 37 10.2 34 141
Waiting: 31 37 10.2 34 141
Total: 63 74 17.7 69 209
Percentage of the requests served within a certain time (ms)
50% 69
66% 71
75% 73
80% 75
90% 85
95% 98
98% 150
99% 180
100% 209 (longest request)
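The summary figures in this report can be re-derived from four of its raw numbers (requests, concurrency, elapsed time and bytes transferred). The sketch below is only a sanity check of that arithmetic; ab_summary is a helper name I made up for the example.

```python
def ab_summary(total_requests, concurrency, elapsed_seconds, total_bytes):
    """Recompute the headline metrics of an ab report from its raw figures."""
    return {
        # Requests per second (mean)
        "requests_per_second": total_requests / elapsed_seconds,
        # Time per request (mean), as seen by one of the concurrent clients
        "time_per_request_ms": concurrency * elapsed_seconds * 1000 / total_requests,
        # Time per request (mean, across all concurrent requests)
        "time_per_request_all_ms": elapsed_seconds * 1000 / total_requests,
        # Transfer rate in Kbytes/sec
        "transfer_rate_kbytes": total_bytes / 1024 / elapsed_seconds,
    }


# Figures taken from the report above.
summary = ab_summary(
    total_requests=1000,
    concurrency=5,
    elapsed_seconds=14.823,
    total_bytes=854000,
)
```

The values match the report to within rounding, because ab computes them from a more precise elapsed time than the three decimals it prints.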
To run the publisher, I started my log-publisher container:
docker run -it --name publisher -v /var/log/nginx/access.log:/logs -e AWS_ACCESS_KEY_ID=xxx -e AWS_SECRET_ACCESS_KEY=xxx -e SNS_TOPIC_ARN=arn:aws:sns:eu-west-1:xxxx:test -e TAG=vm1 -e REGION=eu-west-1 eon01/log-publisher:latest
Same thing for the subscriber:
docker run -it --name subscriber -e AWS_ACCESS_KEY_ID=xxx -e AWS_SECRET_ACCESS_KEY=xxx -e SQS_QUEUE_NAME=test -e REGION=eu-west-1 eon01/log-subscriber:latest
You may redirect the output to a file since these two containers are made to be verbose.
Using Python/SNS To Create A Publisher
This is the main code that I used to publish any file mapped to /logs (from outside the container) to SNS, line by line, using the tailer library.
Since Docker supports environment variables, I made the program read the same variables that I passed in the docker run command.
import boto.sns, time, json, logging
from datetime import datetime
import os
import tailer

# Settings come from the environment variables passed to docker run.
aws_access_key_id = os.environ['AWS_ACCESS_KEY_ID']
aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY']
region = os.environ['REGION']
sns_topic_arn = os.environ["SNS_TOPIC_ARN"]
tag = os.environ["TAG"]
file_path = "/logs"

logging.basicConfig(filename="sns-publish.log", level=logging.DEBUG)
c = boto.sns.connect_to_region(region, aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)

while 1:
    # tailer.follow() yields each new line appended to the file.
    for body in tailer.follow(open(file_path)):
        # The subject carries the send timestamp and the machine tag.
        subject = str(time.time()) + " " + tag
        print(str(time.time()))
        publication = c.publish(sns_topic_arn, body, subject)
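To give an idea of what following a file line by line involves, here is a minimal standard-library sketch. It is not the tailer library's implementation: the function name follow and the poll_interval and max_idle_polls parameters are assumptions made for this illustration, and it starts from the current position of the file object instead of the end of the file.

```python
import time


def follow(file_obj, poll_interval=0.5, max_idle_polls=None):
    """Yield complete lines as they are appended to an open text file.

    max_idle_polls bounds how many empty polls in a row are tolerated;
    None means follow forever.
    """
    idle_polls = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        position = file_obj.tell()
        line = file_obj.readline()
        if line.endswith("\n"):
            idle_polls = 0
            yield line.rstrip("\n")
        else:
            # Nothing new, or a line that is still being written:
            # rewind so it is read again once it is complete.
            file_obj.seek(position)
            idle_polls += 1
            time.sleep(poll_interval)
```

Waiting for the trailing newline avoids publishing half-written log entries, which can happen when the web server flushes a line in several writes.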
Using Python/SQS To Create A Subscriber
This piece of code also uses boto to connect to the right SQS queue, and prints the time just after receiving the sent message.
As in the Python/SNS publisher, this script reads its settings from environment variables.
import boto.sqs, time, json
import os
from datetime import datetime

# Settings come from the environment variables passed to docker run.
aws_access_key_id = os.environ['AWS_ACCESS_KEY_ID']
aws_secret_access_key = os.environ['AWS_SECRET_ACCESS_KEY']
region = os.environ['REGION']
sqs_queue_name = os.environ["SQS_QUEUE_NAME"]

conn = boto.sqs.connect_to_region(region, aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
queue = conn.get_queue(sqs_queue_name)
x = 0

while 1:
    try:
        result_set = queue.get_messages()
        if result_set != []:
            message = result_set[0]
            # Reception time, printed as soon as the message is read.
            print(str(time.time()))
            # The body is the JSON envelope that SNS delivers to SQS.
            message_body = message.get_body()
            m = json.loads(message_body)
            subject = m["Subject"]
            body = m["Message"]
            message_id = m["MessageId"]
            # Delete the message so that it is not delivered again.
            conn.delete_message(queue, message)
    except IndexError:
        pass
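The fields read by the subscriber (Subject, Message, MessageId) come from the JSON envelope that SNS wraps around each message. As a sketch of how the transport time could be computed directly in Python, the helper below parses that envelope and subtracts the send timestamp stored in the subject. The function name parse_sns_envelope and the sample envelope (message ID, IP address, timestamps) are made up for the example.

```python
import json


def parse_sns_envelope(message_body, received_at):
    """Extract the log line and the transport time from an SNS envelope."""
    envelope = json.loads(message_body)
    # The publisher builds the subject as "<send timestamp> <tag>".
    sent_at_text, _, tag = envelope["Subject"].partition(" ")
    return {
        "message_id": envelope["MessageId"],
        "tag": tag,
        "log_line": envelope["Message"],
        "latency_seconds": received_at - float(sent_at_text),
    }


# Illustrative envelope, not a message captured during the test.
sample_body = json.dumps({
    "Type": "Notification",
    "MessageId": "00000000-0000-0000-0000-000000000000",
    "Subject": "1485382647.25 vm1",
    "Message": '203.0.113.7 - - [25/Jan/2017:22:17:27 +0000] "GET / HTTP/1.1" 200 612',
})
parsed = parse_sns_envelope(sample_body, received_at=1485382647.5)
```

This only gives a meaningful number if the clocks of the publisher and the subscriber machines are synchronized, which is an assumption of the whole benchmark.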
Benchmarking Results
I used Google Sheets to calculate the difference between the two timestamps :
- Time just before sending to SNS (H)
- Time just after receiving the message from SQS (I)
And this is the chart that shows the difference J between H and I (J = I - H).
The test lasted 14.823 seconds, during which 1000 requests were sent with a concurrency level of 5. IMHO, these are good results: the highest response time was 0.28 seconds and the lowest was 0.009 seconds.
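The same calculation can be done without a spreadsheet. The sketch below computes J = I - H for each message, then the lowest, highest and average values. The function name transport_stats is hypothetical, and the sample timestamps are illustrative values, not the data from my test.

```python
import statistics


def transport_stats(sent_times, received_times):
    """Compute J = I - H per message, plus the min, max and mean of J.

    Assumes sent_times[i] and received_times[i] belong to the same message.
    """
    latencies = [received - sent for sent, received in zip(sent_times, received_times)]
    return {
        "latencies": latencies,
        "min": min(latencies),
        "max": max(latencies),
        "mean": statistics.mean(latencies),
    }


# Illustrative timestamps in seconds (H = sent, I = received).
stats = transport_stats(
    sent_times=[100.0, 101.0, 102.0],
    received_times=[100.25, 101.5, 102.125],
)
```

Since standard SQS queues do not guarantee ordering, the two lists should be paired per message (for example with the timestamp carried in the subject) instead of by position in the output files.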
The distribution of the different response times is shown below:
This is another chart, where I put the highest, the lowest and the average transport time:
That’s all folks, part III is coming soon.
Connect Deeper
Microservices are changing how we make software, but one of their drawbacks is networking, which can be complex, and messaging is directly affected by networking problems. Using SNS/SQS and a pub/sub model seems to be a good way to build inter-service messaging middleware. The publisher/subscriber scripts that I used are not really optimized for load and speed, but they make a good use case.