AWS Elastic Beanstalk Worker environment deep dive
This article describes AWS Elastic Beanstalk worker environments. With Elastic Beanstalk, we can quickly deploy and manage applications in the AWS Cloud. Because it is a fully managed service, AWS handles most of the configuration, and some settings cannot be adjusted from the AWS console. If we understand how the service works internally, we can change those settings directly in the internal config files as requirements dictate.
Elastic Beanstalk provides two types of environments: the web server environment and the worker environment. In this article, I focus mainly on the internal workings of Tomcat worker environments and their configuration.
What is Elastic Beanstalk Worker?
Elastic Beanstalk Worker is a managed compute environment for long-running tasks. By using a worker, we can decouple the web application front end from a process that performs blocking operations, so that the application stays responsive under load. When we provision a worker environment, we get an SQS queue along with the compute environment. The worker can only be accessed through that SQS queue.
What are the use cases of Elastic Beanstalk Worker?
Any long-running task that significantly increases the time it takes to complete a request, including:
· Processing images or videos
· Sending email
· Generating a ZIP archive
· Running complex logic that takes a long time to complete
· Performing tasks on a schedule
Important things to know
Worker environments are not accessible from the outside world. There is no endpoint or URL for reaching the environment. The only way to access it is through the associated Simple Queue Service (SQS) queue, which is explained in detail in the sections below.
In a real scenario, you need to create a web tier (with a public endpoint) alongside the worker tier, so that the web tier is accessible from outside and can offload long-running work to the worker's associated SQS queue.
Elastic Beanstalk Worker Workflow
Now we know the use cases for a worker. If an operation takes more than a few seconds to respond, it is not recommended to run it inside a real-time REST API. Instead, the web tier can expose a REST API that does the essential work synchronously and sends a response to the client. The web tier also puts the request into the SQS queue associated with the EB worker. If the request is large, we can store it in S3 or a database and place a pointer to that object in the SQS queue. Note that the SQS message size limit is 256 KB.
Let us look at some of the important configuration options of the worker environment. They are available from the worker configuration tab (Configuration -> Worker -> Edit).
The screenshot above shows the default configuration of a worker. I explain only the important concepts, the ones that generally cause confusion.
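The offloading step described above can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the "inline"/"pointer" message layout, and the key format are hypothetical, and a plain dictionary stands in for S3 or a database. The client is passed in so that any object with a boto3-style send_message(QueueUrl=..., MessageBody=...) method can be used.

```python
import json
import uuid

# Limit quoted in the article; kept as a parameter so it is easy to adjust.
SQS_MAX_BYTES = 256 * 1024


def build_message_body(payload: dict, blob_store: dict,
                       max_bytes: int = SQS_MAX_BYTES) -> str:
    """Return the JSON body to enqueue: the payload itself, or a pointer to it."""
    body = json.dumps({"type": "inline", "payload": payload})
    if len(body.encode("utf-8")) <= max_bytes:
        return body
    # Too large for the queue: park the payload elsewhere and enqueue a pointer.
    key = f"jobs/{uuid.uuid4()}.json"      # hypothetical key layout
    blob_store[key] = json.dumps(payload)  # stand-in for an S3 or DB write
    return json.dumps({"type": "pointer", "key": key})


def enqueue_job(sqs_client, queue_url: str, payload: dict, blob_store: dict) -> str:
    """Send the job to the worker queue and return the body that was sent."""
    body = build_message_body(payload, blob_store)
    # Same keyword arguments as the boto3 SQS client's send_message call.
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
    return body
```

In a real web tier, sqs_client would typically be created with boto3.client("sqs"), and the worker would need matching logic to resolve a pointer back into the stored payload.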
The most important term related to the worker environment is SQSD (the SQS daemon), which plays a critical role in the worker process. When we place a message into the SQS queue, SQSD pulls the message and makes an HTTP POST call to the application at http://localhost/ on port 80. By default, the HTTP path is the root ( / ), so SQSD posts the message to / of the application. The application hosted in Beanstalk must therefore include a REST service at the / path.
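To make the contract concrete, here is a minimal sketch of an endpoint that accepts a POST on / and answers with 200 on success. It uses Python's standard library purely for illustration; in a Tomcat application the same role would be played by a servlet or REST controller mapped to /. The process function and make_server helper are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def process(body: bytes) -> None:
    """Hypothetical job logic; raise an exception to signal failure."""
    if not body:
        raise ValueError("empty message")


class WorkerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/":  # the default HTTP path the daemon posts to
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)  # the queue message
        try:
            process(body)
            status = 200  # success: the message is done
        except Exception:
            status = 500  # any non-200 status means "not processed"
        self.send_response(status)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port: int = 80) -> HTTPServer:
    """Bind the handler on localhost; port 80 mirrors the default in the text."""
    return HTTPServer(("127.0.0.1", port), WorkerHandler)
```

Calling make_server(8080).serve_forever() runs the sketch locally on an unprivileged port, where it can be exercised with any HTTP client.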
HTTP connections represents the maximum number of concurrent connections to the application. The default value is 50, which means that if 50 messages are present in SQS, 50 requests run in parallel. If there are more than 50 messages in SQS, the rest must wait for other connections to complete. For example, if there are 60 messages in SQS at one time, 50 are processed immediately, while the remaining 10 wait until connections are freed.
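The effect of this cap can be modelled with a bounded thread pool. The sketch below is only an analogy for the behaviour described above, not how SQSD is implemented; drain and handle are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def drain(messages, handle, http_connections: int = 50) -> int:
    """Process all messages with at most `http_connections` in flight.

    Returns the highest number of messages that were in flight at once.
    """
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def run(message):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        try:
            handle(message)
        finally:
            with lock:
                in_flight -= 1

    # The pool size plays the role of the HTTP connections setting:
    # extra messages wait until a worker thread becomes free.
    with ThreadPoolExecutor(max_workers=http_connections) as pool:
        list(pool.map(run, messages))
    return peak


if __name__ == "__main__":
    peak = drain(range(60), lambda m: time.sleep(0.05))
    print("peak concurrency:", peak)  # never more than 50
```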
Visibility timeout represents the amount of time an incoming message is locked for processing before it is returned to the queue. Once SQSD pushes a message to the application, it waits for a 200 response. During that time (the application's processing and response time), the message is in flight and not accessible to anyone else. This is where the visibility timeout comes into play. The default value is 300 seconds, which means that if the worker code takes more than 300 seconds to process a message, the message becomes available in the queue again once the 300-second visibility timeout expires and is then processed by another connection. So always keep the visibility timeout 3 to 4 times higher than the usual response time.
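The sizing rule above is simple arithmetic, sketched here with hypothetical helper names. The 300-second default and the 3x to 4x factor come from the text; everything else is illustrative.

```python
import math


def recommended_visibility_timeout(typical_response_s: float,
                                   factor: float = 3.0) -> int:
    """Rule of thumb from the text: 3 to 4 times the usual response time."""
    if not 3.0 <= factor <= 4.0:
        raise ValueError("factor is expected to be between 3 and 4")
    return math.ceil(typical_response_s * factor)


def will_be_redelivered(processing_s: float,
                        visibility_timeout_s: int = 300) -> bool:
    """True if processing outlasts the lock, so the message reappears in the queue."""
    return processing_s > visibility_timeout_s
```

For a job that usually takes 90 seconds, the rule gives a timeout between 270 and 360 seconds, so the 300-second default is within range. A job that usually takes 5 minutes would need the setting raised to at least 900 seconds.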
SQS Daemon (SQSD)
The SQS daemon is a built-in process that runs on a worker instance and listens for messages in the Amazon SQS queue. Once a message is placed into the queue, SQSD pulls the message and makes an HTTP POST request to http://localhost/ on port 80. The body of the request contains the message from the queue. If the application returns a 200 response, SQSD deletes the message from the queue; otherwise, the message stays in the queue and is retried.
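The delete-or-retry decision can be illustrated with a small in-memory simulation. This is a sketch of the behaviour described above, not the daemon's actual code: run_daemon and post are hypothetical names, and the max_attempts cap is an assumption added so the loop always terminates.

```python
from collections import deque


def run_daemon(queue: deque, post, max_attempts: int = 3) -> list:
    """Deliver each message via `post`; delete on 200, otherwise retry.

    Returns the messages that never received a 200 within `max_attempts`
    (a hypothetical cap that keeps this simulation finite).
    """
    attempts = {}
    gave_up = []
    while queue:
        message = queue.popleft()
        attempts[message] = attempts.get(message, 0) + 1
        status = post(message)  # stands in for the HTTP POST to the application
        if status == 200:
            continue  # success: the message is deleted from the queue
        if attempts[message] < max_attempts:
            queue.append(message)  # not a 200: the message goes back for a retry
        else:
            gave_up.append(message)
    return gave_up
```

Because messages are tracked in a dictionary, this sketch assumes each message is a distinct hashable value such as a string.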
On the Elastic Beanstalk EC2 instance, we can use the following command to check the status of SQSD (run with sudo or as root).
service aws-sqsd status
Use the following commands to stop, start, or restart SQSD manually (run with sudo or as root).
service aws-sqsd stop
service aws-sqsd start
service aws-sqsd restart
SQSD logs are written to /var/log/aws-sqsd/default.log. They can be used to check whether the response status is 200.
The SQSD configuration is stored in /etc/aws-sqsd.d/default.yaml. The settings we see in the worker console are stored here.
Elastic Beanstalk Tomcat8 instance workflow
On an Elastic Beanstalk Tomcat worker instance, two more services run alongside SQSD: HTTPD and Tomcat8. Apache httpd acts as a proxy server that passes requests from SQSD to the application running on the Tomcat instance.
We can check and control the httpd process by using the following commands on the EC2 instance.
service httpd status
service httpd stop
service httpd start
service httpd restart
The httpd server configuration file is /etc/httpd/conf/httpd.conf, and the corresponding access log is /var/log/httpd/access_log. If httpd times out, we might need to alter this config file to extend the timeouts.
Similarly, we can check and control the Tomcat process by using the following commands on the EC2 instance.
service tomcat8 status
service tomcat8 stop
service tomcat8 start
service tomcat8 restart
The Tomcat server configuration file is /etc/tomcat8/server.xml, and the corresponding log is /var/log/tomcat8/catalina.out. If Tomcat times out, we might need to alter this config file to extend the timeouts.
I appreciate your time; feel free to reach out to me if you have any questions.