Properly shut down long-running EJB threads during Wildfly application undeployment.

I’m hoping to save some fellow developers a little bit of time by sharing my experience with Wildfly undeployment problems that can be caused by long-running EJB threads. Let me start from the beginning and tell you a little bit about the situation that I encountered.

I was working on an application that uses a lot of cron tasks to periodically collect data from various services and store it in a database. These cron tasks are written in Java and are deployed to Wildfly as a Java EE application. The EJB @Schedule annotation is used to schedule the cron jobs, and some of them make asynchronous calls to methods annotated with @Asynchronous. The EJB framework is solid and worked great for scheduling these cron tasks, but we did have a minor issue with annoying, but harmless, error messages when the application was being undeployed. Specifically, the error was the “EJB Unavailable” exception. We knew we were getting these errors because we were missing some @DependsOn annotations between dependent EJBs and had some circular dependencies between services that needed to be refactored. When the application was undeployed, some of the EJBs were destroyed before the services that depended on them, and the long-running threads in those dependent services would throw this exception as they tried to keep using EJBs that had already been destroyed.
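To make the setup concrete, here is a rough sketch of the kind of beans involved. The class and method names are illustrative rather than our actual code, and each class would live in its own source file:

import javax.ejb.Asynchronous;
import javax.ejb.EJB;
import javax.ejb.Schedule;
import javax.ejb.Singleton;

// Scheduled entry point for one of the cron jobs.
@Singleton
public class MetricsCollectorJob {

    @EJB
    private MetricsService metricsService;

    // Fires every five minutes and fans the work out to asynchronous EJB methods.
    @Schedule(hour = "*", minute = "*/5", persistent = false)
    public void collectMetrics() {
        metricsService.fetchAndStore("service-a");
    }
}

// Worker bean whose @Asynchronous method runs on a thread from the EJB thread pool.
@Singleton
public class MetricsService {

    @Asynchronous
    public void fetchAndStore(String source) {
        // ... call the remote service and write the results to the database ...
    }
}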

Eventually, the task of adding the missing @DependsOn annotations and refactoring the circular dependencies rose to the top of my to-do list. I expected it to be an easy task, but something unexpected happened after I had completed the code changes and deployed them to our staging Wildfly instance; subsequent attempts to deploy new versions of the application started to time out with the following error:

ERROR [org.jboss.as.controller.management-operation] (management-handler-thread - 13) WFLYCTL0349: Timeout after [300] seconds waiting for service container stability while finalizing an operation. Process must be restarted. Step that first updated the service container was 'full-replace-deployment' at address '[]'

At that point, I was clueless as to what was causing the undeployment to time out. Apparently this is one of those issues that no one else on the internet has ever encountered before, or at least no one who ranks in the top 10 pages of Google results for any relevant search terms. Anyway, after some cumbersome debugging of the undeployment process, I found that my long-running cron jobs from the EJB thread pool were still running after the undeployment attempts, and were seemingly preventing the undeployment from completing successfully. Prior to the refactoring, we had not seen this problem because those long-running threads had been terminating early by throwing EJB Unavailable exceptions.

As a side note, the jstack Java utility is an indispensable tool for debugging thread activity within the thread pools managed by your application server. Simply pass the process ID of your Wildfly instance to the jstack executable to get a stack trace for every active thread in the JVM.

jstack -F WILDFLY-PID > stack.txt

I always direct the output to a file and then use grep to find the lines that contain code within my application’s packages. In this case, I was immediately able to see all of my cron threads that were still running after the undeployment attempt.

Having finally discovered the link between the undeployment issues and the long-running threads from the EJB thread pool, I began searching for more information about how these managed thread pools are shut down when a Wildfly application is undeployed. I was once again amazed that my Google queries returned almost no relevant information on the subject. Frustrated by the lack of documentation, I began experimenting with the application to see if I could work anything out empirically.

My initial presumption was that these threads would be interrupted by the application server when the application was being undeployed and that within my cron tasks I could intermittently check the thread’s interrupt flag in order to return early if necessary. Nope. The interrupt flag didn’t seem to be set when my test application was undeployed.
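For reference, this is roughly what I had in mind inside the worker method; hasMoreWork and fetchNextBatch are hypothetical helpers standing in for the real work:

@Asynchronous
public void fetchAndStore(String source) {
    while (hasMoreWork(source)) {
        // The hope was that the container would interrupt this thread on undeployment.
        // In my tests the interrupt flag was never set, so this check never fired.
        if (Thread.currentThread().isInterrupted()) {
            return;
        }
        fetchNextBatch(source);
    }
}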

My next inclination was to use the @PreDestroy method of the EJB to set a shutdown flag that I could periodically check within my long-running cron tasks in order to shut them down early. It was basically the same approach as the interrupt flag, except I would be setting my own flag to indicate when the application was shutting down. After another ugly debugging session, I realized that @PreDestroy methods don’t get called until all EJB thread pool threads that were initiated by methods within the same EJB have terminated.
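In other words, something along these lines does not work, because the container waits for the bean’s own in-flight asynchronous invocations to finish before it calls @PreDestroy. Again, the names are illustrative and hasMoreWork/fetchNextBatch are hypothetical helpers:

import javax.annotation.PreDestroy;
import javax.ejb.Asynchronous;
import javax.ejb.Singleton;

@Singleton
public class MetricsService {

    private volatile boolean shuttingDown;

    @PreDestroy
    public void markShuttingDown() {
        // Not invoked until this bean's own async work has finished,
        // so the flag is never set while the long-running task is still alive.
        shuttingDown = true;
    }

    @Asynchronous
    public void fetchAndStore(String source) {
        while (hasMoreWork(source)) {
            if (shuttingDown) {
                return;
            }
            fetchNextBatch(source);
        }
    }
}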

I’ll spare you the remaining details of my failed attempts to find any documentation or empirical evidence for how to solve this issue, and instead I’ll just show you the solution. The best approach that I found is to bootstrap the EJB that contains your @Schedule and @Asynchronous methods with another EJB that depends on it. In the @PreDestroy method of that bootstrapping EJB, call a shutdown method on the worker EJBs it depends on to set a shutdown flag that your long-running threads can poll periodically and use to terminate early. Hopefully this tip spares one of you the time and headache that I suffered. Here’s some code to give you a clearer picture of the solution that I’m proposing:
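In outline, it looks something like the sketch below. MetricsService and CronBootstrap are illustrative stand-ins for your own beans (each in its own source file), hasMoreWork and fetchNextBatch are hypothetical helpers, and I’m using @Singleton @Startup with @DependsOn as one way to express the dependency between the two beans:

import javax.annotation.PreDestroy;
import javax.ejb.Asynchronous;
import javax.ejb.DependsOn;
import javax.ejb.EJB;
import javax.ejb.Singleton;
import javax.ejb.Startup;

// Worker bean that owns the long-running @Schedule / @Asynchronous work and
// exposes a shutdown() method that flips a flag polled by those tasks.
@Singleton
public class MetricsService {

    private volatile boolean shuttingDown;

    public void shutdown() {
        shuttingDown = true;
    }

    @Asynchronous
    public void fetchAndStore(String source) {
        while (hasMoreWork(source)) {
            if (shuttingDown) {
                return;   // exit early so undeployment is not blocked
            }
            fetchNextBatch(source);
        }
    }
}

// Bootstrapping bean. Because it @DependsOn the worker, it is destroyed first
// during undeployment, and its @PreDestroy can tell the worker to wind down
// while the worker bean is still available.
@Singleton
@Startup
@DependsOn("MetricsService")
public class CronBootstrap {

    @EJB
    private MetricsService metricsService;

    @PreDestroy
    public void beforeUndeploy() {
        metricsService.shutdown();
    }
}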