Diagnostic Based on Log Monitoring
Episode 1 — Structure and Functionality
What is a Log?
You can think of a log as a diary in which you record all of your actions. Instead of summarizing the day's actions briefly at night, you write each log entry at the moment the action is executed. For readability and maintainability, logs are written in a configured format. Logs are mostly used by servers. The following image shows part of a WSO2 Identity Server log file.
What is Log Monitoring?
Log monitoring is a mechanism for tailing a log file in real time. When an error occurs, the server usually prints the error message and the stack trace to the log file. So by monitoring the log file we can read each log line as it is written and analyse the error.
Why do we need diagnostics when we already have the error log line?
Most of the time, analyzing the error log line is not enough to diagnose the error. For example, if a thread death error occurs, we can see from the log file that it happened, but what caused it can only be diagnosed by taking a thread dump.
For further explanation, consider the following error line.
OOM error: java.lang.OutOfMemoryError: unable to create new native thread
Even though the error above is an out-of-memory error, its cause is that the server is unable to create new threads. Taking a memory dump does not make sense here, because the reason for the error is that a number of threads have already been created and are halted in a wait state. This can only be identified by analysing a thread dump.
So the point is that monitoring log lines alone is not enough to diagnose an error, and not every diagnostic artifact is suitable for every error.
I don’t want to talk about how to tail a log file here; you can read this article if you want to know how to tail a text file with non-blocking IO.
From here on I am going to talk about the tool I designed for diagnostics based on log monitoring for WSO2 Identity Server. Even though it is a general-purpose tool, I will explain its structure and functionality from the WSO2 IS perspective.
This tool is a simple standalone Java application with a small memory footprint. It is designed to tail a log file and, given regex patterns in the proper structure, identify certain log lines and execute the corresponding actions.
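To make the idea of regex-driven matching concrete, here is a minimal sketch. It is not the tool's actual code: the class name `LogLineMatcher`, the `Rule` holder and the action name `THREAD_DUMP` are all hypothetical, and the real tool reads its patterns from a JSON file rather than from method calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical sketch: maps regex patterns to action names (not the tool's real classes). */
final class LogLineMatcher {

    /** One rule: a compiled pattern and the name of the action to run when it matches. */
    private static final class Rule {
        final Pattern pattern;
        final String actionName;

        Rule(Pattern pattern, String actionName) {
            this.pattern = pattern;
            this.actionName = actionName;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Registers a regex and the (assumed) action name it should trigger. */
    void addRule(String regex, String actionName) {
        rules.add(new Rule(Pattern.compile(regex), actionName));
    }

    /** Returns the action of the first rule whose pattern occurs anywhere in the line. */
    Optional<String> match(String logLine) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(logLine).find()) {
                return Optional.of(rule.actionName);
            }
        }
        return Optional.empty();
    }
}
```

With a rule such as `addRule("OutOfMemoryError: unable to create new native thread", "THREAD_DUMP")`, the OOM line shown earlier would map to a thread dump instead of a memory dump, which is exactly the distinction made above.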
Components and their Functionalities
The log tailer is an implementation of the Unix “tail -f” functionality, forked from the Apache Commons IO project and improved by Sergio Bossa. I forked his repository and adapted it to my requirements.
Match Rule Engine
The Match Rule Engine handles every log line, because it implements LogTailListener, and it divides the log lines into two main categories. From the IS perspective those categories are ERROR and INFO. It then passes each error log line to the Interpreter.
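The categorization step could look roughly like the sketch below. I am not reproducing the real LogTailListener interface here; `MatchRuleEngineSketch`, its `handle` method and the `Consumer<String>` standing in for the Interpreter are assumptions made for illustration, as is the rule that a line counts as an error when it contains the level token `ERROR`.

```java
import java.util.function.Consumer;
import java.util.regex.Pattern;

/** Hypothetical sketch of the categorization step; names are not taken from the tool. */
final class MatchRuleEngineSketch {

    enum Category { ERROR, INFO }

    // Assumption: the log level appears in the line as the standalone word ERROR.
    private static final Pattern ERROR_LEVEL = Pattern.compile("\\bERROR\\b");

    // Stand-in for the Interpreter: receives every line categorized as ERROR.
    private final Consumer<String> interpreter;

    MatchRuleEngineSketch(Consumer<String> interpreter) {
        this.interpreter = interpreter;
    }

    /** Puts a line into one of the two categories. */
    static Category categorize(String logLine) {
        return ERROR_LEVEL.matcher(logLine).find() ? Category.ERROR : Category.INFO;
    }

    /** Called once per tailed line; only ERROR lines are forwarded. */
    void handle(String logLine) {
        if (categorize(logLine) == Category.ERROR) {
            interpreter.accept(logLine);
        }
    }
}
```

Everything that is not an error falls into INFO in this sketch, which keeps the listener cheap: the more expensive analysis only runs for the lines that reach the Interpreter.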
The Interpreter has the major role in the tool. It first checks whether the error log line is valid for diagnostics. This validation is currently based on the length of the error log line. The Interpreter then analyses the error log line to find out what error it is, and starts executing the diagnostic actions.
But there is a problem: running diagnostics for an error that repeats within a short period of time is not feasible, because some diagnostic actions are expensive in terms of memory and CPU cycles. To avoid that problem I came up with the concept of reload time: a value in milliseconds bound to each action executor. Once the Interpreter uses an action executor, it cannot use that executor again until the reload time has elapsed.
Action executors are the diagnostic functions, such as the Thread Dumper, Memory Dumper, Open File Finder and Netstat Executor. They run the diagnostics and dump the results into a given folder.
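The reload time idea can be sketched as a small guard object that sits in front of an action executor. This is only an illustration under my own assumptions: the class name `ReloadGuard`, the `tryAcquire` method and the injected clock are hypothetical and not taken from the tool's source.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of a reload-time guard for one action executor. */
final class ReloadGuard {

    private final long reloadTimeMillis;
    private final LongSupplier clock;   // injected so the guard can be tested without sleeping
    private long lastRunMillis;
    private boolean hasRun;

    ReloadGuard(long reloadTimeMillis, LongSupplier clock) {
        this.reloadTimeMillis = reloadTimeMillis;
        this.clock = clock;
    }

    /**
     * Returns true if the executor may run now and records the run;
     * returns false while the reload time since the last run has not elapsed.
     */
    synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        if (hasRun && now - lastRunMillis < reloadTimeMillis) {
            return false;
        }
        lastRunMillis = now;
        hasRun = true;
        return true;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. The Interpreter would call `tryAcquire()` before running an expensive executor such as a memory dump and simply skip the action when it returns false.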
Post Action Executor
Once all the diagnostics are done, the Zip File Executor zips the dump folder and deletes the original folder to save space.
How does the tool recognize the error?
The tool has a static memory in the form of a JSON file. I will describe the JSON file structure, and how I load the JSON file into a tree structure, in the next blog post.
The GitHub repository of the tool is here.
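A post action like this can be written with the standard `java.util.zip` and `java.nio.file` APIs. The sketch below is my own illustration, not the Zip File Executor from the repository; the class name `DumpFolderZipper` and the choice to place the archive next to the folder with a `.zip` suffix are assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical sketch: zip a dump folder, then delete the original folder. */
final class DumpFolderZipper {

    /** Creates "<folder>.zip" next to the folder, deletes the folder, returns the zip path. */
    static Path zipAndDelete(Path folder) throws IOException {
        Path zipFile = folder.resolveSibling(folder.getFileName() + ".zip");

        // Files.walk lists a directory before its contents (pre-order).
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(folder)) {
            paths = walk.collect(Collectors.toList());
        }

        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path path : paths) {
                if (Files.isRegularFile(path)) {   // empty directories are not archived
                    String entryName = folder.relativize(path).toString().replace('\\', '/');
                    zip.putNextEntry(new ZipEntry(entryName));
                    Files.copy(path, zip);
                    zip.closeEntry();
                }
            }
        }

        // Delete in reverse order so that files go before their parent directories.
        for (int i = paths.size() - 1; i >= 0; i--) {
            Files.delete(paths.get(i));
        }
        return zipFile;
    }
}
```

The folder is deleted only after the zip stream has been closed successfully, so a failure while writing the archive leaves the original dumps in place.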