Automated Failure Analysis using the Machine Learning approach

Published in Engineered @ Publicis Sapient

2 min read · Jul 17, 2023

This article is about the time our automation engineers spend, on average, analyzing automation failures, and how we freed up 70%–80% of that time. Sounds interesting? Read on.

Problem statement: Say the team has 10,000 tests across the different pods of the application it is working on. In the best case, say 90% pass and 10% fail, which leaves 1,000 failed tests whose failure reasons must be analyzed. If an experienced team that knows the whole application does this analysis manually, one run takes roughly 50 person-days (assuming 20 tests are analyzed per day). The effort grows with the failure count.
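The estimate above is simple arithmetic, and a short sketch makes it easy to rerun with your own numbers. The function name and its parameters are illustrative only and are not part of the tooling described in this article.

```python
import math


def analysis_effort_days(total_tests: int, fail_rate: float, analyzed_per_day: int) -> int:
    """Person-days needed to manually analyze every failed test in one run."""
    failed = round(total_tests * fail_rate)
    # Round up: a partly used day still counts as a day of effort.
    return math.ceil(failed / analyzed_per_day)


# The scenario from the problem statement: 10,000 tests, 10% failing, 20 analyzed per day.
print(analysis_effort_days(10_000, 0.10, 20))  # 50
```

Doubling the failure rate to 20% doubles the effort to 100 person-days, which is why the manual approach does not scale.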

We explored the technologies available on the market for automating this failure analysis, so that it depends little or not at all on people and the time saved on maintenance can go to other valuable work. Unfortunately, we found nothing suitable. Hence, we tried building ML models that do this job for us.

The ML model now gives us 80%–85% accuracy after being trained on multiple sets of data. We started by bucketizing all the different failure types, using the history of failure data we had collected and come across.

Now, the moment any test case fails, the NLP model reads the exception and assigns the failure to the respective bucket (for example test data, authentication, performance, automation script issues, TimeoutException, NoSuchElementException, StaleElementReferenceException, etc.). Depending on the failure, it checks the API responses up to that point and uses NLP to read the stack trace and the automation execution log of the test case, to see whether there is any issue in the backend. If there is no backend issue, the failure goes to its respective bucket. It even checks the response times of the APIs, since a slow API may have delayed the update on the web UI and caused the failure. These logs also help developers fix the issue.
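To make the flow concrete, here is a minimal sketch of the bucketing step. It is not our trained model: plain regular expressions stand in for the NLP, and the bucket names, patterns, `ApiCall` record, priority order, and 2000 ms threshold are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical bucket rules. The first matching pattern wins, so order matters.
BUCKET_RULES = [
    ("authentication", re.compile(r"\b(401|403|unauthorized|forbidden)\b", re.I)),
    ("performance", re.compile(r"TimeoutException|timed out", re.I)),
    ("automation script", re.compile(r"NoSuchElementException|StaleElementReferenceException")),
    ("test data", re.compile(r"test ?data|record not found", re.I)),
]


@dataclass
class ApiCall:
    """One API call recorded during the test, up to the point of failure."""
    url: str
    status: int
    elapsed_ms: float


def classify_failure(stack_trace: str, api_calls: list, slow_ms: float = 2000.0) -> str:
    """Return a bucket name for one failed test."""
    # 1. Backend evidence first: a server error explains the failure on its own.
    if any(call.status >= 500 for call in api_calls):
        return "backend/API"
    # 2. A slow API may have delayed the web UI and caused the failure.
    if any(call.elapsed_ms > slow_ms for call in api_calls):
        return "performance"
    # 3. Otherwise fall back to what the exception text says.
    for bucket, pattern in BUCKET_RULES:
        if pattern.search(stack_trace):
            return bucket
    return "unclassified"


trace = "org.openqa.selenium.NoSuchElementException: no such element"
print(classify_failure(trace, []))                               # automation script
print(classify_failure(trace, [ApiCall("/cart", 500, 120.0)]))   # backend/API
```

Checking the API calls before the exception text reflects the idea that a UI-level exception is often only a symptom: the same NoSuchElementException lands in a different bucket when a backend call failed first.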

Some of the Python libraries used for building the model are: spaCy, CDP, Tesseract-OCR (template matching, image matching), jinja2, dist, numpy, pydantic, etc.

After bucketizing the failures, the system sends an email to the respective audience: environment issues go to the DevOps team, API and performance issues to the Dev team, and automation script issues to the Automation team. Now, as automation engineers, we know where to invest our time efficiently to fix those issues for good, and all of this information is stored in history for future reference. Hope this idea helps.
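The routing step can be sketched as a lookup table from bucket to team, with one email built per team. The addresses, bucket names, default recipient, and function name below are hypothetical, and the sketch only builds the messages without sending them.

```python
from email.message import EmailMessage

# Hypothetical routing table: bucket name -> team mailbox.
ROUTES = {
    "environment": "devops@example.com",
    "backend/API": "dev@example.com",
    "performance": "dev@example.com",
    "automation script": "automation@example.com",
}
DEFAULT_RECIPIENT = "automation@example.com"  # assumed owner of unknown buckets


def build_notifications(failures: dict, sender: str = "ci@example.com") -> list:
    """Build one email per team, listing its failed tests tagged by bucket.

    `failures` maps a bucket name to the list of failed test names in it.
    """
    lines_by_recipient = {}
    for bucket, tests in failures.items():
        recipient = ROUTES.get(bucket, DEFAULT_RECIPIENT)
        lines_by_recipient.setdefault(recipient, []).extend(
            f"[{bucket}] {test}" for test in tests
        )

    messages = []
    for recipient, lines in lines_by_recipient.items():
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = f"Automation run: {len(lines)} failure(s) to review"
        msg.set_content("\n".join(lines))
        messages.append(msg)
    return messages


run = {"environment": ["test_login"], "performance": ["test_cart"], "backend/API": ["test_pay"]}
for message in build_notifications(run):
    print(message["To"], "-", message["Subject"])
```

Grouping by recipient rather than by bucket means the Dev team receives a single email covering both API and performance failures. Actual delivery could go through `smtplib.SMTP.send_message` against whatever mail server your environment provides.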

Written by Naveenkumar B