Integrating an ML model into JIRA to gather feedback

Benjamin Audren (ELCA)
Published in ELCA IT · Nov 2, 2021

One of the main issues with knowledge bases is maintaining coherence: once there are more than a few hundred Q&A items, contradictory answers become highly probable.

For one of our prestigious clients, ELCA has developed a Q&A publication pipeline in Confluence and JIRA. To mitigate the incoherence issue, we used AI to identify potentially contradictory Q&As. These proposals can be rated by the user via a feedback user interface, and the feedback is then used to improve the ML model.

The goal of the model was to compare a new answer to the existing corpus and determine whether it contradicts any previously made statement. To achieve this, the model takes each sentence of the new answer and finds every existing sentence with a similar meaning, as well as every existing sentence that contradicts it.

Photo by Pietro Jeng on Unsplash

The basic workflow prior to the introduction of the tool consisted of receiving a new question in JIRA, manually exploring related existing answers inside Confluence, drafting and collaborating on the new answer, and finally publishing it to a large audience. With the AI model, we help the business explore the growing corpus efficiently, and the system is more conducive to having several collaborators who do not need perfect knowledge of what already exists.

The list of existing answers where there is at least one similarity and one contradiction is a good starting point for an expert to determine whether corrections are needed. In this way, we use ML to augment the analysis power of a human expert, helping them narrow their focus down to the relevant items in an evolving knowledge base.

The data preparation for this exercise was time consuming and consisted of taking real sentences from the corpus and amending them to artificially introduce a contradiction, reformulating them, or selecting neutral sentences. The analysis was done in Python using PyTorch, HuggingFace, CamemBERT, and Scikit-Learn.
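Concretely, each pair of sentences can be scored with a fine-tuned CamemBERT sentence-pair classifier along the lines of the sketch below; the checkpoint path and the label order are illustrative assumptions, not the exact production setup:

```python
# Sketch of sentence-pair classification with a fine-tuned CamemBERT model.
# The checkpoint path and the label order are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "models/camembert-qa",  # hypothetical fine-tuned checkpoint
    num_labels=3,           # neutral / similar / contradiction
)
model.eval()

def classify_pair(existing_sentence: str, new_sentence: str) -> dict:
    """Return the probability of each relation between an existing and a new sentence."""
    inputs = tokenizer(existing_sentence, new_sentence,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze().tolist()
    return dict(zip(["neutral", "similar", "contradiction"], probs))
```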

The results were promising on the initial dataset, and the business wanted to evaluate its performance on real new answers. In order to gather feedback from the business on the relevance of the classification, we interfaced the ML model with the live system as a micro-frontend that writes a comment inside JIRA:

Comment inserted in JIRA based on the AI model analysis

The feedback icons are buttons that send feedback to the ML system, storing what the user actually believes about the displayed sentences. For each identified similarity or contradiction, the user compares the sentence from the original corpus with one of the new sentences. This feedback can later be used in an active learning step to correct the model's predictions.

It is this specific integration that we will describe here. While the details are specific to the JIRA system, we hope that the general plan can give you ideas on how to iterate quickly on a model in a different context.

Architecture

Global architecture

The architecture of the solution is presented above. The Q&A system (only the administrative part, handled by JIRA) is responsible for triggering the analysis (1). The response of the Python backend is asynchronous, as the analysis can take a few minutes to finish: it calls the callback URL provided in the first call with the results of the analysis (2). The results are displayed as a comment in JIRA, and the users can then, at their convenience, submit feedback about the analysis by clicking on the links inside the JIRA comment (3).
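To make these exchanges concrete, the payloads could look roughly as follows; all field names, URLs, and values are illustrative assumptions rather than the exact production contract:

```python
# Illustrative payloads for the three exchanges above (field names and URLs are assumptions).

# (1) JIRA -> Python backend, POST /answer
analysis_request = {
    "text": "<p>Body of the new answer, fetched from Confluence…</p>",
    "callback_url": "https://jira.example.com/rest/scriptrunner/latest/custom/mlcallback?issue=KB-123",
}

# (2) Python backend -> JIRA, POST to the callback URL once the analysis is done
analysis_result = {
    "secret": "<shared secret checked by the JIRA endpoint>",
    "pairs": [
        {"pair_id": 42,
         "corpus_sentence": "An existing sentence from the knowledge base.",
         "new_sentence": "A sentence from the new answer.",
         "relation": "contradiction"},
    ],
}

# (3) User -> Python backend, GET /feedback, triggered by the links in the JIRA comment
feedback = {"pair_id": 42, "label": "no_relation"}  # or "similar" / "contradiction"
```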

The steps to go from the simple ML model to this architecture, as highlighted in the diagram above, are the following:

  1. Wrap the trained model behind a simple Falcon web server (a), with two endpoints: /answer to analyze a new answer and /feedback to receive feedback from JIRA
  2. Serve it locally over HTTP with Waitress (b)
  3. Protect it with an authentication token and serve an HTTPS endpoint with NGINX (c)
  4. Add a new script fragment in JIRA with a button (d) that calls the /answer endpoint. JIRA fetches the new answer from Confluence (e) and attaches it to the payload.
  5. Add a new endpoint /mlcallback in JIRA (f) to receive the callback from the ML code and write the comment (g), which links back to the /feedback endpoint on the Python backend.

All the customization in JIRA (points 4 and 5 above) is done using ScriptRunner.

Instead of describing the code in detail, I will simply highlight the few points that presented some difficulties.

AI Backend

Use an ML model without changing its structure

One of the challenges when reusing Python code that was not originally intended to be used as a package is to be able to call it in a simple manner while keeping its standard behavior. One particular problem is handling the global variables of the original script. The solution we used was to create a wrapper for the existing main Python function (directly adapted from a Jupyter notebook). We wrote a wrapper.py file alongside the original run.py.
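Its content is along the lines of the following sketch; the run.main signature, the parameter name, and the exact callback handling are simplifying assumptions made for illustration:

```python
# wrapper.py -- thin adapter around the original run.py (a sketch; names are illustrative)
import os
import re
from argparse import Namespace

import requests

import run  # the original script, directly adapted from a Jupyter notebook

# Shared secret expected by the JIRA callback endpoint (assumed to be read from the environment)
JIRA_SHARED_SECRET = os.environ.get("JIRA_SHARED_SECRET", "")


def analyse_answer(answer_html: str, callback_url: str) -> None:
    # The text comes from a Confluence extract: strip any HTML tags before the analysis.
    text = re.sub(r"<[^>]+>", " ", answer_html)

    # The "Namespace trick": build the argparse-like object that run.main() expects,
    # so its global parameters can be set without modifying run.py itself.
    args = Namespace(model_dir="models/camembert-qa")  # only one parameter shown for brevity

    results = run.main(args, text)

    # Post the results back to JIRA, together with the shared secret, so that the
    # (otherwise unauthenticated) callback endpoint can check where the call comes from.
    requests.post(callback_url, json={"secret": JIRA_SHARED_SECRET, "pairs": results})
```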

We used the Namespace trick to call the main method in order to specify the global variables (only one such parameter is shown in the example above for brevity). Note also the need to clean up the input text: it comes from a Confluence extract and should therefore be stripped of any HTML tags that might be present. Finally, we add a secret key to the communication with the JIRA endpoint, which allows us to leave that endpoint unauthenticated and simplifies the wrapping code.

Testing the system in isolation

Whenever integrating with a second system, it is important to have a way to debug the behavior of the system under construction in isolation. In our case, we need to send an HTTP request to a callback URL at the end of our processing. To verify our system, we simply added an environment switch that adds a dummy /callback route to our Falcon web server. When deployed in development mode, we can therefore query the /answer endpoint and provide the /callback endpoint on the same machine; the results are then simply logged.
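A local smoke test then only needs to point the callback at that dummy route; the port and payload below are placeholders:

```python
# Local test in development mode: the callback points at the dummy route of the same server.
import requests

base = "http://127.0.0.1:8080"
requests.post(f"{base}/answer", json={
    "text": "<p>New answer to analyse…</p>",
    "callback_url": f"{base}/callback",  # dummy route: the results are simply logged
})
```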

The main Falcon entry file is called app.py.
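Its structure is roughly the following: a sketch assuming Falcon 3 and an APP_MODE environment variable for the development switch, with illustrative class and variable names:

```python
# app.py -- Falcon entry point (sketch; assumes Falcon 3, names are illustrative)
import os
from threading import Thread

import falcon

from wrapper import analyse_answer


class Answer:
    def on_post(self, req, resp):
        payload = req.media
        # Run the (slow) analysis in a background thread so that the HTTP response
        # is returned immediately; the results arrive later via the callback URL.
        Thread(target=analyse_answer,
               args=(payload["text"], payload["callback_url"]),
               daemon=True).start()
        resp.status = falcon.HTTP_202
        resp.media = {"status": "processing"}


class Feedback:
    def on_get(self, req, resp):
        # Store what the user believes about one sentence pair (persistence omitted here).
        resp.media = {"pair_id": req.get_param("pair_id"),
                      "label": req.get_param("label"),
                      "status": "recorded"}


class DummyCallback:
    def on_post(self, req, resp):
        # Development-only route: log the analysis results instead of posting them to JIRA.
        print("Callback received:", req.media)
        resp.media = {"status": "logged"}


app = falcon.App()
app.add_route("/answer", Answer())
app.add_route("/feedback", Feedback())

if os.environ.get("APP_MODE") == "development":
    # Environment switch: expose a dummy callback to test the whole loop in isolation.
    app.add_route("/callback", DummyCallback())

if __name__ == "__main__":
    # Serve locally over HTTP with Waitress (step (b) of the architecture).
    from waitress import serve
    serve(app, host="127.0.0.1", port=8080)
```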

The resources defined are standard Falcon resources. Note that the Answer class needs to launch the processing in a separate thread: we simply used a Thread from the threading library so that the status code can be returned as early as possible while the analysis runs in the background.

NGINX Bearer Authentication

To keep the deployment of this small application simple while protecting it from access by unwanted parts of the network, we created a simple NGINX configuration. We created and used self-signed certificates, which we had to compensate for on the JIRA side. We also defined a secret bearer token that is shared with the JIRA code.
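The configuration itself can stay very small; the sketch below illustrates it, with the server name, certificate paths, port, and token as placeholders:

```nginx
# Sketch of the NGINX reverse proxy in front of the Waitress server (values are placeholders).
server {
    listen 443 ssl;
    server_name ml-backend.internal;

    # Self-signed certificates (the JIRA side has to be told to accept them).
    ssl_certificate     /etc/nginx/certs/ml-backend.crt;
    ssl_certificate_key /etc/nginx/certs/ml-backend.key;

    location / {
        # Reject any request that does not carry the shared bearer token.
        if ($http_authorization != "Bearer CHANGE_ME_SECRET_TOKEN") {
            return 401;
        }
        # Forward authenticated requests to the local Waitress server.
        proxy_pass http://127.0.0.1:8080;
    }
}
```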

JIRA customization

HTML fragment

We now turn to the actual user workflow. From within JIRA, a button is added (through a ScriptRunner Web Fragment) to request the analysis of a question. Each JIRA issue is linked to a Confluence page where the collaboration on the copy-writing happens. The button is hidden until the answer has been drafted inside Confluence. When the user clicks on it, JIRA requests the content of the answer from Confluence and sends a POST request to the /answer endpoint defined above.

In addition to the body of the new answer to analyze, the request contains a callback URL identifying the JIRA issue. The secret bearer token is added to the headers.
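Sketched in Python for illustration (the production version is a Groovy script running inside ScriptRunner), the request looks roughly like this; the URLs, issue key, and token values are placeholders:

```python
# Shape of the request JIRA sends to the ML backend, sketched in Python for illustration.
# In production this is a ScriptRunner (Groovy) script; URLs, issue key and token are placeholders.
import requests

ML_BACKEND = "https://ml-backend.internal"
JIRA_BASE_URL = "https://jira.example.com"
BEARER_TOKEN = "CHANGE_ME_SECRET_TOKEN"

issue_key = "KB-123"
answer_body = "<p>Body of the new answer, fetched from Confluence…</p>"

response = requests.post(
    f"{ML_BACKEND}/answer",
    json={
        "text": answer_body,
        "callback_url": f"{JIRA_BASE_URL}/rest/scriptrunner/latest/custom/mlcallback?issue={issue_key}",
    },
    headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
    verify=False,  # only because the NGINX endpoint uses a self-signed certificate
)
response.raise_for_status()
```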

As a note, if you are using self-signed certificates for NGINX, you have to tell ScriptRunner to ignore them when calling the backend.

Display of the analysis and feedback

When the callback from the ML model arrives, ScriptRunner writes a comment, impersonating a technical user. Early on, we realized that the model was finding a lot of sentences similar to or in contradiction with the new answer. In order to restrict the amount of information displayed and allow the user to focus on the most relevant parts, we decided to display the identified sentences only for the questions where both a similarity and a contradiction were found.

For these questions, we show one line per pair of sentences. Three buttons allow the user to send their impression of the match: no relation, they mean the same thing, or they contradict each other. To avoid influencing the business user, we did not display whether the model classified the pair as a contradiction or a similarity. On the Python backend side, this feedback is then stored, along with the pair ID for identification. The next step would be to periodically take these new inputs and retrain the model with this new information.
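For reference, each feedback button is simply a link back to the /feedback endpoint of the Python backend; the URL layout below is an illustrative assumption:

```python
# The three feedback buttons are plain links back to the /feedback endpoint (layout is illustrative).
pair_id = 42
feedback_links = {
    label: f"https://ml-backend.internal/feedback?pair_id={pair_id}&label={label}"
    for label in ("no_relation", "similar", "contradiction")
}
```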

The business spent some time analyzing questions and systematically tagging the suggestions from the ML model. Unfortunately, we realized that a lot of what the model identified as contradictions were simply unrelated sentences. This probably boils down to the fact that the training data was originally prepared on a much smaller subset of answers, which did not cover the same range of topics as the newer questions.

Conclusions

It is possible to quickly embed a machine-learning model into a live system to augment the competence of its users. With the model described here, we present a way to flag existing content that may contradict a new answer, prompting possible adjustments to either side. On a multi-author platform with ever-evolving content, an intelligent system can drastically improve the quality of the available information and ensure that the communication with the outside world is as consistent as possible.

With a pragmatic approach, such a model can quickly be integrated into a live system. In addition, by involving business users, it is possible to gather valuable feedback about the model and continue training it to increase its relevance. It is a win-win situation: in doing so, the business can identify links that might otherwise have eluded them. We believe that bringing value to the business early can trigger a virtuous circle of improvement for everyone. This was only made possible by bringing the feedback mechanism inside the existing workflow, without introducing an additional user interface.
