Creating alerts in Opsgenie from Java and through monitoring systems

Published in adidoescode · 10 min read · Mar 22, 2024

What is alert management?

For a long time, we’ve understood that managing alerts is crucial in the tech world. We take care of monitoring and responding to any event that could affect our operations. This critical step helps us detect and solve problems immediately, ensuring that our systems operate without interruptions. To make this task easier, we rely on a series of specialized tools.

Opsgenie stands out in our arsenal because it allows us to centralize alerts, automate escalation processes, and improve communication in incidents, becoming an integral solution for efficiently managing events in IT environments.

In addition to taking advantage of what Opsgenie has to offer, we need to know how to integrate it with other tools and projects, both to generate events and to visualize the resulting statistics. In this article we will explore how to integrate Opsgenie with our Java projects and how to combine it with other monitoring and data visualization tools.

Person on a computer while the outside is burning. — Image generated by Midjourney

What would alert management be like without Opsgenie?

Let’s imagine what the process was like for a company the size of adidas without a system like Opsgenie.

When a critical incident arose, the alert and resolution process was manual and often chaotic. Alerts were generated through different monitoring systems, meaning we didn’t have a unified view of ongoing incidents. The task allocation to support teams was done manually, which delayed the response and increased the risk that critical alerts would not be addressed in time. Communication among teams was another pain point. During an incident, we lost valuable time searching for the correct information, determining who was available to solve the problem, and ensuring that all team members were aware of the latest developments. It was not uncommon for technicians, especially the newer or less experienced ones, to feel overwhelmed and nervous under the pressure to respond quickly without a clear procedure to follow.

Moreover, the management of incident documentation was inefficient. The creation of post-mortem reports and analysis of incidents to learn from them and prevent future occurrences was done in an ad-hoc manner, resulting in a slow and often incomplete process. This reactive approach prevented us from continuously improving our processes and systems. Life without Opsgenie meant we were constantly dealing with uncertainty, delays in incident resolution, and a general lack of visibility and control over our alert management processes. In short, it was an approach we could no longer afford to maintain if we wanted to ensure the high availability of our services and maintain our customers’ trust.

The adoption of a system to manage alerts marked a before and after in how our company approaches incident management. It allowed us to overcome these challenges through process automation, clear and efficient communication, and comprehensive and accessible documentation for the entire team.

What Opsgenie brings to project management

By integrating Opsgenie into our company, we have managed to centralize and automate the management of alerts efficiently, enabling us to respond quickly to any incident. The automation of escalation and the centralization of alerts have provided us with a clear and unified view of incidents in real-time, ensuring that critical alerts are assigned to the correct team without delays. This improvement in operational efficiency has been crucial in maintaining the continuity of our services and minimizing the impact on our customers.

Communication during incidents has notably improved. The platform facilitates immediate notification to the involved teams through various channels, keeping everyone informed and coordinated. Furthermore, the ability to document and analyse incidents directly in Opsgenie has strengthened our continuous improvement process, allowing us to learn from each incident and take preventive measures to avoid their repetition in the future.

How to integrate Opsgenie

Opsgenie integrates with our existing tools through its REST API, which covers the complete management of alerts. In addition, its SDK provides a direct way to build custom alert functionality into our applications, improving event response and coordination without making us entirely dependent on the tool.

First, you need to configure an integration API Key in Opsgenie. This API Key identifies where you want to read or write information when you have multiple accounts, teams, or projects. The steps to obtain it are the following.

Step 1: In the top menu, select the Teams option.

Step 2: Add a new team or select an existing one. If you create a new team, you will need to fill in its information.

Step 3: With the team selected, choose the Integrations option from the left menu. Add a new integration and select API type integration.

Step 4: Select the API type and fill in the desired name.

Step 5: Once these steps are complete, a new API Key is generated. Copy this key into your application’s configuration.

Step 5: Obtain the API Key from Integrations

Step 6: To start the service, press “Turn on integration”. You will be able to verify that the status changes to “on”.

Step 7: (Optional) To verify that it works correctly, you can create a test alert from Postman.

Check the alert
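
The test call from Step 7 can also be made from plain Java instead of Postman. The following is a minimal sketch, assuming Java 11 or later and Opsgenie's public Alert API endpoint (`https://api.opsgenie.com/v2/alerts`, with the key sent as `Authorization: GenieKey <key>`). The class name `OpsgenieRestSketch`, the `OPSGENIE_API_KEY` environment variable, and the message text are illustrative assumptions, not part of the setup described here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OpsgenieRestSketch {

    // Assumption: default Opsgenie instance; EU accounts use api.eu.opsgenie.com
    public static final String ALERTS_URL = "https://api.opsgenie.com/v2/alerts";

    /** Builds the JSON body by hand; only backslashes and double quotes are escaped. */
    public static String buildBody(String message) {
        String escaped = message.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"message\":\"" + escaped + "\"}";
    }

    /** Builds the POST request without sending it. */
    public static HttpRequest buildCreateAlertRequest(String apiKey, String message) {
        return HttpRequest.newBuilder(URI.create(ALERTS_URL))
                .header("Content-Type", "application/json")
                .header("Authorization", "GenieKey " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(message)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("OPSGENIE_API_KEY"); // hypothetical variable name
        if (apiKey == null || apiKey.isBlank()) {
            System.out.println("Set OPSGENIE_API_KEY to send a real test alert.");
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildCreateAlertRequest(apiKey, "Test alert from Java"),
                      HttpResponse.BodyHandlers.ofString());
        // Print the status code and raw response body for manual inspection
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Opsgenie's documentation describes alert creation as asynchronous, so the response acknowledges the request rather than returning the created alert. The alert itself can then be checked in the Opsgenie UI, as in Step 7.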

Opsgenie SDK to generate alerts in your Java projects.

Once we know that we can send alerts using the API, we can connect to it in various ways from our Java project. The most generic option is a plain HTTP client. However, to simplify development, we can use the Opsgenie SDK. To use the SDK in a Java project, add the following dependencies to the Maven pom.xml:

<dependency>
    <groupId>com.opsgenie.client</groupId>
    <artifactId>sdk</artifactId>
    <version>2.13.1</version>
</dependency>
<dependency>
    <groupId>javax.ws.rs</groupId>
    <artifactId>javax.ws.rs-api</artifactId>
    <version>2.1</version>
</dependency>

In the case of javax.ws.rs, check whether your project already has support for RESTful Web Services, because it is often pulled in transitively by other dependencies.

The first step is to configure an IOpsGenieClient with the necessary settings for initialization in the application.

/**
 * Configuration class for initializing Opsgenie client settings.
 * This class is responsible for setting up the Opsgenie client with necessary configurations
 * to interact with Opsgenie services. It utilizes application properties to configure the client.
 */
@Configuration
public class OpsGenieConfig {

    // Injects the Opsgenie API key from application properties.
    @Value("${adidas.opsgenie.key-api}")
    private String opsGenieKeyApi;

    /**
     * Creates and configures an instance of the Opsgenie client.
     * This bean method initializes the Opsgenie client with the API key provided
     * in the application's configuration. The configured client is then available
     * for autowiring throughout the application, facilitating interactions with Opsgenie.
     *
     * @return An instance of IOpsGenieClient, configured with an API key.
     */
    @Bean
    public IOpsGenieClient getOpsGenieClient() {
        OpsGenieClient client = new OpsGenieClient();
        client.setApiKey(opsGenieKeyApi);
        return client;
    }

}

Below is a Spring Boot service in Java that creates an alert.

/**
 * Service class for creating alerts in Opsgenie. This class provides functionality to encapsulate the
 * process of alert creation, making API calls to Opsgenie based on predefined criteria such as error
 * detection or system monitoring alerts. It is designed to be used as a part of an application's error
 * handling and notification system.
 */
@Slf4j
@Service
@RequiredArgsConstructor
public class OpsGenieAlertService {

  private final IOpsGenieClient opsGenieClient;

  /**
   * Creates an alert in Opsgenie with detailed information about an encountered issue. This method
   * constructs an alert request with specified parameters like message, priority, description, tags,
   * and more. It then sends this request to Opsgenie and logs the response or any errors encountered
   * during the process. This is particularly useful for notifying operations or development teams
   * about critical issues that require immediate attention.
   */
  public void createAlertExample() {
    try {
      CreateAlertRequest request = new CreateAlertRequest();

      // Message: Main information about the alert, indicating what happened
      request.setMessage("Error processing message in QueueService");

      // Priority: Indicates the urgency level of the alert
      request.setPriority(PriorityEnum.P1); // (P1-P5). P1 for highest priority

      // Description: Detailed information about the alert, providing context
      request.setDescription("Message processing failed due to an unexpected error");

      // Tags: Used to categorize the alert, making it easier to search and filter
      request.setTags(Arrays.asList("error", "prd", "queue-service-processor"));

      // Alias: A unique identifier for the alert, used for deduplication
      request.setAlias("MessageProcessingError");

      // Entity: The entity that the alert is associated with (e.g., a server, an application)
      request.setEntity("QueueServiceProcessor");

      // Note: Additional information or context about the alert
      request.setNote("Initial analysis indicates a potential issue with message formatting.");

      SuccessResponse response = opsGenieClient.alertV2().createAlert(request);

      // Log the response so the request can be traced later
      log.info("Alert request sent: requestId={}, result={}, took={}",
          response.getRequestId(), response.getResult(), response.getTook());
    } catch (ApiException e) {
      log.error("Error creating alert.", e);
    }
  }
}

Running this code should generate an alert like the following:

Inside the alert, we can see all the information we have generated, including the notes that appear on the right panel.

Some fields are especially important for managing these alerts properly. One of them is the priority, which determines how serious an alert is. Using the priorities (P1-P5) properly allows us to attend first to the most critical environments or the most blocking errors. Another useful field is the entity, which indicates which service is failing. In addition, to filter information and extract statistics later, we use tags to store the service that failed, the environment, and any other information that can serve as a category.
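
One way to keep these fields consistent across services is to build them in a single helper. The sketch below is a hypothetical convention, not the one used in our projects: the class `AlertFieldsSketch`, the environment names, and the environment-to-priority mapping are all assumptions chosen for illustration. The resulting values could be passed to `setTags` and used to choose the `PriorityEnum` of the request shown earlier.

```java
import java.util.List;
import java.util.Locale;

public class AlertFieldsSketch {

    /** Builds tags in a fixed order: category, environment, service. */
    public static List<String> buildTags(String category, String environment, String service) {
        return List.of(normalize(category), normalize(environment), normalize(service));
    }

    /**
     * Maps an environment to a priority label (P1 highest, P5 lowest).
     * Assumption: production is P1, staging is P3, everything else is P5.
     */
    public static String priorityFor(String environment) {
        switch (normalize(environment)) {
            case "prd": return "P1";
            case "stg": return "P3";
            default:    return "P5";
        }
    }

    // Trim and lower-case so that " PRD " and "prd" end up as the same tag
    private static String normalize(String value) {
        return value.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(buildTags("error", "PRD", "queue-service-processor"));
        System.out.println(priorityFor("PRD"));
    }
}
```

Normalizing the values in one place avoids near-duplicate tags such as `PRD` and `prd`, which would otherwise split the same environment across two filters.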

Although creating alerts is the most common operation, it is possible to perform other management tasks. You can consult different examples at the following URL: https://docs.opsgenie.com/docs/opsgenie-java-api

Opsgenie REST API to integrate with monitors.

Additionally, Opsgenie can be integrated so that other monitoring systems such as OpenSearch Dashboards, Instana, or Grafana also send alerts when certain conditions occur. For example, we can create an alert if the response time of an API call exceeds a limit, or set relative conditions, such as a call whose response time has increased by 500% compared to last week. Sending alerts from monitoring systems has two advantages. It reduces the amount of alerting code in our projects and, because these systems store historical information, it lets us trigger alerts based on samples over broad time ranges rather than on a single execution.

Flowchart (OpenSearch Dashboards + Grafana + Instana)

For example, in the case of OpenSearch Dashboards, it is possible to configure channels with the API endpoints to send the information.

Configuring OpenSearch Dashboards with Opsgenie

The following query in OpenSearch Dashboards lets us compare the average response time of the last day with that of the three days before it.

{
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "@timestamp": {
                            "from": "now-96h/h",
                            "to": "now/h"
                        }
                    }
                },
                {
                    "term": {
                        "environment.keyword": {
                            "value": "environment-name"
                        }
                    }
                },
                {
                    "term": {
                        "application.keyword": {
                            "value": "service-name"
                        }
                    }
                }
            ]
        }
    },
    "aggregations": {
        "avg_execution_time_last_day": {
            "filter": {
                "range": {
                    "@timestamp": {
                        "from": "now-24h/h",
                        "to": "now/h"
                    }
                }
            },
            "aggregations": {
                "avg_execution_time": {
                    "avg": {
                        "field": "req.executionTime"
                    }
                }
            }
        },
        "avg_execution_time_96_day": {
            "filter": {
                "range": {
                    "@timestamp": {
                        "from": "now-96h/h",
                        "to": "now-24h/h"
                    }
                }
            },
            "aggregations": {
                "avg_execution_time": {
                    "avg": {
                        "field": "req.executionTime"
                    }
                }
            }
        }
    }
}

We can use the previous query to trigger an alert when the average response time of the last day is more than 5 times the average of the three days before it. To do this we would use a trigger condition like the following:

(ctx.results[0].aggregations.avg_execution_time_last_day.avg_execution_time.value >
 ctx.results[0].aggregations.avg_execution_time_96_day.avg_execution_time.value * 5)

This is why we should also consider monitoring systems like OpenSearch Dashboards as a source of alerts. They allow us to check aggregated data from different metrics and generate alerts based on that aggregated information.
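
The trigger condition above boils down to a ratio check between two averages. As a minimal sketch of that logic in Java, the comparison could be written as follows. The class and method names are hypothetical, and the handling of a missing baseline is an added assumption, since an `avg` aggregation over a window with no documents yields no value.

```java
public class RelativeThresholdSketch {

    /**
     * Returns true when the recent average is strictly greater than
     * factor times the baseline average. A missing or non-positive
     * baseline (no data in the comparison window) never triggers.
     */
    public static boolean exceedsBaseline(Double recentAvg, Double baselineAvg, double factor) {
        if (recentAvg == null || baselineAvg == null || baselineAvg <= 0) {
            return false;
        }
        return recentAvg > baselineAvg * factor;
    }

    public static void main(String[] args) {
        // Baseline of 120 ms: 700 ms is above 5 x 120 = 600 ms, 500 ms is not
        System.out.println(exceedsBaseline(700.0, 120.0, 5)); // true
        System.out.println(exceedsBaseline(500.0, 120.0, 5)); // false
        System.out.println(exceedsBaseline(700.0, null, 5));  // false
    }
}
```

Whether a missing baseline should stay silent or raise its own alert is a design choice. This sketch stays silent, so a newly deployed service with no history does not page anyone.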

Conclusion:

In summary, efficient alert management is a pivotal element in maintaining system integrity and operational continuity within the fast-paced IT landscape. We have explored two primary avenues for creating alerts: directly through Java applications using Opsgenie’s SDK, and indirectly via integrated monitoring tools that provide a broader context and automated alert triggers. Throughout our discussion, we delved into the process of generating an Integration API Key, a crucial step for authenticating and authorizing interactions with Opsgenie’s system. We also examined a sample code snippet, illustrating the practical implementation of an alert creation within a Java-based environment. This comprehensive approach to alert management underscores the balance between proactive monitoring and dynamic response, ensuring our IT infrastructure remains robust and reliable.

The views, thoughts, and opinions expressed in the text belong solely to the author, and do not represent the opinion, strategy or goals of the author’s employer, organization, committee or any other group or individual.