Creating health resources for our APIs. Are your APIs “alive”?

Rolando Carrasco
Another Integration Blog
8 min read · Mar 20, 2024
Staying Alive — Bee Gees — 1977

Back in 1977, the Bee Gees released the song Staying Alive. Every bit of that song honored its title, and you could feel every piece of it on the dance floor. You could say it is one of those songs that keeps you alive at the party, so that your friends notice you are alive while you dance to it.

But what about your APIs? Could you say that your APIs are healthy enough to be alive?

Let’s find out.

When designing, implementing and operating APIs, one of the key things is to have a way to obtain information about the health of those APIs. For example:

  • Is our API up and running?
  • Is our API serving requests properly?
  • How many successful responses is it serving, and of which type?
  • How many error responses is it serving, and of which type?
  • On average, how fast are our API resources responding?

That analysis is usually carried out by an operator reviewing the API platform, or whatever other platform the APIs are served from. What we want to highlight is that we usually need to wait for someone else to give us the answer.

Sometimes we incorporate a Health resource that just responds with a 200 HTTP code with some body content like:

< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 20
< Date: Mon, 18 Mar 2024 20:46:45 GMT
< x-correlation-id: 56456219-479e-4aa6-ba6d-e9447575128c
< Connection: Keep-Alive
<
{
"status": "OK"
}

That could be useful, but it is not good enough. Maybe that response only implies that the server listening on that port is OK and was able to serve our /health resource, but what about our APIs? Are they serving requests properly?

Sometimes we do not even include the health resource in our specs, because we think it should be part of somebody else’s responsibilities. We do a good job creating or generating our API specs, but without thinking about that very simple health resource.

For this article we’ve created a sample API spec whose purpose is to serve a list of questions for polls and to offer the ability to create new questions for your polls. We have called it MuleSoft Polls.

The spec looks like this:

#%RAML 1.0
title: MuleSoftPolls
version: '1.0'
baseUri: https://polls.apiblueprint.org/
baseUriParameters: {}
protocols:
- HTTPS
documentation:
- title: MuleSoftPolls
  content: Polls is a simple API allowing consumers to view polls and vote in them.
types:
  Question:
    displayName: Question
    type: object
    properties:
      question:
        required: true
        displayName: question
        type: string
      published_at:
        required: true
        displayName: published_at
        type: string
      choices:
        required: true
        displayName: choices
        type: array
        items:
          type: Choice
  Choice:
    displayName: Choice
    type: object
    properties:
      votes:
        required: true
        displayName: votes
        type: integer
        format: int32
      choice:
        required: true
        displayName: choice
        type: string
  QuestionRequest:
    example:
      value:
        question: Favourite programming language?
        choices:
        - Swift
        - Python
        - Objective-C
        - Ruby
    displayName: QuestionRequest
    type: object
    properties:
      question:
        required: true
        displayName: question
        type: string
      choices:
        required: true
        displayName: choices
        type: array
        items:
          type: string
/questions:
  get:
    displayName: ListAllQuestions
    description: List All Questions
    responses:
      '200':
        description: Successful Response
        body:
          application/json:
            example:
              value:
              - question: Favourite programming language?
                published_at: 2015-08-05T08:40:51.620Z
                choices:
                - choice: Swift
                  votes: 2048
                - choice: Python
                  votes: 1024
                - choice: Objective-C
                  votes: 512
                - choice: Ruby
                  votes: 256
            displayName: response
            description: ''
            type: array
            items:
              type: Question
  post:
    displayName: CreateaNewQuestion
    description: You may create your question using this action. It takes a JSON object containing a question and a collection of answers in the form of choices.
    body:
      application/json:
        example:
          value:
            question: Favourite programming language?
            choices:
            - Swift
            - Python
            - Objective-C
            - Ruby
        displayName: body
        type: QuestionRequest
    responses:
      '201':
        description: ''
        body:
          application/json:
            example:
              value:
                question: Favourite programming language?
                published_at: 2015-08-05T08:40:51.620Z
                choices:
                - choice: Swift
                  votes: 0
                - choice: Python
                  votes: 0
                - choice: Objective-C
                  votes: 0
                - choice: Ruby
                  votes: 0
            displayName: response
            description: ''
            type: Question
/health:
  get:
    responses:
      200:
        body:
          application/json:
            example:
              value:
                status: OK
  /details:
    get:
      responses:
        200:
          body:
            application/json:
              example:
                value:
                  !include /sampleDetails.json



As you can see, we’ve incorporated a /health resource with a child resource, /details.

If we do a GET for /health we will receive:

{
  "status": "OK"
}

But for /health/details, the response looks like this (sampleDetails.json):

{
  "success": true,
  "error": null,
  "response": [
    {
      "time_series_60m": [
        {
          "2024-03-16T18:00:00.000Z": {
            "count": 1,
            "aggregations": [
              {
                "status_code": [
                  {
                    "404": {
                      "count": 1
                    }
                  }
                ]
              }
            ]
          }
        },
        {
          "2024-03-16T20:00:00.000Z": {
            "count": 4,
            "aggregations": [
              {
                "status_code": [
                  {
                    "200": {
                      "count": 2
                    }
                  },
                  {
                    "404": {
                      "count": 2
                    }
                  }
                ]
              }
            ]
          }
        },
        {
          "2024-03-17T05:00:00.000Z": {
            "count": 2,
            "aggregations": [
              {
                "status_code": [
                  {
                    "200": {
                      "count": 1
                    }
                  },
                  {
                    "404": {
                      "count": 1
                    }
                  }
                ]
              }
            ]
          }
        },
        {
          "2024-03-17T14:00:00.000Z": {
            "count": 7,
            "aggregations": [
              {
                "status_code": [
                  {
                    "404": {
                      "count": 7
                    }
                  }
                ]
              }
            ]
          }
        }
      ]
    }
  ]
}
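To read a response like this from a script, a minimal Python sketch could flatten the hourly buckets into one total per status code. The function name totals_by_status and the trimmed sample are ours, for illustration only, and the code assumes the response always has the shape shown above.

```python
import json
from collections import Counter


def totals_by_status(details: dict) -> Counter:
    """Sum the per-bucket counts of every status code in a /health/details body."""
    totals = Counter()
    for series in details.get("response", []):
        for bucket in series.get("time_series_60m", []):
            # Each bucket is a one-key object: {timestamp: {count, aggregations}}
            for stats in bucket.values():
                for aggregation in stats.get("aggregations", []):
                    for entry in aggregation.get("status_code", []):
                        for code, value in entry.items():
                            totals[code] += value["count"]
    return totals


# Trimmed copy of the sample above (the first two of its four buckets).
sample = json.loads("""
{
  "success": true,
  "error": null,
  "response": [
    {
      "time_series_60m": [
        {"2024-03-16T18:00:00.000Z": {"count": 1, "aggregations": [
          {"status_code": [{"404": {"count": 1}}]}]}},
        {"2024-03-16T20:00:00.000Z": {"count": 4, "aggregations": [
          {"status_code": [{"200": {"count": 2}}, {"404": {"count": 2}}]}]}}
      ]
    }
  ]
}
""")

print(dict(totals_by_status(sample)))
```

With the totals in hand, a consumer of /health/details does not need to walk the nested time series every time it wants a quick answer.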

This is the standard message returned by the MuleSoft Analytics API, which we are using to implement our resource. It is the same API that the Anypoint console uses to display this information.

Plain and simple, with this we can get:

  1. The top apps using our API. Sometimes we don’t even know who is consuming our APIs; with this we can find out
  2. Requests received in time intervals: one day, the last hour, the last three hours, the last seven days, the last month, and so on
  3. The latency of our API

That is, in our opinion, good enough for our health resources. At least it is more than just returning an OK status with a 200 HTTP response code.

Let’s implement it

If we scaffold the API from the spec shared above, we get two flows for our health resources.

The /health flow can be left as is, with the transform that was generated automatically from the spec.

But /health/details needs to be implemented, which is why we created a specific flow for it in a separate configuration file. For this article’s purposes the flow is very simple and lacks some of the usual implementation details, such as error handlers. This article is not about those; it is about the details of our health resources.

The first thing to understand, as we’ve already mentioned, is that we are using the MuleSoft Analytics API to retrieve the health details. The resource looks like this:

https://anypoint.mulesoft.com/analytics/1.0/$orgID/environments/$envID/query

For this example, we are going to query it like this:

curl --location 'https://anypoint.mulesoft.com/analytics/1.0/9013a01d871/environments/d1a3a5/query' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer 09d19a6b-bb9c--48b781dec4c5' \
--header 'Cookie: XSRF-TOKEN=1cCSBtJr-CdgoCRy2kicgCuH1O8aGXSTmtnI; _csrf=Hw_yIxrZSxN0d8S-r9xU144a' \
--data '{
  "type": "enriched-http-event",
  "aggregators": [
    {
      "time_interval": 60,
      "aggregators": [
        {
          "dimension": "status_code"
        }
      ]
    }
  ],
  "duration": "1440m",
  "filters": [
    {
      "or": [
        {
          "equals": {
            "api_version_id": 665885
          }
        }
      ]
    }
  ],
  "start_time": "2024-03-18T19:25:14.589Z",
  "include_policy_violation": true
}'

The query is simple to understand:

  1. time_interval: expressed in minutes. With 60, the statistics come back grouped in 60-minute intervals (the time_series_60m key in the response shown earlier). You can change it at will
  2. aggregators: you can aggregate by different dimensions; in this case we are using the status_code dimension
  3. duration: the length of the period to query. 1440m is one day, so we are asking for one day of statistics in 60-minute intervals
  4. filters: we are filtering the query for our specific API, which is why we need the API identifier
  5. start_time: the moment from which the query starts. In our flow it is calculated with the DataWeave function now()
  6. include_policy_violation: set it to true if you also want statistics about the policies that have been violated
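As a rough illustration of how those six fields fit together, here is a minimal Python sketch that assembles the same query body. The function name build_query and the hours_back parameter are ours, and starting the query one hour back is an assumption made for the example; the field names and values come from the curl call above.

```python
import json
from datetime import datetime, timedelta, timezone


def build_query(api_version_id: int, hours_back: int = 1) -> dict:
    """Assemble the Analytics query body described in the list above."""
    start = datetime.now(timezone.utc) - timedelta(hours=hours_back)
    # Same timestamp style as the curl example: milliseconds and a trailing Z.
    start_time = (
        start.strftime("%Y-%m-%dT%H:%M:%S.") + f"{start.microsecond // 1000:03d}Z"
    )
    return {
        "type": "enriched-http-event",
        "aggregators": [
            {"time_interval": 60, "aggregators": [{"dimension": "status_code"}]}
        ],
        "duration": "1440m",
        "filters": [{"or": [{"equals": {"api_version_id": api_version_id}}]}],
        "start_time": start_time,
        "include_policy_violation": True,
    }


# 665885 is the API identifier used in the curl example.
print(json.dumps(build_query(665885), indent=2))
```

Keeping the query in one function makes it easy to experiment with a different time_interval or dimension without touching the rest of the call.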

Now we only need to set the query we have just described as the payload, and then add an HTTP Request processor for our endpoint:

The Set Payload is like this:

output application/json
// Helper that formats a DateTime as a timestamp string (not used in the payload below)
fun format(d: DateTime) = d as String {format: "yyyy-MM-dd'T'HH:mm:ss.SSS"}
// How many hours back the query should start
var numberOfHours = 1
---
{
  "type": "enriched-http-event",
  aggregators: [{
    time_interval: 60,
    aggregators: [{
      dimension: "status_code"
    }]
  }],
  duration: "1440m",
  filters: [{
    "or": [{
      equals: {
        // API identifier, read from the api.id property
        api_version_id: p('api.id')
      }
    }]
  }],
  start_time: now() - "PT$(numberOfHours)h",
  include_policy_violation: true
}

And the HTTP requester:

The resource is protected and must be called with a Bearer token. For that, you first need to create a connected app from which you can obtain an access token for the analytics endpoint. That app should have only the analytics scope and nothing more. We will elaborate on that in our next article on this same topic, since the /health/details resource could be made available only to operators or to specific users who hold the connected app credentials, or, as in our example, it could be open to anyone, with the Mule flow obtaining the token itself before calling the Analytics API.
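To make the two-step idea concrete (get a token, then call the analytics endpoint with it), here is a minimal Python sketch that only builds the two HTTP requests and does not send them. The token URL and the client_credentials form fields are our assumption of how Anypoint connected apps issue tokens, so check them against the Anypoint documentation; the function names and the placeholder credentials are hypothetical. The analytics URL and IDs are the ones from the curl example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: token endpoint for Anypoint connected apps (verify in the docs).
TOKEN_URL = "https://anypoint.mulesoft.com/accounts/api/v2/oauth2/token"
ANALYTICS_URL = (
    "https://anypoint.mulesoft.com/analytics/1.0/{org_id}/environments/{env_id}/query"
)


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("utf-8")
    return urllib.request.Request(TOKEN_URL, data=form, method="POST")


def build_analytics_request(org_id: str, env_id: str, token: str,
                            query: dict) -> urllib.request.Request:
    """Build (but do not send) the analytics query with the Bearer token."""
    return urllib.request.Request(
        ANALYTICS_URL.format(org_id=org_id, env_id=env_id),
        data=json.dumps(query).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values; a real flow would read them from secure configuration.
token_req = build_token_request("my-client-id", "my-client-secret")
query_req = build_analytics_request("9013a01d871", "d1a3a5", "my-access-token",
                                    {"type": "enriched-http-event"})
print(token_req.get_method(), token_req.full_url)
print(query_req.get_method(), query_req.full_url)
```

Sending either request is one call to urllib.request.urlopen; it is left out here so the sketch can run without credentials.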

Now just deploy the application and you can use the health resources, like this:

Plain /health resource:

Improved /health/details resource:

In this case, it says that in the last 60 minutes my API has received 2 requests, both of them with status 500.

A few minutes later:

It now returns the same 2 responses with status 500, plus 3 with status 200 and one 429 (rate limiting).
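Numbers like these can be turned into a single verdict. The Python sketch below groups the counts by status class and derives a status label; the summarize function, the DEGRADED label and the 10% threshold are our assumptions for illustration, not something the Analytics API returns.

```python
def summarize(counts: dict, max_5xx_rate: float = 0.10) -> dict:
    """Group status-code counts by class and derive a simple health label."""
    total = sum(counts.values())
    by_class = {"2xx": 0, "4xx": 0, "5xx": 0}
    for code, count in counts.items():
        key = f"{code[0]}xx"
        if key in by_class:
            by_class[key] += count
    rate_5xx = by_class["5xx"] / total if total else 0.0
    return {
        "total": total,
        **by_class,
        "rate_5xx": round(rate_5xx, 3),
        # Hypothetical rule: too many server errors means the API is degraded.
        "status": "DEGRADED" if rate_5xx > max_5xx_rate else "OK",
    }


# The counts reported above: two 500s, three 200s and one 429.
summary = summarize({"500": 2, "200": 3, "429": 1})
print(summary)
```

A rule like this is what would let the plain /health resource answer with something more meaningful than a fixed OK.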

We will share the code for this in the upcoming days at:

We will also write a new article explaining things like:

  1. Protecting the /health/details resource, and why that could be necessary
  2. Applying specific API policies to the /health/details resource. For example, it may have a different rate limit than the rest of the resources
  3. Including more statistics in the /health/details resource
  4. Creating a custom API policy that can be applied to all your APIs and automatically implements the health resources

Thank you for reading.
