Creating a custom exporter for Prometheus: an endpoint health monitor
Prometheus
Prometheus is one of the most widely used metric aggregators in the monitoring world. Multiple components participate in order to give a picture of an application's or system's behaviour. The base service responsible for providing the data model is the exporter.
When it comes to Prometheus, the open-source community offers many metric exporters, for example for MySQL, Cassandra, DNS, and so on. But what if we have to monitor a system for which no exporter is available? We are going to take up one such scenario and explore it.
Exposing metrics
For exposing metrics, it is important to choose the API service and programming platform. There are multiple options in the market; I am going to explain how this can be achieved using Golang.
First, why I chose Golang:
- small binary size when compiled (multi-stage image)
- easy concurrency
What we are going to do
I am going to explain an exporter service that monitors endpoints and validates the responses.
----------------------------------------------------------------
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"log"
	"net/http"
	"os"
	"regexp"
	"strings"
	"time"
)

// filter fetches the URL, matches the response body against the given
// regex, and sends a Prometheus exposition line on the channel:
// 1 on match, 0 otherwise.
func filter(url string, filter string, c chan string) {
	client := http.Client{
		Timeout: 5 * time.Second,
	}
	response, err := client.Get(url)
	if err != nil {
		log.Println("request failed:", err)
		c <- "app_health{instance=\"" + url + "\"} 0"
		return
	}
	defer response.Body.Close()
	buf := new(bytes.Buffer)
	buf.ReadFrom(response.Body)
	newStr := strings.ToLower(buf.String())
	re := regexp.MustCompile(filter)
	list := re.FindAllString(newStr, -1)
	if len(list) > 0 {
		c <- "app_health{instance=\"" + url + "\"} 1"
	} else {
		c <- "app_health{instance=\"" + url + "\"} 0"
	}
}

func handleRequests() {
	http.HandleFunc("/metrics", homePage)
	log.Fatal(http.ListenAndServe(":9101", nil))
}

// homePage reads one URL per line from url.txt, validates each in a
// goroutine, and writes the resulting metric lines to the response.
func homePage(w http.ResponseWriter, r *http.Request) {
	c := make(chan string)
	file, err := os.Open("./url.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		go filter(scanner.Text(), `head><meta content`, c)
		fmt.Fprintln(w, <-c)
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}

func main() {
	handleRequests()
}
----------------------------------------------------------------
In the above script we validate the URLs listed in the url.txt file (concurrently, using goroutines) against the content match 'head><meta content':
go filter(scanner.Text(), `head><meta content`, c)
Once the content matches, we get a metric value of 1 for that URL.
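To see the matching step in isolation, here is a minimal, self-contained sketch; the sample URL, HTML bodies, and the helper name buildMetric are made up for illustration:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// buildMetric mirrors the exporter's matching step: lowercase the body,
// apply the filter regex, and emit a Prometheus exposition line.
func buildMetric(url, body, filter string) string {
	re := regexp.MustCompile(filter)
	if re.MatchString(strings.ToLower(body)) {
		return "app_health{instance=\"" + url + "\"} 1"
	}
	return "app_health{instance=\"" + url + "\"} 0"
}

func main() {
	body := `<html><head><meta content="demo"></head></html>`
	// The meta tag is present, so this reports 1.
	fmt.Println(buildMetric("https://example.com", body, `head><meta content`))
	// An empty page has no match, so this reports 0.
	fmt.Println(buildMetric("https://example.com", "<html></html>", `head><meta content`))
}
```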
In the script I am using goroutines to handle all the URLs concurrently:
file, err := os.Open("./url.txt")
if err != nil {
	log.Fatal(err)
}
defer file.Close()
scanner := bufio.NewScanner(file)
for scanner.Scan() {
	go filter(scanner.Text(), `^{"status":(\s+)?"up"`, c)
	fmt.Fprintln(w, <-c)
}
I have used the bufio package to read the URL list, and the validation results are passed back over the channel ('<-c'). Please add a timeout on the channel receive if you experience a deadlock in the script.
I have made this setup suitable for a Kubernetes environment, with all the relevant manifests. Please refer to the chart below:
GitHub - praveensams16/prometheus-healthcheck
Image preparation: docker build -t validator .
I have made the ConfigMap fetch the URLs from the values file using a loop in the manifest.
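As a rough illustration (not the chart's exact manifest; the values key "urls" and the ConfigMap name are assumptions), the loop could look like this:

```yaml
# templates/configmap.yaml -- hedged sketch, assuming a values key named "urls"
apiVersion: v1
kind: ConfigMap
metadata:
  name: url-list
data:
  url.txt: |
    {{- range .Values.urls }}
    {{ . }}
    {{- end }}
```

With something like `urls: [https://app1.example.com, https://app2.example.com]` in values.yaml, the rendered url.txt contains one URL per line, ready to be mounted next to the exporter binary.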
Prometheus scraper
Once you deploy the exporter service, add the scrape endpoint to the Prometheus configuration file:
- job_name: 'health_exporter'
  scrape_interval: 5m
  scrape_timeout: 120s
  static_configs:
    - targets:
        - 'app-exporter.namespace:9101'
AlertManager
When you want an alert for a URL being down, we can use an alerting rule (evaluated by Prometheus and routed through Alertmanager) as below:
- name: Production URL Down
  rules:
    - alert: UrlDown
      expr: app_health{job="health_exporter"} != 1
      for: 5m
      labels:
        severity: P3
        team: io
      annotations:
        description: 'Prod URL {{ $labels.exported_instance }} down for 5 minutes'
        summary: 'Prod url {{ $labels.exported_instance }} down for 5 minutes'