Monitoring TeamCity Server with Grafana and Prometheus

Gerasimos Alexiou
tech.thesignalgroup
10 min read · Jun 6, 2023

by Gerasimos Alexiou, Senior DevOps Engineer

At Signal, we use TeamCity server for our CI/CD strategy. Our installation consists of a main server and multiple agents that build and deploy our applications to the infrastructure components we use, such as Kubernetes, Service Fabric, and cloud apps.

Because our build and deploy plan relies heavily on TeamCity, we started exploring ways to monitor the performance and behavior of our automation processes. Our first experiment focuses on monitoring the TeamCity server and agents through the built-in metrics.

TeamCity provides a variety of diagnostic tools and indicators to monitor and troubleshoot the server. These tools make it easier to identify and investigate problems and, if needed, report issues for the server.

Apart from the integrated metrics that you can find under Administration/Server Administration/Diagnostics, you can use the Prometheus-compatible metrics endpoint to get statistics for current and queued builds, projects, active users, etc.

The metrics are exposed at the /app/metrics endpoint. You can also enable the experimental feature to get even more metrics from the endpoint by adding the experimental=true query parameter.

You will end up with a URL like the following:

https://URL/app/metrics?experimental=true
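
As a rough illustration of working with this endpoint, the sketch below builds the metrics URL and parses a small piece of Prometheus text output. The host name, the helper names (metrics_url, parse_metrics), and the sample text are assumptions made for the example, not something TeamCity provides. A real request will typically also need authentication, which is not shown here.

```python
from urllib.parse import urlencode, urljoin


def metrics_url(base_url: str, experimental: bool = False) -> str:
    """Build the TeamCity metrics URL, optionally asking for experimental metrics."""
    url = urljoin(base_url.rstrip("/") + "/", "app/metrics")
    if experimental:
        url += "?" + urlencode({"experimental": "true"})
    return url


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Simplified: skips comments and blank lines, and assumes there are no
    trailing timestamps and no spaces after the series name other than
    the one before the value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical sample of what the endpoint might return.
sample = """\
# TYPE builds_running_number gauge
builds_running_number 3.0
# TYPE builds_queued_number gauge
builds_queued_number 7.0
"""

print(metrics_url("https://teamcity.example.com", experimental=True))
print(parse_metrics(sample))
```

In practice Prometheus scrapes and parses this endpoint itself; the parser is only there to show what the text format looks like.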

Dashboard
The dashboard is composed of gauges and graphs. The gauges give a quick view of build activity by showing the number of active agents and projects, running and queued builds, and successful and interrupted builds. The graphs provide an overview of the start and finish rates of builds, real-time monitoring of the current builds, and performance statistics for the TeamCity process.

Combined, these metrics provide a holistic overview of build behavior on the TeamCity server and help administrators quickly resolve queue bottlenecks and performance degradation that could occur in the infrastructure.

Findings:

The Builds & agents graph helps quickly identify patterns, such as the hours of the day with many queued builds and how many builds are running in specific time frames. Based on that, we can observe the hours of the week when development is more or less active.
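
The same kind of pattern can be checked outside Grafana. The sketch below, which assumes you have exported (timestamp, value) samples of builds_queued_number, averages the queue length per hour of day and picks the busiest hour. The sample data and the function name average_by_hour are made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def average_by_hour(samples):
    """Average a gauge per hour of day (UTC) from (unix_ts, value) pairs."""
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical builds_queued_number samples: (unix timestamp, queued builds).
queued = [
    (1686045600, 2),   # 2023-06-06 10:00 UTC
    (1686047400, 4),   # 2023-06-06 10:30 UTC
    (1686060000, 12),  # 2023-06-06 14:00 UTC
    (1686061800, 8),   # 2023-06-06 14:30 UTC
]

by_hour = average_by_hour(queued)
busiest = max(by_hour, key=by_hour.get)
print(by_hour, "busiest hour:", busiest)
```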

The build start/finish rate helps identify whether more agents are required in the infrastructure, which is likely when the build start rate exceeds the finish rate.
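
To make the start/finish comparison concrete, here is a minimal sketch that estimates per-minute rates from counter samples. It is a deliberate simplification of what PromQL's rate() does: it only compares the first and last sample and ignores counter resets and extrapolation. The sample values are hypothetical.

```python
def per_minute_rate(samples):
    """Approximate per-minute increase of a counter from (unix_ts, value) pairs.

    Simplified stand-in for rate(...) * 60: first vs. last sample only,
    with no counter-reset handling or extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v1 - v0) * 60 / (t1 - t0)


# Hypothetical counter samples over a 10-minute window: (seconds, counter value).
started = [(0, 100), (300, 115), (600, 130)]   # builds_started_number
finished = [(0, 90), (300, 100), (600, 110)]   # builds_finished_number

start_rate = per_minute_rate(started)
finish_rate = per_minute_rate(finished)
backlog_growth = start_rate - finish_rate

print(f"start={start_rate:.1f}/min finish={finish_rate:.1f}/min "
      f"backlog growth={backlog_growth:.1f}/min")
```

A positive backlog growth over a long window suggests builds are starting faster than they complete, which is the situation where extra agents are worth considering.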

Lastly, the CPU and memory graphs for the TeamCity process help us observe issues with the installation, such as performance degradation or memory leaks.

Gauges:
1) Running, Queued builds
2) Active users
3) Projects number
4) Sum of agents
5) Total, Successful, Canceled builds
6) Uptime

Graphs:
1) Running, Queued builds and agents
2) Builds start/finish rate
3) Server CPU, Memory Usage

The JSON model for the dashboard can be found below. To use it, replace PROMETHEUS-ID with the UID of your own Prometheus data source from the Grafana connections. The last step is to import the JSON as a Grafana dashboard, and you are ready to go.
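
If you prefer to script the replacement instead of editing the file by hand, something like the following sketch could work. It walks the dashboard model and swaps every "uid" value equal to the placeholder. The function name set_datasource_uid, the trimmed-down model, and the value my-prometheus-uid are assumptions for illustration only.

```python
import json


def set_datasource_uid(node, placeholder, uid):
    """Recursively replace every "uid" value equal to `placeholder` with `uid`."""
    if isinstance(node, dict):
        return {
            key: uid if key == "uid" and value == placeholder
            else set_datasource_uid(value, placeholder, uid)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [set_datasource_uid(item, placeholder, uid) for item in node]
    return node


# Trimmed-down stand-in for the dashboard model, not the full JSON.
dashboard = json.loads("""
{
  "panels": [
    {
      "datasource": {"type": "prometheus", "uid": "PROMETHEUS-ID"},
      "targets": [
        {"datasource": {"type": "prometheus", "uid": "PROMETHEUS-ID"},
         "expr": "builds_running_number"}
      ],
      "title": "Running builds"
    }
  ],
  "uid": "DASHBOARD-ID"
}
""")

patched = set_datasource_uid(dashboard, "PROMETHEUS-ID", "my-prometheus-uid")
print(json.dumps(patched, indent=2))
```

Matching on the placeholder value rather than on every "uid" key leaves the dashboard's own uid untouched.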

Follow our LinkedIn page to stay in touch with our latest developments.


{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": 510,
"links": [],
"liveNow": false,
"panels": [
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 6,
"panels": [],
"title": "Builds",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 2,
"x": 0,
"y": 1
},
"id": 8,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "8.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "builds_running_number",
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Running builds",
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 2,
"x": 2,
"y": 1
},
"id": 10,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "8.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "builds_queued_number",
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Queued Builds",
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 2,
"x": 4,
"y": 1
},
"id": 14,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "8.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "users_active_number",
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Active users",
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 2,
"x": 6,
"y": 1
},
"id": 30,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "8.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "projects_number",
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Projects number",
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 2,
"x": 8,
"y": 1
},
"id": 12,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "8.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "agents_connected_authorized_number",
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Sum agents",
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 2,
"x": 10,
"y": 1
},
"id": 26,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "8.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "builds_started_number",
"interval": "",
"legendFormat": "Builds",
"refId": "A"
}
],
"title": "Total builds",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 3,
"x": 12,
"y": 1
},
"id": 34,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "8.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "builds_finished_number{interrupted=\"false\",nodeId=\"MAIN_SERVER\",}",
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Successful builds",
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 2,
"x": 15,
"y": 1
},
"id": 32,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "8.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "builds_finished_number{interrupted=\"true\",nodeId=\"MAIN_SERVER\",}",
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Canceled builds",
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 2,
"x": 17,
"y": 1
},
"id": 36,
"options": {
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "8.4.3",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "server_uptime_milliseconds/86400000",
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Uptime (days)",
"type": "gauge"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 12,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 5
},
"id": 16,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "builds_running_number",
"interval": "",
"legendFormat": "Number of running builds",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "builds_queued_number",
"hide": false,
"interval": "",
"legendFormat": "Number of queued builds",
"refId": "B"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "agents_connected_authorized_number",
"hide": false,
"interval": "",
"legendFormat": "Agents number",
"refId": "C"
}
],
"title": "Builds & agents",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 5
},
"id": 20,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "max(rate(builds_started_number[$__rate_interval])*60)",
"interval": "",
"legendFormat": "Build start rate",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "max(rate(builds_finished_number[$__rate_interval])*60)",
"hide": false,
"interval": "",
"legendFormat": "Build finish rate",
"refId": "B"
}
],
"title": "Build start/finish rate",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 22,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 15
},
"id": 39,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "sum(avg_over_time(cpu_usage_process_number[$__interval])) by (nodeId)*100",
"interval": "",
"intervalFactor": 1,
"legendFormat": "CPU usage",
"refId": "A"
}
],
"title": "Process CPU usage",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "continuous-RdYlGr"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 22,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 2,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "decgbytes"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 15
},
"id": 40,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "PROMETHEUS-ID"
},
"exemplar": true,
"expr": "sum(avg_over_time(jvm_memory_used_bytes{area=\"heap\"}[$__interval])) by (nodeId) / 1000000000",
"interval": "",
"intervalFactor": 1,
"legendFormat": "Memory usage",
"refId": "A"
}
],
"title": "Process Memory usage",
"type": "timeseries"
}
],
"refresh": false,
"schemaVersion": 35,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Dashboard",
"uid": "DASHBOARD-ID",
"version": 28,
"weekStart": ""
}
