Grafana Dashboards for Windows Servers

Understanding two key server performance metrics

Vytas Prazarkevicius
Geek Culture
Apr 17, 2021

Photo by Ibrahim Boran on Unsplash

Prometheus and Grafana are becoming de facto standards for monitoring Linux servers. You will find countless guides on how to get Prometheus and Grafana running in under 10 minutes, many of them complete with beautiful, interactive dashboards out of the box.

But what about monitoring Windows servers? There are far fewer how-to guides for setting up Prometheus and Grafana with Windows. The good news is that you can still monitor your Windows servers with Prometheus and Grafana. The bad news is that the variety of metrics and Grafana dashboards available out of the box is quite limited.

But don’t despair. In this how-to guide, I will introduce you to the most popular Grafana dashboard for Windows servers and give you an overview of how to read its CPU and memory data.

Dashboards

A typical post about Grafana focuses on setting up data sources, importing dashboards and tweaking the “look and feel” of the panels. Instead of walking you through the technical setup, I will describe how to benefit from Grafana dashboards. In the end, understanding what a dashboard tells you about performance is the most valuable part.

Grafana dashboard

The most popular Grafana dashboard for Windows servers is called “Windows Node” and can be downloaded from the Grafana website. This dashboard is compatible with WMI exporter 0.12, which you can download and install by following the instructions on its GitHub page.

The “Windows Node” Grafana dashboard contains many panels covering various performance aspects of your Windows server. CPU and memory are by far the most important server resources. To understand them, we need to dig deeper into how Windows manages CPU and memory.

CPU

The CPU panel in the Grafana dashboard looks like this:

CPU Load panel

This panel shows what kind of work the CPU has been doing over a period of time.

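On the left you see the Y axis with a maximum value of 4. In this example the monitored server has a single processor with 4 cores (i.e. 4 logical processors). Each logical processor can contribute at most one second of CPU time per second of wall-clock time, so the maximum value of the Y axis equals the number of logical processors: 4.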

On the right side of the panel, the legend lists 5 different processor activities: dpc, idle, interrupt, privileged and user.
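To see why the Y axis tops out at the number of logical processors, it helps to compute the panel’s values by hand. The exporter reports, per core and per mode, a cumulative counter of CPU seconds (`wmi_cpu_time_total` in the WMI exporter releases of that era). Turning two readings of that counter into per-second rates and summing them across cores is roughly what a PromQL query such as `sum by (mode) (rate(wmi_cpu_time_total[5m]))` does; the exact query in the dashboard may differ. The Python sketch below assumes the readings are plain dictionaries keyed by `(core, mode)`, and the helper name `cpu_mode_rates` is hypothetical.

```python
from collections import defaultdict


def cpu_mode_rates(earlier, later, interval_s):
    """Hypothetical helper: sum per-core CPU time rates by mode.

    `earlier` and `later` map (core, mode) -> cumulative CPU seconds, taken
    `interval_s` seconds apart. The result is in "logical processors busy":
    each core adds at most 1.0 per mode, so on a 4-core server no mode can
    exceed 4.0.
    """
    rates = defaultdict(float)
    for key, later_seconds in later.items():
        if key not in earlier:
            continue  # no previous reading for this series, so no rate yet
        _core, mode = key
        # Counter resets (e.g. an exporter restart) are ignored in this sketch.
        rates[mode] += (later_seconds - earlier[key]) / interval_s
    return dict(rates)


# Hypothetical readings: 4 cores, 10 s apart; every core spent 7 s idle,
# 2 s in user mode and 1 s in privileged mode during the interval.
modes = ("idle", "user", "privileged")
earlier = {(core, mode): 100.0 for core in range(4) for mode in modes}
later = {(core, "idle"): 107.0 for core in range(4)}
later.update({(core, "user"): 102.0 for core in range(4)})
later.update({(core, "privileged"): 101.0 for core in range(4)})

rates = cpu_mode_rates(earlier, later, interval_s=10.0)
busy = 4 - rates["idle"]  # about 1.2 of the 4 logical processors were working
print(rates, busy)
```

Reading the result the way the panel draws it: idle contributes about 2.8, user about 0.8 and privileged about 0.4, so roughly 1.2 logical processors’ worth of work was done. A busy value approaching 4 would mean every core is saturated.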

Let’s take a closer look at the legend:

CPU Load panel legend

Another name for these activities is “processor modes”. When we say “the processor is in mode X”, we mean that the processor is doing work of type X.

Let’s jump into the definitions of these modes.

User mode represents work done for applications and various subsystems. The typical applications we run on servers execute in user mode. If your application is computation-hungry, you will see the user mode metric spike.

Privileged mode is work the processor does for the Windows operating system. When a Windows system service is called, it often runs in privileged mode to gain access to system-private data. Such data is protected from access by threads executing in user mode.

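Interrupt mode is work Windows does to handle hardware requests (interrupts). A high level of interrupts may indicate a hardware or driver problem. DPC (deferred procedure call) time is closely related: it is lower-priority work that interrupt handlers queue up to be finished later. Idle, finally, is time during which the processor had nothing to do.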
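The mode definitions above translate into a simple rule of thumb: check which busy mode takes the largest share of CPU capacity. The sketch below encodes that reasoning in Python. It takes per-mode values in “logical processors busy” units (as plotted on the panel) and flags modes above a threshold. The thresholds and the helper name `flag_cpu_modes` are hypothetical starting points, not values from the dashboard or from Microsoft; tune them for your own servers.

```python
# Hypothetical thresholds: share of total CPU capacity (0.0 to 1.0) above
# which a mode is worth a closer look. Tune these for your own servers.
THRESHOLDS = {"user": 0.80, "privileged": 0.30, "interrupt": 0.10}

HINTS = {
    "user": "applications are computation-hungry",
    "privileged": "the operating system itself is doing a lot of work",
    "interrupt": "possible hardware or driver problem",
}


def flag_cpu_modes(mode_rates, logical_processors):
    """Hypothetical helper: return (mode, hint) pairs for modes above their
    threshold, largest share of CPU capacity first. Modes without a
    threshold (such as idle or dpc) are never flagged."""
    flagged = []
    for mode, rate in mode_rates.items():
        limit = THRESHOLDS.get(mode)
        if limit is None:
            continue
        share = rate / logical_processors
        if share >= limit:
            flagged.append((share, mode))
    flagged.sort(reverse=True)
    return [(mode, HINTS[mode]) for _share, mode in flagged]


# Hypothetical panel reading on a 4-core server.
print(flag_cpu_modes(
    {"user": 3.4, "privileged": 0.3, "interrupt": 0.5, "dpc": 0.1, "idle": 0.0},
    logical_processors=4,
))
```

Here user mode uses 85% of capacity and interrupts 12.5%, so both are flagged, with user first.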

Memory

Memory panel legend

The memory panel in the dashboard displays three key server memory metrics:

  • virtual memory
  • physical memory
  • free physical memory
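These three values are enough to derive the numbers you usually care about. The Python sketch below computes used physical memory as physical minus free, and estimates how much of the virtual memory is backed by the page file. That estimate rests on an assumption: that the dashboard’s virtual memory value equals physical memory plus the page file size; check this against your own server before relying on it. The helper name `memory_summary` is hypothetical.

```python
GIB = 1024 ** 3


def memory_summary(physical_bytes, free_physical_bytes, virtual_bytes):
    """Hypothetical helper: derive used RAM and a page file estimate
    from the memory panel's three values."""
    used = physical_bytes - free_physical_bytes
    return {
        "used_physical_bytes": used,
        "used_physical_pct": 100.0 * used / physical_bytes,
        # Assumption: virtual memory = physical memory + page file size,
        # so the difference approximates the page file's size.
        "page_file_bytes_estimate": virtual_bytes - physical_bytes,
    }


# Hypothetical server: 16 GiB RAM, 4 GiB free, 24 GiB virtual memory.
summary = memory_summary(16 * GIB, 4 * GIB, 24 * GIB)
print(summary["used_physical_pct"])  # 75.0
```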

Physical memory is the memory installed on a server (RAM). Application processes do not access physical memory directly; only the operating system does. Instead, processes use virtual memory, which the operating system creates and maps onto physical memory.

Why do we need virtual memory? Processes sometimes require more memory than there is physical memory available. In such cases the operating system keeps only the actively used parts of each process’s memory in physical memory and moves the rest out, bringing it back when the process needs it again. All processes keep running side by side; what moves between RAM and disk are pieces of their memory, not whole processes one after another.

To do so, the operating system uses a technique called paging. Memory is divided into fixed-size blocks called pages. When the server does not have enough physical memory for all processes, some pages are written from physical memory to a disk file called the page file (pagefile.sys on Windows).
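Paging is easier to picture with a toy model. The Python sketch below simulates a machine with only three page-sized slots of physical memory shared by two processes, A and B. When RAM is full, the least recently used page is written to the page file; touching a page that is not in RAM is a page fault, and the page is read back. Least-recently-used eviction is a simplification chosen for illustration; the Windows memory manager uses its own, more elaborate policy based on working sets. The helper name `simulate_paging` is hypothetical.

```python
from collections import OrderedDict


def simulate_paging(accesses, physical_frames):
    """Hypothetical helper: toy demand paging with least-recently-used eviction.

    `accesses` is a sequence of (process, page_number) tuples. Returns the
    number of page faults, the pages left in RAM and the pages in the page file.
    """
    ram = OrderedDict()  # pages in RAM, least recently used first
    page_file = set()    # pages currently written out to disk
    faults = 0
    for page in accesses:
        if page in ram:
            ram.move_to_end(page)  # hit: mark as most recently used
            continue
        faults += 1                # page fault: page must be brought into RAM
        page_file.discard(page)    # if it was paged out, it is read back
        if len(ram) >= physical_frames:
            evicted, _ = ram.popitem(last=False)  # free a slot: evict the LRU page
            page_file.add(evicted)
        ram[page] = None
    return faults, sorted(ram), sorted(page_file)


# Processes A and B each use two pages, but RAM only holds three.
accesses = [("A", 0), ("A", 1), ("B", 0), ("A", 0), ("B", 1), ("A", 1)]
faults, in_ram, in_page_file = simulate_paging(accesses, physical_frames=3)
print(faults, in_ram, in_page_file)
# 5 [('A', 0), ('A', 1), ('B', 1)] [('B', 0)]
```

Note that A and B run interleaved throughout; only individual pages move between RAM and the page file.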

Summing up

You should now understand how to read the CPU and memory consumption charts. In my experience, these metrics cover roughly 80% of your needs when assessing performance bottlenecks on your servers (a nice illustration of the Pareto principle).

If you need to dive deep into how your application impacts CPU and memory, take a look at the Prometheus client libraries. You can add them directly to your application to collect run-time metrics, which usually help answer the “why” questions in a performance investigation.
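Client libraries expose your application’s metrics over HTTP in the Prometheus text exposition format, which Prometheus then scrapes just like the WMI exporter. In practice you would let an official client library (for example `prometheus_client` for Python) generate this output. The stdlib-only sketch below merely illustrates what such output looks like for a counter; the metric `app_requests_total` and the helper `render_counter` are hypothetical, and label values are not escaped.

```python
def render_counter(name, help_text, samples):
    """Hypothetical helper: render one counter metric in the Prometheus
    text exposition format.

    `samples` maps a tuple of (label_name, label_value) pairs to the current
    counter value. Label values are not escaped in this sketch.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}" if labels else f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical application counter with one label.
print(render_counter(
    "app_requests_total",
    "Requests handled by the application.",
    {(("endpoint", "/orders"),): 42, (("endpoint", "/health"),): 7},
), end="")
```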
