How to measure CPU load vs. CPU percentage, with examples

It depends on the load the server has, and what load you’d expect.

But start with something in the ranges of

1 min load avg:
w: <ncpu> * 8
c: <ncpu> * 10
5 min load avg:
w: <ncpu> * 5
c: <ncpu> * 8
15 min load avg:
w: <ncpu> * 2
c: <ncpu> * 3

and adjust for each server, so you get notifications when it makes sense for the particular server.

For example, on a server with 4 CPU cores it would read as follows: check_load -w 32,20,8 -c 40,32,12
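The starting values above are plain multiples of the core count, so they can be generated rather than worked out by hand. A minimal sketch, assuming the multipliers from the list above; the names `MULTIPLIERS` and `starting_thresholds` are made up for illustration and are not part of Nagios:

```python
# Starting multipliers from the text: (warning, critical) per load-average window.
# These are only a first guess and should be tuned per server.
MULTIPLIERS = {
    "1min": (8, 10),
    "5min": (5, 8),
    "15min": (2, 3),
}


def starting_thresholds(ncpu):
    """Build the check_load arguments for a server with `ncpu` cores."""
    warn = ",".join(str(ncpu * w) for w, _ in MULTIPLIERS.values())
    crit = ",".join(str(ncpu * c) for _, c in MULTIPLIERS.values())
    return f"check_load -w {warn} -c {crit}"


print(starting_thresholds(4))  # check_load -w 32,20,8 -c 40,32,12
```

Changing the argument to the real core count of a server gives its starting point; the multipliers themselves are what you adjust afterwards.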

Though it's an old post, I'm replying now because I know check_load threshold values are a big headache for newcomers. ;)

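On a single-core system: a warning alert if the load average is above 70% of one CPU over 1 minute, 60% over 5 minutes, or 50% over 15 minutes. A critical alert if it is above 90% over 1 minute, 80% over 5 minutes, or 70% over 15 minutes. (check_load takes its three values in the order 1, 5 and 15 minutes.)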

command[check_load]=/usr/local/nagios/libexec/check_load -w 0.7,0.6,0.5 -c 0.9,0.8,0.7
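To see how such a triplet of thresholds turns into an alert state, here is a simplified model of the comparison. It assumes the three values are the 1, 5 and 15 minute load averages and that a threshold is tripped when the load is strictly greater than it; the exact comparison may differ between plugin versions, and `load_state` is a hypothetical helper, not the plugin's code:

```python
def load_state(load, warn, crit):
    """Return 'OK', 'WARNING' or 'CRITICAL' for a (1, 5, 15 min) load triplet.

    Each load average is compared with the threshold in the same position;
    the worst result wins. Assumption: "strictly greater than" trips a threshold.
    """
    if any(value > limit for value, limit in zip(load, crit)):
        return "CRITICAL"
    if any(value > limit for value, limit in zip(load, warn)):
        return "WARNING"
    return "OK"


WARN = (0.7, 0.6, 0.5)
CRIT = (0.9, 0.8, 0.7)

print(load_state((0.06, 0.11, 0.09), WARN, CRIT))  # OK
print(load_state((0.75, 0.40, 0.30), WARN, CRIT))  # WARNING
print(load_state((1.73, 0.50, 7.98), WARN, CRIT))  # CRITICAL
```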

All my findings about CPU load:

What is meant by “the load”? Wikipedia says:

All Unix and Unix-like systems generate a metric of three “load average” numbers in the kernel. Users can easily query the current result from a Unix shell by running the uptime command:

$ uptime
14:34:03 up 10:43, 4 users, load average: 0.06, 0.11, 0.09

From the above output, load average: 0.06, 0.11, 0.09 means (on a single-CPU system):

  • during the last minute, the CPU was loaded to about 6% of its capacity (roughly 94% idle)
  • during the last 5 minutes, the CPU was loaded to about 11% of its capacity (roughly 89% idle)
  • during the last 15 minutes, the CPU was loaded to about 9% of its capacity (roughly 91% idle)
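If you want the three numbers in a script rather than reading them by eye, they can be pulled out of the uptime line. A small sketch; the function name `parse_load_average` is made up for illustration, and the regular expression assumes the "load average: a, b, c" layout shown above (some systems print "load averages:" without commas, which the pattern also tolerates):

```python
import re


def parse_load_average(uptime_line):
    """Extract the 1, 5 and 15 minute load averages from an uptime line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: " + uptime_line)
    return tuple(float(value) for value in match.groups())


line = "14:34:03 up 10:43, 4 users, load average: 0.06, 0.11, 0.09"
one, five, fifteen = parse_load_average(line)
print(one, five, fifteen)  # 0.06 0.11 0.09
```

On Unix-like systems, Python's standard library also exposes the same triplet directly through `os.getloadavg()`, which avoids parsing text altogether.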

$ uptime
14:34:03 up 10:43, 4 users, load average: 1.73, 0.50, 7.98

The above load average of 1.73, 0.50, 7.98 on a single-CPU system can be read as:

  • during the last minute, the CPU was overloaded by 73% (1 CPU with 1.73 runnable processes, so that 0.73 processes had to wait for a turn)
  • during the last 5 minutes, the CPU was underloaded by 50% (no processes had to wait for a turn)
  • during the last 15 minutes, the CPU was overloaded by 698% (1 CPU with 7.98 runnable processes, so that 6.98 processes had to wait for a turn)
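The reading above boils down to comparing the load with the number of CPUs. A minimal sketch of that arithmetic; `describe_load` is a hypothetical helper, and dividing by `ncpu` for multi-core machines is an assumption that extends the single-CPU explanation above:

```python
def describe_load(load, ncpu=1):
    """Describe a load average relative to the capacity of `ncpu` CPUs."""
    per_cpu = load / ncpu  # assumption: load spreads evenly across cores
    if per_cpu > 1:
        # More runnable processes than CPUs: the excess had to wait.
        return f"overloaded by {round((per_cpu - 1) * 100)}%"
    busy = round(per_cpu * 100)
    return f"{busy}% busy, {100 - busy}% idle"


for value in (1.73, 0.50, 7.98):
    print(value, "->", describe_load(value))
# 1.73 -> overloaded by 73%
# 0.5 -> 50% busy, 50% idle
# 7.98 -> overloaded by 698%
```

With `ncpu=4`, the same 1.73 would come out as well under capacity, which is why thresholds have to scale with the core count.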

Nagios threshold value calculation:

For Nagios CPU Load setup, which includes warning and critical:

y = c * p / 100

Where:
  y = Nagios threshold value
  c = number of cores
  p = desired load in percent

For a 4-core system:

time       1 min   5 min   15 min
warning:   90%     70%     50%
critical:  100%    80%     60%

command[check_load]=/usr/local/nagios/libexec/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.4

For a single-core system:

y = p / 100

Where:
  y = Nagios threshold value
  p = desired load in percent

time       1 min   5 min   15 min
warning:   70%     60%     50%
critical:  90%     80%     70%

command[check_load]=/usr/local/nagios/libexec/check_load -w 0.7,0.6,0.5 -c 0.9,0.8,0.7
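The formula y = c * p / 100 and both worked examples above can be reproduced with a few lines of Python. This is only a sketch: `nagios_value` and `check_load_args` are names invented here, and rounding to one decimal place is an assumption that happens to match the examples.

```python
def nagios_value(cores, percent):
    """y = c * p / 100: convert a desired load percentage to a load value."""
    return cores * percent / 100


def check_load_args(cores, warn_percent, crit_percent):
    """Format the -w/-c arguments from three warning and three critical percentages."""
    warn = ",".join(f"{nagios_value(cores, p):.1f}" for p in warn_percent)
    crit = ",".join(f"{nagios_value(cores, p):.1f}" for p in crit_percent)
    return f"-w {warn} -c {crit}"


# 4-core example from the text
print(check_load_args(4, (90, 70, 50), (100, 80, 60)))
# -w 3.6,2.8,2.0 -c 4.0,3.2,2.4

# single-core example from the text (c = 1, so y = p / 100)
print(check_load_args(1, (70, 60, 50), (90, 80, 70)))
# -w 0.7,0.6,0.5 -c 0.9,0.8,0.7
```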

A great white paper about CPU load analysis by Dr. Gunther: http://www.teamquest.com/pdfs/whitepaper/ldavg1.pdf

In this online article, Dr. Gunther digs down into the UNIX kernel to find out how load averages (the “LA Triplets”) are calculated and how appropriate they are as capacity planning metrics.
