One of the most popular questions in a technical interview:
What is load average? How do you check whether it’s OK?
All roads lead to Rome.
~ medieval statement
What is load average?
It is simply a representation of system load: a kind of process queue length exposed by the scheduler.
$ cat /proc/loadavg
0.61 0.81 0.84 1/1162 15359
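The first three values show the average number of processes waiting over the last 1, 5 and 15 minutes: 0.61 0.81 0.84. On Linux, "waiting" covers processes that are running or runnable plus those in uninterruptible sleep (usually disk I/O).
1/1162 tells us that only 1 of the total 1162 scheduling entities (processes and threads) is runnable at the moment.
Finally, 15359 is the latest PID allocated by the kernel.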
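As a minimal sketch of the field layout just described, the line can be split into named values. The `LoadAvg` type and the `parse_loadavg` helper are hypothetical names used only for illustration; the sample string is the output shown above.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    """Fields of one /proc/loadavg line (names are illustrative)."""
    la1: float      # average over the last 1 minute
    la5: float      # average over the last 5 minutes
    la15: float     # average over the last 15 minutes
    runnable: int   # entities runnable right now
    total: int      # entities that exist in total
    last_pid: int   # latest PID allocated by the kernel


def parse_loadavg(text: str) -> LoadAvg:
    """Parse a single line in the /proc/loadavg format."""
    la1, la5, la15, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(la1), float(la5), float(la15),
                   int(runnable), int(total), int(last_pid))


if __name__ == "__main__":
    # Sample line taken from the output above, not read from a live system.
    print(parse_loadavg("0.61 0.81 0.84 1/1162 15359"))
```

On a Linux host, the same function can be fed the contents of /proc/loadavg instead of the sample string.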
When should LA be considered high or low?
It’s a relative value.
It represents the number of processes in the scheduler’s queue, so it can be affected by many factors: network latency, disk I/O delay, etc.
Obviously, for an application host it should be as low as possible. But for service hosts, e.g. ones performing heavy backup jobs, it is allowed to be almost arbitrarily high, as long as those jobs are actually running.
NOTE: Always parallelize your tasks.
The rules of thumb are the following:
- Keeping LA below the number of your virtual CPU cores generally keeps your host responsive.
- A decreasing sequence of LA output values (la1 > la5 > la15) means that your server is gaining load, while an increasing one means it is becoming less loaded.
NOTE: if you read the values in time order (as la15-la5-la1), then the 2nd rule is inverted.
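Both rules of thumb can be sketched in a few lines. The function names `is_responsive` and `load_trend` are hypothetical, the values are passed in /proc/loadavg order (la1, la5, la15), and the 4-core host in the usage part is an assumption.

```python
def is_responsive(la: float, cores: int) -> bool:
    """Rule 1: the host counts as responsive while LA stays below the core count."""
    return la < cores


def load_trend(la1: float, la5: float, la15: float) -> str:
    """Rule 2: judge the trend from the values in /proc/loadavg order."""
    if la1 > la5 > la15:
        return "rising"    # decreasing sequence: the server is gaining load
    if la1 < la5 < la15:
        return "falling"   # increasing sequence: the server is becoming less loaded
    return "mixed"         # no clear direction


if __name__ == "__main__":
    # Sample values from the /proc/loadavg output above; 4 cores is an assumption.
    la1, la5, la15 = 0.61, 0.81, 0.84
    print(is_responsive(la1, cores=4))   # True
    print(load_trend(la1, la5, la15))    # falling
```

On a real host, the core count could come from `os.cpu_count()` instead of a hard-coded number.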
Propose multiple ways to check for LA values
Most common answer is something like:
- Execute cat /proc/loadavg
- grep from w/uptime utilities
- Run top/htop or tload
- getloadavg() from glibc
- some other language-specific functions like Python os.getloadavg()
All of those ideas (except the first one!) are crap. There’s only 1 way to check for LA, reading /proc/loadavg; let me show you:
- All of w, uptime, tload, top use the procps library, which reads it from this file
- Same for htop and family
- Even glibc uses the same file
- And, finally, Python calls glibc’s getloadavg(), which reads the same file again
In fact, there’s only one way to check for load average values.
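A rough way to see this on a Linux host is to read the file directly and put the result next to what Python reports. The `read_loadavg` helper is a hypothetical name; `os.getloadavg()` is the standard library call mentioned above. The two reads happen at slightly different moments, so the numbers may differ if the kernel updates the averages in between.

```python
import os
from typing import Tuple


def read_loadavg(path: str = "/proc/loadavg") -> Tuple[float, float, float]:
    """Return the 1/5/15-minute averages from a file in /proc/loadavg format."""
    with open(path) as f:
        fields = f.read().split()
    return float(fields[0]), float(fields[1]), float(fields[2])


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    # Linux only: both lines should report the same three averages.
    print("from /proc/loadavg:  ", read_loadavg())
    print("from os.getloadavg():", os.getloadavg())
```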