One of the most popular questions in a technical interview:

What is load average? How do you check whether it’s OK?

All roads lead to Rome.

~ medieval statement

What is load average?

It is simply a representation of system load: a kind of process queue exposed by the scheduler.

$ cat /proc/loadavg
0.61 0.81 0.84 1/1162 15359

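The first three values are the average number of processes running or waiting to run over the last 1, 5 and 15 minutes: 0.61 0.81 0.84. On Linux the count also includes processes in uninterruptible sleep, typically blocked on disk I/O.

1/1162 tells us that only 1 of the 1162 scheduling entities (processes and threads) on the system is runnable at the moment.

Finally, 15359 is the PID of the most recently created process.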
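To make the field layout concrete, here is a minimal Python sketch that splits a /proc/loadavg line into named fields. The names LoadAvg and parse_loadavg are made up for this illustration; they are not part of any library.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    """Fields of one /proc/loadavg line (names are illustrative)."""
    la1: float      # 1-minute load average
    la5: float      # 5-minute load average
    la15: float     # 15-minute load average
    runnable: int   # currently runnable scheduling entities
    total: int      # scheduling entities that exist on the system
    last_pid: int   # PID of the most recently created process


def parse_loadavg(text: str) -> LoadAvg:
    # The line has five whitespace-separated fields; the fourth is "runnable/total".
    la1, la5, la15, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(la1), float(la5), float(la15),
                   int(runnable), int(total), int(last_pid))


# The sample line from above, parsed without touching the real /proc.
sample = "0.61 0.81 0.84 1/1162 15359"
print(parse_loadavg(sample))
```

On a Linux host the same function can be fed the real file: parse_loadavg(open("/proc/loadavg").read()).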

When LA should be considered high or low

It’s a relative value.

I mean, it represents the number of processes in the scheduler’s queue.

So it can be affected by many factors: network latency, disk I/O delay, etc.

Obviously, for an application host it should be as low as possible. But for service hosts, e.g. ones performing heavy backup jobs, it can be almost arbitrarily high, as long as those jobs are actually running.

NOTE: Always parallelize your tasks.

The rules of thumb are as follows:

  1. Keeping LA below the number of virtual CPU cores keeps your host responsive.

  2. A decreasing sequence of LA values in the output means that your server is gaining load, while an increasing sequence means it is becoming less loaded.

NOTE: if you read the values in chronological order (la15-la5-la1), the 2nd rule is inverted.
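The two rules above can be sketched in Python as follows. The function name assess_load and the labels "rising", "falling" and "mixed" are assumptions made for this illustration, and the check is only as good as the rules of thumb themselves.

```python
def assess_load(la1: float, la5: float, la15: float, cpus: int):
    """Apply the two rules of thumb to one set of load averages."""
    # Rule 1: the host is considered responsive while LA stays below the core count.
    responsive = la1 < cpus

    # Rule 2: values are given in output order (la1, la5, la15).
    # A decreasing sequence means the most recent value is the highest,
    # so load is rising; an increasing sequence means load is falling.
    if la1 > la5 > la15:
        trend = "rising"
    elif la1 < la5 < la15:
        trend = "falling"
    else:
        trend = "mixed"
    return responsive, trend


# The sample values from /proc/loadavg above, on a hypothetical 4-core host.
print(assess_load(0.61, 0.81, 0.84, cpus=4))  # (True, 'falling')
```

On a real host, os.cpu_count() can supply the cpus argument.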

Propose multiple ways to check for LA values

The most common answer is something like:

  • Execute cat /proc/loadavg
  • Parse the output of the w / uptime utilities
  • Run top / htop or tload
  • Call getloadavg() from glibc
  • Use a language-specific function such as Python’s os.getloadavg()

All of those ideas (except the first one!) are crap. There’s only one way to check LA: reading /proc/loadavg. Let me show you:

In fact, there’s only one way to check for load average values.