One of the favourite data points that sysadmins and users alike cite when accusing a server of being overloaded is... 'uptime'.
Oh no! The load is 100+, the server is dying. What do you do next, and how do you find out what's actually causing the load to be 100+?
First, we need to understand what 'uptime' is telling us. The man page would be the first place to look, right? man uptime. It says: system load
averages for the past 1, 5, and 15 minutes. Errr. What does that mean? OK.. without pointing to authoritative sources, let me state that the
figure means the average number of processes that are running or waiting to run (on Linux it also counts processes stuck waiting on disk IO,
which matters later). Just take it as that for the time being ;)
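If you would rather read the three figures from a script than from 'uptime', here is a minimal Python sketch. It assumes a Linux box, where the kernel exposes the same numbers in /proc/loadavg. The function name parse_loadavg and the sample line are made up for illustration.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict.

    The file holds: 1, 5 and 15 minute load averages, then
    runnable/total scheduling entities, then the last PID created.
    """
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "runnable_now": runnable,
        "total_tasks": total,
    }


# Made-up sample line, not taken from a real server.
sample = "5.12 3.40 1.05 6/212 4321"
loads = parse_loadavg(sample)
print(loads["1min"], loads["5min"], loads["15min"])
```

On a live Linux server you would pass in open("/proc/loadavg").read() instead of the sample. Python's os.getloadavg() returns the same three averages as a tuple, if that is all you need.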
Next, let us consider what figures are actually a cause for concern. Remember, the load figures are just the average number of active processes
over a period of time. You could have hundreds of processes that each took up slices of CPU, but if the load figure is, say, 5, that means
that on average, 5 processes needed attention from the CPU at any given moment.
Would a load figure of 8 be high? Yes, it might be, if you have 1 CPU core, which means on average, 8 processes want to use that 1 core. BUT
if you have 8 CPU cores, the server is running fine... usually. Why do I say a load of 8 on a 1-core server MIGHT be high, rather than
DEFINITELY is high? The catch here is that the processes might not be CPU bound; they might be waiting for disk IO. But let's leave that for later.
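The 8-on-1-core versus 8-on-8-cores comparison boils down to dividing the load by the number of cores. A rough sketch follows; the helper names and the 1.0 cut-off are my own assumptions for illustration, not a hard rule.

```python
def load_per_core(load, cores):
    """Average number of processes competing for each CPU core."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


def describe(load, cores):
    """Very rough reading of a load figure. The 1.0 threshold is an assumption."""
    if load_per_core(load, cores) <= 1.0:
        return "cores can keep up"
    # A high ratio only says processes are queueing, not what they queue for.
    return "processes are queueing (CPU or disk IO, cannot tell yet)"


print(describe(8, 1))  # load of 8 on a single core
print(describe(8, 8))  # load of 8 spread over 8 cores
```

On a live box you could feed it os.getloadavg()[0] and os.cpu_count(). Note that os.cpu_count() can return None when the count cannot be determined.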
Why does the number of active processes sometimes chalk up into the hundreds?
The usual suspects?
1. runaway process (CPU starved, or CPU bound)
2. ran out of RAM, with processes getting swapped to disk (RAM starved, or RAM bound)
3. heavy disk IO, leading to processes waiting for disk IO to complete (disk IO starved, or disk IO bound)
4. server is starved for network bandwidth (network bandwidth starved, or network bandwidth bound)
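As a rough way to tell suspect 1 from suspect 3, you can compare two readings of the aggregate 'cpu' line in /proc/stat and see whether the ticks went to real work or to iowait. This is only a sketch for Linux: the function names and sample numbers are made up, and it says nothing about RAM or network, which need other tools.

```python
def cpu_times(stat_line):
    """Turn the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    parts = stat_line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def busy_and_iowait_share(before, after):
    """Return (busy, iowait) as fractions of the ticks elapsed between two samples."""
    a, b = cpu_times(before), cpu_times(after)
    # Field order: user nice system idle iowait irq softirq steal.
    # Later guest fields are skipped; they are already counted in user/nice.
    delta = [y - x for x, y in zip(a, b)][:8]
    total = sum(delta)
    if total == 0:
        return 0.0, 0.0
    idle, iowait = delta[3], delta[4]
    busy = total - idle - iowait
    return busy / total, iowait / total


# Made-up samples: most of the elapsed ticks land in the iowait column.
before = "cpu  1000 0 500 8000 500 0 0 0 0 0"
after = "cpu  1100 0 550 8050 1300 0 0 0 0 0"
busy, iowait = busy_and_iowait_share(before, after)
print(f"busy {busy:.0%}, iowait {iowait:.0%}")
```

A large iowait share next to a small busy share hints at disk IO rather than a runaway process. Treat it as a hint only; the tools named below give a fuller picture.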
Each of the above is a topic on its own. I'll stop here for this entry. If you want to find out more on your own, check out iostat, dstat, top,
vmstat, free and iftop.
Suffice for now to say that the load figures from 'uptime' or 'top' just display a symptom.. they show there's a bottleneck that is delaying
processes from completing their runs. They don't say the CPU is too slow.. which is what most users think they mean, and that is often a wrong diagnosis.
Lim Wee Cheong
30 Dec 2007