Symptoms of CPU, disk, memory, and networking bottlenecks
For each symptom described below, suggested commands for checking it are given in parentheses where applicable. More information on these commands and their output can be found in their respective man pages, and the Glance help screens are very useful for explaining the meaning of the fields Glance displays. Note that no single tool can be relied on completely; it takes a combination of tools to gain an accurate picture of system performance. And again, the only truly accurate measure is whether a value is good or bad compared with the same value when the system is running well.
CPU Bottlenecks
Zero percent idle CPU time (sar -u, top).
With zero percent idle, high percent user CPU time compared to system CPU time (sar -u, top).
Large run queue size sustained over time (vmstat 5 30, uptime).
Many processes blocked on priority (Glance).
Slow response time.
High percent system CPU time (sar -u, top).
Disk Bottlenecks
High disk utilization (sar -d, iostat, glance).
Large disk queue length (sar -d, iostat, glance).
High %wio (sar -u).
Low buffer cache hit rates (sar -b).
Large run queue with idle CPU (vmstat 5 30).
Memory Bottlenecks
High sustained page _out_ rates (paging in is normal) (vmstat 5 30).
Small number of free and active virtual memory pages (vmstat 5 30).
High disk activity on swap devices (sar -d, glance, iostat).
Out of memory errors.
Significant CPU time consumed by vhand (and by swapper on HP-UX 9.X) (ps -ef, top, glance).
Excessive CPU time given to system versus user processes.
Flow chart for diagnosing CPU, memory, or disk bottlenecks
Start with Step 1, which looks at CPU usage, and follow the instructions from there to diagnose a CPU, memory, or disk bottleneck. At each step, look for trends sustained over a period of time in data gathered while the system is performing poorly. Again, an accurate diagnosis can only be made when this data is compared with data collected while the system is performing well.
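Every step below asks two questions: does a condition hold over a sustained period, and is the value worse than when the system runs well? A minimal Python sketch of both checks follows. The 0.8 fraction and the 1.5 factor are illustrative assumptions, not values from the text, and the sample numbers are hypothetical.

```python
from statistics import mean

def sustained(samples, predicate, fraction=0.8):
    """True if `predicate` holds for at least `fraction` of the samples.

    The 0.8 default is an illustrative assumption, not a figure from the text.
    """
    if not samples:
        return False
    hits = sum(1 for s in samples if predicate(s))
    return hits / len(samples) >= fraction

def worse_than_baseline(slow, good, factor=1.5):
    """True if the mean of samples taken while the system is slow exceeds
    `factor` times the mean of samples taken while it runs well.
    `factor` is a hypothetical tolerance; pick one that suits your system."""
    return mean(slow) > factor * mean(good)

# Hypothetical %busy samples for one disk, e.g. from `sar -d 5 30`.
good_run = [20, 22, 18, 25, 21]
slow_run = [48, 55, 60, 52, 50]
print(worse_than_baseline(slow_run, good_run))      # True: 53 vs 1.5 * 21.2
print(sustained(slow_run, lambda busy: busy > 50))  # False: only 3 of 5 samples
```

Keeping the two checks separate mirrors the text: a value can be well above baseline without being sustained, and vice versa.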
Step 1
# sar -u [interval] [iterations]
(example: sar -u 5 30)
Is the %idle low? This is the percentage of time that the CPU is
not running processes. Zero %idle over a sustained period could be
the first indication of a CPU bottleneck.
No -> The system is not CPU bound. Go to Step 3.
Yes -> The system is possibly either CPU, memory, or I/O
bound. Go to Step 2.
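Step 1 can be sketched in Python as a parser for `sar -u` interval rows plus a sustained-low-idle test. The row layout (time, %usr, %sys, %wio, %idle) is assumed from HP-UX `sar -u` output; the `threshold` and `fraction` defaults are illustrative assumptions, and the sample text is hypothetical.

```python
import re

# Assumed layout of an HP-UX `sar -u` interval row:
#   time  %usr  %sys  %wio  %idle
NUM = r"(\d+(?:\.\d+)?)"
ROW = re.compile(r"^\d\d:\d\d:\d\d\s+" + r"\s+".join([NUM] * 4) + r"$")

def parse_sar_u(text):
    """Return (usr, sys, wio, idle) tuples, one per interval row.
    Header rows (which contain '%usr' etc.) and the 'Average' row do not
    match the pattern and are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(tuple(float(g) for g in m.groups()))
    return rows

def idle_is_low(rows, threshold=0.0, fraction=0.8):
    """Step 1: True if %idle <= threshold in at least `fraction` of rows.
    Both defaults are illustrative assumptions."""
    if not rows:
        return False
    low = sum(1 for (_usr, _sys, _wio, idle) in rows if idle <= threshold)
    return low / len(rows) >= fraction

sample = """\
10:00:00    %usr    %sys    %wio   %idle
10:00:05      78      22       0       0
10:00:10      80      20       0       0
10:00:15      75      23       2       0
Average       78      22       1       0
"""
rows = parse_sar_u(sample)
print(len(rows), idle_is_low(rows))  # 3 True -> go to Step 2
```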
Step 2
Is %usr high? Many systems normally operate with about 80% of CPU
time spent as user time and 20% as system time, but the normal
split varies from system to system, so compare against your own
baseline.
No -> The system may be experiencing either a CPU, memory, or
I/O bottleneck. Go to Step 3.
Yes -> The system is likely experiencing a CPU bottleneck due to
user processes. Go to Section 3, Part A, Tuning a system
with a CPU bottleneck.
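A rough Python sketch of the Step 2 decision follows. It measures user time as a share of busy (%usr + %sys) time and compares it with the system's normal share; the 80 default is the rough figure from the text, while the `margin` tolerance and the function name are hypothetical.

```python
def step2_next(usr, sys_, usr_share_normal=80.0, margin=10.0):
    """Step 2 of the flow chart, sketched as a function.

    usr, sys_: average %usr and %sys from `sar -u`.
    usr_share_normal: the share of busy time that is normally user time on
        this system (80 is the text's rough figure; use your own baseline).
    margin: hypothetical tolerance, in percentage points.
    Returns the next place to go in the flow chart.
    """
    busy = usr + sys_
    if busy == 0:
        return "Step 3"
    usr_share = 100.0 * usr / busy
    if usr_share >= usr_share_normal - margin:
        return "Section 3, Part A (CPU bottleneck from user processes)"
    return "Step 3"

print(step2_next(78, 22))  # user time dominates -> Section 3, Part A
print(step2_next(40, 60))  # system time dominates -> Step 3
```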
Step 3
Does %wio have a value > 15?
Yes -> Keep this in mind for later; it indicates possible disk
or tape involvement in a bottleneck. Go to Step 4.
No -> Go to Step 4.
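Step 3 only raises a flag and always continues to Step 4. A minimal Python sketch, assuming %wio values collected from `sar -u` and using the average over the sampling period as the "sustained" value (that averaging choice is an assumption):

```python
from statistics import mean

def step3_wio_flag(wio_samples, limit=15.0):
    """Step 3: True when the average %wio exceeds `limit` (15 in the text).

    Either way the flow chart continues to Step 4; the flag is only a
    reminder that disk or tape I/O may be involved in a bottleneck.
    """
    return bool(wio_samples) and mean(wio_samples) > limit

print(step3_wio_flag([20, 18, 25]))  # True: note possible disk/tape involvement
```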
Step 4
# sar -d [interval] [iterations]
Is %busy for any of the disks > 50? (Remember, 50% is a rough
guide; a better question is whether it is much higher than normal
for your system. On some systems even a %busy value of 20 may
indicate a disk bottleneck, while others normally have disks that
are 50% busy.) For this same disk, is avwait > avserv?
No -> Most likely there is no disk bottleneck. Go to Step 6.
Yes -> There seems to be an I/O bottleneck on this device.
Go to Step 5.
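The Step 4 test combines two conditions on the same device. The Python sketch below applies both to per-device samples whose field names follow the `sar -d` columns used above; the device names and numbers are hypothetical, and `busy_limit` should be set from what is normal for your system.

```python
from dataclasses import dataclass

@dataclass
class DiskSample:
    # Fields named after the `sar -d` columns used in Step 4.
    device: str
    busy: float    # %busy
    avwait: float  # average time a request waits in the queue
    avserv: float  # average time to service a request

def step4_suspect_disks(samples, busy_limit=50.0):
    """Return devices whose %busy exceeds `busy_limit` AND whose avwait
    exceeds avserv. 50 is only the rough guide from the text."""
    return sorted({s.device for s in samples
                   if s.busy > busy_limit and s.avwait > s.avserv})

samples = [
    DiskSample("c0t6d0", 85.0, 30.5, 12.0),  # busy and queueing
    DiskSample("c1t2d0", 60.0, 5.0, 8.0),    # busy, but little queueing
    DiskSample("c2t0d0", 10.0, 1.0, 5.0),    # lightly used
]
suspects = step4_suspect_disks(samples)
print(suspects)                           # ['c0t6d0']
print("Step 5" if suspects else "Step 6")  # Step 5
```

Requiring avwait > avserv filters out disks that are busy but still keeping up with their request queue.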
Step 5
There is a disk bottleneck on the system. What is on that
bottlenecked disk?
Raw Partitions,
File Systems -> Go to Section 3, Part B, Tuning a Disk I/O bound
system.
Swap -> Possibly caused by a memory bottleneck.
Go to Step 6.
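One disk can hold both file systems and swap, in which case both branches of Step 5 apply. A small Python sketch of that routing, where the content labels ("raw", "filesystem", "swap") are hypothetical names chosen for illustration:

```python
def step5_next(contents):
    """Map what lives on the bottlenecked disk to the next step(s).

    `contents` is an iterable of hypothetical labels such as
    "raw", "filesystem", or "swap".
    """
    contents = set(contents)
    steps = []
    if contents & {"raw", "filesystem"}:
        steps.append("Section 3, Part B (disk I/O bound system)")
    if "swap" in contents:
        steps.append("Step 6 (check for a memory bottleneck)")
    return steps

print(step5_next(["filesystem", "swap"]))  # both branches apply
```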
Step 6
# vmstat [interval] [iterations]
Over sustained periods of time, is po > 0?
Is (free * 4k) < 2 MB for an s800 system
(free * 4k < 1 MB for an s700)? (The values 2 MB and 1 MB are rough
guides; the actual value of LOTSFREE, the point at which a system
begins paging, is calculated at boot time and is based on the
size of system memory.)
No -> If %idle was low in Step 1, the system is most likely
CPU bound. Go to Section 3, Part A, Tuning a CPU bound
system. If %idle was not low, there does not appear to
be a CPU, disk I/O, or memory bottleneck.
Go to Section 4, Other Bottlenecks.
Yes -> There is a memory bottleneck on the system. Go to
Section 3, Part C, Tuning a Memory bound system.
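The arithmetic and routing of Step 6 can be sketched in Python as follows. With 4 KB pages, 2 MB is 512 free pages and 1 MB is 256. This sketch assumes both questions must be answered "yes" for a memory bottleneck and treats "sustained" as "in every sample"; both are simplifying assumptions, and the sample values are hypothetical.

```python
PAGE_KB = 4  # page size implied by the text's "free * 4k"

def free_mb(free_pages):
    """Convert the vmstat `free` column (pages) to megabytes."""
    return free_pages * PAGE_KB / 1024

def step6_next(po_samples, free_samples, idle_was_low, series="800"):
    """Step 6: memory-bottleneck test from vmstat samples.

    po_samples: page-outs per interval (`po` column).
    free_samples: free pages per interval (`free` column).
    series: "800" or "700"; the 2 MB / 1 MB limits are the rough guides
        from the text, not the real LOTSFREE value computed at boot.
    """
    limit_mb = 2.0 if series == "800" else 1.0
    # Assumption: "sustained" is approximated as "true in every sample".
    paging_out = bool(po_samples) and all(po > 0 for po in po_samples)
    low_free = bool(free_samples) and all(free_mb(f) < limit_mb
                                          for f in free_samples)
    if paging_out and low_free:
        return "Section 3, Part C (memory bound)"
    if idle_was_low:
        return "Section 3, Part A (CPU bound)"
    return "Section 4 (other bottlenecks)"

print(free_mb(512))                                   # 2.0
print(step6_next([12, 30, 8], [300, 250, 400], False))  # memory bound
```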
 
 
