SYSTEM PERFORMANCE -1
[Memory: vmstat / sar / prstat]
First, I want to thank all the bloggers and the Oracle documentation for the great resources they provide. I would like to thank each of them individually, but I honestly no longer remember where I learned each piece. While I was learning and preparing these notes I got help from blogs, books and YouTube, and now I can't recall which part came from where. All I have is my notes. So again, thanks to everyone who helped me gain this knowledge and confidence, and my sincere apologies for not naming you.
Performance monitoring is one of the most crucial parts of a system admin's life, and I am pretty sure it can only be learned by sweating through real problems. Think of facing a lion in a zoo versus facing a lion in the jungle… imagine the difference.
“IT CANNOT BE LEARNED, IT SHOULD BE EARNED”
TSHOOT is ART rather than technology…
So, what do we need to check for performance?
· Memory
· Disk
· Network
· CPU
· File system
· Some bug which causes Memory Leak
· NFS cache
MEMORY
For memory we have vmstat [a system-wide summary of key activity]
As long as a system has sufficient physical memory, memory will not be the bottleneck.
When physical memory runs low, the kernel starts using swap space, and that involves two mechanisms: paging (moving individual pages out) and swapping (moving whole processes out).
Let’s start
root@sol-test-1:>/# vmstat i 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr -- -- -- -- in sy cs us sy id
0 0 0 1314100 357412 8 18 14 0 1 0 20 0 0 0 0 502 179 246 0 1 99
0 0 0 1306284 347812 0 8 0 0 0 0 0 0 0 0 0 501 121 194 0 1 99
0 0 0 1306196 347724 0 2 0 0 0 0 0 0 0 0 0 505 120 195 0 1 99
^C
root@sol-test-1:>/# vmstat i 5
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr -- -- -- -- in sy cs us sy id
0 0 0 1314092 357400 8 18 14 0 1 0 20 0 0 0 0 502 179 246 0 1 99
0 0 0 1306284 347812 0 8 0 0 0 0 0 0 0 0 0 489 120 195 0 0 100
0 0 0 1306196 347724 0 2 0 0 0 0 0 0 0 0 0 506 121 193 0 2 98
^C
root@sol-test-1:>/# vmstat i 2
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr -- -- -- -- in sy cs us sy id
0 0 0 1314084 357388 8 18 14 0 1 0 20 0 0 0 0 502 179 246 0 1 99
0 0 0 1306284 347812 0 21 0 0 0 0 0 0 0 0 0 474 229 195 0 2 98
0 0 0 1306196 347724 0 4 0 0 0 0 0 0 0 0 0 459 233 193 0 1 99
The interval is the number: 5 sec in the first two runs and 2 sec in the third. The letter “i” is not needed; vmstat takes the interval directly, as in “vmstat 5”.
When you run this command you will notice that the first line of output appears immediately, without waiting for the interval to pass.
Why is that?
The first line is not an interval sample: it reports averages since the last reboot.
That is why some people say to ignore the first line when you care about current activity.
Still, the first line gives a good idea of overall system health.
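If you want to look only at the live samples, a small filter can drop the since-boot line. This is just a sketch: drop_boot_sample is a made-up helper name, and it assumes every data line starts with a number, as in the output above.

```shell
# Read vmstat output on stdin and print only the interval samples.
# Header lines are skipped because they do not start with a number;
# the first numeric line (the since-boot average) is dropped as well.
# drop_boot_sample is a hypothetical helper, not a Solaris command.
drop_boot_sample() {
    awk '$1 ~ /^[0-9]+$/ { if (seen++) print }'
}

# Typical use on a live system (not run here):
#   vmstat 5 | drop_boot_sample
```

Note that the header lines are dropped too, so keep the column order in mind when reading the result.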
Let’s start with the first field
kthr Kernel threads
r Number of kernel threads in the run queue
b Number of blocked kernel threads waiting for resources (I/O, paging)
w Number of swapped-out lightweight processes (LWPs)
memory
swap Current free swap (KB).
free Current free memory (KB).
To convert the swap / free values…
1314084 (swap) is already in KB…
1314084/1024
1283 MB, and 1283/1024 = about 1.25 GB
357388 (free) is already in KB…
357388/1024
349 MB
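The same arithmetic can be wrapped in a tiny shell helper so you don't have to redo it by hand each time. A minimal sketch, assuming awk is available; kb_to_units is a name I made up for illustration.

```shell
# Convert a kilobyte count (as printed in the vmstat swap/free columns)
# to megabytes and gigabytes.
# kb_to_units is a hypothetical helper, not a Solaris command.
kb_to_units() {
    echo "$1" | awk '{ printf "%.0f MB (%.2f GB)\n", $1 / 1024, $1 / 1024 / 1024 }'
}

kb_to_units 1314084   # swap column -> 1283 MB (1.25 GB)
kb_to_units 357388    # free column -> 349 MB (0.34 GB)
```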
page Report information about page faults and paging activity
re Page reclaims /sec.
mf Minor faults /sec.
pi KB paged in /sec.
po KB paged out /sec.
fr KB freed /sec.
de Anticipated short-term memory shortfall (KB).
sr Pages scanned /sec by the page scanner, using an LRU [least recently used] algorithm
disk
s0 Number of disk operations /sec on SCSI disk target 0 (there can be up to four columns of information, depending on how many disks are installed on the system)
faults Report the trap/interrupt rates /sec.
in Device interrupts /sec
sy System calls /sec
cs CPU context switches /sec
cpu percentage usage of CPU [this is average of all processors on system]
us % CPU time in user mode.
sy % CPU time in system mode.
id % CPU time idle.
So… what should we really look at?
We are analyzing memory… right?
“sr” from the page columns and “w” from the kthr columns
OK… what do they mean?
“sr” is the scan rate. A high value is a warning sign, but we have to
watch it over several intervals: some process may simply be reading
uncached data, so give it time to come down. If it stays high, we
should start our hunt.
“w” is the number of processes / threads moved out to swap. If it is at 20
and still climbing with each interval, then we are in trouble.
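Watching "sr" by eye gets tiring, so here is a rough sketch of a filter that only speaks up when the scan rate stays high for several samples in a row. It assumes the column layout shown above (w is field 3, sr is field 12). The helper name watch_scan_rate and the default limits (200 pages/sec, 3 samples) are my own illustrative choices, not Solaris recommendations.

```shell
# Read vmstat output on stdin and report when sr stays above a limit
# for several consecutive samples.
#   $1 = sr limit in pages/sec (default 200, illustrative only)
#   $2 = consecutive samples required (default 3, illustrative only)
# Assumes w is field 3 and sr is field 12, as in the output in this post.
# watch_scan_rate is a hypothetical helper, not a Solaris command.
watch_scan_rate() {
    awk '
        $1 ~ /^[0-9]+$/ {
            if (!seen++) next                      # skip the since-boot line
            if ($12 > limit) high++; else high = 0 # count consecutive high samples
            if (high >= runs)
                print "sr high for " high " samples: sr=" $12 " w=" $3
        }' limit="${1:-200}" runs="${2:-3}"
}

# Typical use on a live system (not run here):
#   vmstat 5 | watch_scan_rate 200 3
```

A short burst of scanning resets the counter as soon as sr drops again, which matches the "wait and see if it comes down" advice.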
OK… Now what?
Where to find the culprit?
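root@sol-test-1:>/#ps -ef -o pid,rss,args|sort -n +1
[Lists each process with its PID, resident set size in KB and command line, sorted by RSS in ascending order, so the biggest consumers are at the bottom]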
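If you only want the few biggest consumers instead of the whole sorted list, something like the following sketch can trim the ps output. top_rss is a made-up helper name; it assumes the input has one header line followed by "pid rss args" columns.

```shell
# Read "pid rss args" output from ps on stdin and print the N processes
# with the largest resident set size, biggest first.
#   $1 = how many processes to show (default 5)
# top_rss is a hypothetical helper, not a Solaris command.
top_rss() {
    awk 'NR > 1' | sort -k 2,2nr | head -n "${1:-5}"
}

# Typical use on a live system (not run here):
#   ps -e -o pid,rss,args | top_rss 10
```

Keep in mind that RSS counts shared pages in every process that maps them, so adding the numbers up will overstate real memory use.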
root@sol-test-1:>/#prstat -t -s size -c 1 2
[Sort on the basis of virtual memory size, i.e. the total size of the process
{virtual memory = physical memory + swap}]
[-t reports a total resource usage summary for each user]
[-s sorts output lines by key in descending order; -S sorts in ascending order]
[cpu / pri / rss / size / time are the keys]
[-c do not overwrite, just print each new report below the old one]
[1 is the interval of 1 sec]
[2 is the count, meaning repeat this 2 times]
root@sol-test-1:>/#prstat -t -s rss -c 1 2
[Sort on the basis of Physical mem size {rss=resident set size}]
root@sol-test-1:>/#prstat -s size -Z
[-Z adds a per-zone summary so we can see the status of each zone]
root@sol-test-1:>/#prstat -t -s size -c 1 2
[This can be run inside a zone if that zone turns out to be the culprit]
root@sol-test-1:>/#sar -r 1 5
SunOS sol-test-1 5.10 Generic_147441-01 i86pc 10/09/2014
01:58:35 freemem freeswap
01:58:36 86545 2610712
01:58:37 86545 2610712
01:58:38 86545 2610712
01:58:39 86545 2610712
01:58:40 86544 2610696
Average 86545 2610709
freemem Average pages available to user processes
[This is counted in pages, so we have to find the “pagesize”]
freeswap Disk blocks available for page swapping [This is counted in 512-byte disk blocks]
Calculate Freemem [86545]
root@sol-test-1:>/#pagesize
4096
[on SPARC it is 8192]
86545*4096/1024/1024
338 MB
Calculate freeswap [2610712 blocks]
2610712*512/1024/1024
1274 MB
1274/1024
about 1.24 GB
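Both sar conversions can be done in one go. Here is a minimal sketch, assuming freemem is in pages and freeswap is in 512-byte blocks as described above; sar_to_mb is a hypothetical helper name, and the page size is passed in by hand so the example also works where the pagesize command is not available.

```shell
# Convert sar -r values to megabytes.
#   $1 = freemem  (pages)
#   $2 = freeswap (512-byte disk blocks)
#   $3 = page size in bytes (4096 on x86, 8192 on SPARC per the notes above)
# sar_to_mb is a hypothetical helper, not a Solaris command.
sar_to_mb() {
    echo "$1 $2 $3" | awk '{
        printf "freemem  %.2f MB\n", $1 * $3 / 1024 / 1024
        printf "freeswap %.2f MB\n", $2 * 512 / 1024 / 1024
    }'
}

sar_to_mb 86545 2610712 4096
# On Solaris the page size could be taken from the system instead:
#   sar_to_mb 86545 2610712 "$(pagesize)"
```

With the numbers from the sar output above this prints 338.07 MB for freemem and 1274.76 MB for freeswap, matching the hand calculation.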
root@sol-test-1:>/#vmstat i 2 2
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr -- -- -- -- in sy cs us sy id
0 0 0 1310076 352304 4 10 6 0 0 0 10 0 0 0 0 497 117 219 0 1 99
0 0 0 1305832 346672 0 21 0 0 0 0 0 0 0 0 0 513 261 201 0 0 100
root@sol-test-1:>/#bc
1305832/1024/1024
1 [GB; bc drops the fraction unless scale is set, the real value is about 1.25 GB]
352304/1024
344 [MB]
Well… these are some of the tools that can give us an idea of memory status…
Troubleshooting is really about dealing with real situations, but these tools
will at least provide a clear picture of memory usage and problems.
Cont……