THE ADMIN's LAB : Solaris System Performance Monitoring -1 [Memory - vmstat]

SYSTEM PERFORMANCE -1

[Memory - vmstat / sar / prstat]

First, I want to thank all the bloggers and the Oracle documentation for the great resources they provide. I would like to thank each one individually, but I honestly no longer remember where I learned each piece. While I was learning and preparing these notes I got help from blogs, books and YouTube, and I can't recall which part came from where. All I have are my notes. So thanks again to everyone whose work gave me this knowledge and confidence, and my sincere apologies for not naming you.

Performance monitoring is one of the most crucial parts of a system admin's life, and I am pretty sure it is only learned once we have sweated over real problems. Facing a lion in the zoo and facing a lion in the jungle… imagine the difference.

“IT CANNOT BE LEARNED, IT SHOULD BE EARNED”

Troubleshooting is an ART rather than a technology…

So, for performance, what do we need to check?

· Memory

· Disk

· Network

· CPU

· File system

· Bugs that cause memory leaks

· NFS cache

MEMORY

For memory we have vmstat [a system-wide summary of key activity].

As long as a system has sufficient physical memory, memory will not be the bottleneck.

When physical memory runs low, the kernel starts using swap space, and it does that in two ways: paging [moving individual pages out] and swapping [moving whole processes out].

Let’s start

root@sol-test-1:>/# vmstat i 5

 kthr     memory              page             disk         faults        cpu
 r b w    swap   free re  mf pi po fr de sr -- -- -- --   in   sy   cs us sy  id
 0 0 0 1314100 357412  8  18 14  0  1  0 20  0  0  0  0  502  179  246  0  1  99
 0 0 0 1306284 347812  0   8  0  0  0  0  0  0  0  0  0  501  121  194  0  1  99
 0 0 0 1306196 347724  0   2  0  0  0  0  0  0  0  0  0  505  120  195  0  1  99

root@sol-test-1:>/# vmstat i 5

 kthr     memory              page             disk         faults        cpu
 r b w    swap   free re  mf pi po fr de sr -- -- -- --   in   sy   cs us sy  id
 0 0 0 1314092 357400  8  18 14  0  1  0 20  0  0  0  0  502  179  246  0  1  99
 0 0 0 1306284 347812  0   8  0  0  0  0  0  0  0  0  0  489  120  195  0  0 100
 0 0 0 1306196 347724  0   2  0  0  0  0  0  0  0  0  0  506  121  193  0  2  98

root@sol-test-1:>/# vmstat i 2

 kthr     memory              page             disk         faults        cpu
 r b w    swap   free re  mf pi po fr de sr -- -- -- --   in   sy   cs us sy  id
 0 0 0 1314084 357388  8  18 14  0  1  0 20  0  0  0  0  502  179  246  0  1  99
 0 0 0 1306284 347812  0  21  0  0  0  0  0  0  0  0  0  474  229  195  0  2  98
 0 0 0 1306196 347724  0   4  0  0  0  0  0  0  0  0  0  459  233  193  0  1  99

“i” stands for the interval: 5 seconds in the first two runs and 2 seconds in the third. The standard form is vmstat [interval [count]], for example vmstat 5.

When you run this command you will notice that the first line of output appears immediately; it does not wait for the interval to pass.

Why is that?

The first line is not a live sample: it reports averages since the last reboot.

Some people say to ignore the first line when you are looking at current load, so decide for yourself.

Still, the first line gives a good idea of overall system health.

Let’s start with the first field

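kthr Number of kernel threads in each of the three states below

r Number of kernel threads in the run queue

b Number of kernel threads blocked waiting for resources [I/O, paging and so on]

w Number of swapped-out LWPs [lightweight processes] waiting for resources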

memory

swap Current free swap (KB).

free Current free memory (KB).

To get swap / free in friendlier units…

1314084 (swap) is already in KB…

1314084/1024

1283mb, and 1283/1024 = about 1.25gb

357388 (free) is already in KB…

357388/1024

349mb
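The same arithmetic can be scripted. Here is a minimal Python sketch; the function name kb_to_units is my own (hypothetical), and the numbers are the swap and free values from the third vmstat run above.

```python
def kb_to_units(kb):
    """Convert a vmstat KB figure to (MB, GB), using 1024-based units."""
    mb = kb / 1024
    gb = mb / 1024
    return mb, gb

# swap and free columns from the sample output (vmstat reports them in KB)
swap_mb, swap_gb = kb_to_units(1314084)
free_mb, _ = kb_to_units(357388)

print(f"swap: {swap_mb:.0f} MB ({swap_gb:.2f} GB)")  # swap: 1283 MB (1.25 GB)
print(f"free: {free_mb:.0f} MB")                     # free: 349 MB
```

Keeping the fraction matters here: integer division would report the swap as 1gb and hide a quarter of a gigabyte.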

page Report information about page faults and paging activity

re Page reclaims /sec.

mf Minor faults /sec.

pi KB paged in /sec.

po KB paged out /sec.

fr KB freed /sec.

de Anticipated short-term memory shortfall (KB) /sec.

sr Pages scanned /sec by the page scanner, which uses a clock algorithm that approximates LRU [least recently used]

disk

s0 Number of disk operations /sec on SCSI disk target 0 (there can be up to four disk columns, depending on how many disks are installed on the system; in the sample output above all four slots are empty and show as --)

faults Report the trap/interrupt rates /sec.

in Device interrupts /sec

sy System calls /sec

cs CPU context switches /sec

cpu percentage usage of CPU [this is average of all processors on system]

us % CPU time in user mode.

sy % CPU time in system mode.

id % CPU time idle.
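To pull individual fields out of saved vmstat output, something like the following Python sketch could be used. It is only an illustration: the field names d0 to d3 and sy_calls are names I made up (the real header uses -- for empty disk slots and the label sy twice), and it assumes the 22-column layout shown in the samples above.

```python
# Column names in output order; d0..d3 and sy_calls are hypothetical labels
VMSTAT_FIELDS = [
    "r", "b", "w", "swap", "free", "re", "mf", "pi", "po", "fr", "de", "sr",
    "d0", "d1", "d2", "d3", "in", "sy_calls", "cs", "us", "sy", "id",
]

def parse_vmstat(text):
    """Return one dict per live sample, dropping headers and the since-boot line."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data lines are exactly 22 integers; both header lines fail this check
        if len(parts) != len(VMSTAT_FIELDS) or not all(p.isdigit() for p in parts):
            continue
        rows.append(dict(zip(VMSTAT_FIELDS, map(int, parts))))
    return rows[1:]  # the first data line is the average since boot

# Output of the third run above, pasted as-is
SAMPLE = """\
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr -- -- -- -- in sy cs us sy id
0 0 0 1314084 357388 8 18 14 0 1 0 20 0 0 0 0 502 179 246 0 1 99
0 0 0 1306284 347812 0 21 0 0 0 0 0 0 0 0 0 474 229 195 0 2 98
0 0 0 1306196 347724 0 4 0 0 0 0 0 0 0 0 0 459 233 193 0 1 99
"""

rows = parse_vmstat(SAMPLE)
for row in rows:
    print(row["w"], row["free"], row["sr"])
```

Dropping the first data line is a choice, not a rule: keep it if you want the since-boot averages as a baseline.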

So… what should we really look at?

We are analyzing memory… right?

“sr” from the page columns and “w” from the kthr columns.

OK… what do they mean?

“sr” is the scan rate: how many pages per second the page scanner is examining while it hunts for memory to free. A high value should put us on alert, but we have to watch it over several intervals first. A process may simply be reading a lot of uncached data, so wait and see whether it comes back down. If it stays high, we should start our hunt.

“w” is the number of processes / threads that have been swapped out. If it is 20 and still climbing at each interval, then we are in trouble.
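The "watch it for some time" rule can be written down as a small check. This Python sketch is only one way to do it: the threshold of 200 pages/sec and the window of 3 intervals are values I assumed for the example, not figures from these notes, and the sample sequences are made up.

```python
def memory_pressure(samples, sr_threshold=200, min_intervals=3):
    """Check recent vmstat samples for the two warning signs.

    samples: list of (sr, w) tuples, oldest first, with the
    since-boot summary line already removed.
    sr_threshold and min_intervals are assumed values, not rules.
    """
    recent = samples[-min_intervals:]
    if len(recent) < min_intervals:
        return False, False  # not enough data to judge yet
    # Warning sign 1: scan rate above the threshold at every recent interval
    sustained_scan = all(sr > sr_threshold for sr, _ in recent)
    # Warning sign 2: swapped-out count rising at every recent interval
    ws = [w for _, w in recent]
    growing_w = len(ws) > 1 and all(b > a for a, b in zip(ws, ws[1:]))
    return sustained_scan, growing_w

# Made-up sample sequences, for illustration only
quiet = [(0, 0), (0, 0), (0, 0)]
spike = [(0, 0), (900, 0), (0, 0)]           # one burst, then back down
trouble = [(350, 20), (420, 23), (510, 27)]  # stays high, w keeps counting

for name, data in (("quiet", quiet), ("spike", spike), ("trouble", trouble)):
    print(name, memory_pressure(data))
```

A single spike does not trip either flag, which matches the advice above: only a scan rate that stays high, or a w that keeps growing, is a reason to start the hunt.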

OK… Now what?

Where to find the culprit?

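root@sol-test-1:>/#ps -ef -o pid,rss,args | sort -n +1

[Lists every process with its resident set size in KB, sorted on that column in ascending order, so the biggest consumers are at the bottom]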
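If the ps output has been saved to a file, the top consumers can also be picked out with a few lines of Python. This is a rough sketch: the function name top_rss is hypothetical, and the sample text below is invented purely to show the pid / rss / args layout.

```python
def top_rss(ps_output, n=3):
    """Return the n largest (rss_kb, pid, args) tuples from pid,rss,args text."""
    procs = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, rss, then the rest as args
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip the header and any malformed line
        procs.append((int(parts[1]), int(parts[0]), parts[2]))
    return sorted(procs, reverse=True)[:n]

# Invented sample, not real output from the test host
SAMPLE_PS = """\
  PID  RSS COMMAND
  101 2048 /usr/lib/example-a
  202 9000 /usr/bin/example-b -x
  303  512 /usr/sbin/example-c
"""

for rss_kb, pid, args in top_rss(SAMPLE_PS, n=2):
    print(pid, rss_kb, args)
```

Sorting largest first saves scrolling to the bottom of a long process list.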

root@sol-test-1:>/#prstat -t -s size -c 1 2

[Sorts on virtual memory size, the SIZE column

{virtual memory = physical memory + swap space}]

[-t reports a total resource usage summary for each user]

[-s sorts output lines by key in descending order, so the biggest come first; -S sorts in ascending order]

[size / rss / time / cpu are keys]

[-c does not overwrite the screen; each new report is printed below the previous one]

[1 is the interval: 1 sec]

[2 is the count: repeat the report 2 times]

root@sol-test-1:>/#prstat -t -s rss -c 1 2

[Sort on the basis of Physical mem size {rss=resident set size}]

root@sol-test-1:>/#prstat -s size -Z

[-Z is to see the performance status of zones]

root@sol-test-1:>/#prstat -t -s size -c 1 2

[This can be run inside a zone, if a zone turns out to be the culprit]

root@sol-test-1:>/#sar -r 1 5

SunOS sol-test-1 5.10 Generic_147441-01 i86pc    10/09/2014

01:58:35 freemem freeswap
01:58:36   86545  2610712
01:58:37   86545  2610712
01:58:38   86545  2610712
01:58:39   86545  2610712
01:58:40   86544  2610696

Average    86545  2610709

Freemem Average number of pages available to user processes

[This is counted in pages, so we need the page size: see “pagesize” below]

Freeswap Number of disk blocks available for page swapping [This is counted in 512-byte blocks]

Calculate Freemem [86545]

root@sol-test-1:>/#pagesize

4096

[on SPARC it is 8192]

86545*4096/1024/1024

338mb

Calculate Freeswap [2610712]

2610712*512/1024/1024

1274mb

1274/1024

about 1.24gb
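Both sar conversions fit in a short Python sketch. The function names are hypothetical; the page size is the 4096 reported by pagesize on this x86 host, and the 512-byte block size is the one used in the calculation above.

```python
PAGE_SIZE = 4096   # bytes, from the pagesize command on this host
BLOCK_SIZE = 512   # bytes per disk block, as used in the calculation above

def freemem_mb(pages, page_size=PAGE_SIZE):
    """sar freemem is in pages: pages * page size gives bytes."""
    return pages * page_size / 1024 / 1024

def freeswap_mb(blocks, block_size=BLOCK_SIZE):
    """sar freeswap is in disk blocks: blocks * block size gives bytes."""
    return blocks * block_size / 1024 / 1024

mem = freemem_mb(86545)
swap = freeswap_mb(2610712)
print(f"freemem:  {mem:.1f} MB")                            # freemem:  338.1 MB
print(f"freeswap: {swap:.1f} MB ({swap / 1024:.2f} GB)")    # freeswap: 1274.8 MB (1.24 GB)
```

On a SPARC host the same call would need page_size=8192, otherwise freemem comes out at half its real value.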

root@sol-test-1:>/#vmstat i 2 2

 kthr     memory              page             disk         faults        cpu
 r b w    swap   free re  mf pi po fr de sr -- -- -- --   in   sy   cs us sy  id
 0 0 0 1310076 352304  4  10  6  0  0  0 10  0  0  0  0  497  117  219  0  1  99
 0 0 0 1305832 346672  0  21  0  0  0  0  0  0  0  0  0  513  261  201  0  0 100

root@sol-test-1:>/#bc

1305832/1024/1024

1gb [bc drops the fraction by default; it is really about 1.25gb]

352304/1024

344mb

Well… these are some of the tools that can give us an idea of the memory status…

Troubleshooting is all about dealing with real situations, but these tools will at least give a clear picture of memory usage and problems.

To be continued…

Thursday, 9 October 2014
