Thursday, October 4, 2012

Oracle 11g Kernel Parameter Settings


Oracle 11g Kernel Parameter settings

Open the /etc/sysctl.conf file, and add or edit lines similar to the following  Ex:
fs.aio-max-nr = 1048576
fs.file-max = 6815744
kernel.shmall = 2097152
kernel.shmmax = 4294967295
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048586
Setting SHMMAX Parameter

This parameter defines the maximum size in bytes of a single shared memory segment that a Linux process can allocate in its virtual address space.
Since the SGA is comprised of shared memory, SHMMAX can potentially limit the size of the SGA. SHMMAX should be slightly larger than the SGA size. If SHMMAX is too small, you can get error messages similar to this one:
ORA-27123: unable to attach to shared memory segment
To determine the maximum size of a shared memory segment, run:
# cat /proc/sys/kernel/shmmax
2147483648
The default shared memory limit for SHMMAX can be changed in the proc file system without reboot:
# echo 2147483648 > /proc/sys/kernel/shmmax
Alternatively, you can use sysctl(8) to change it:
# sysctl -w kernel.shmmax=2147483648
To make a change permanent, add the following line to the file /etc/sysctl.conf (your setting may vary). This file is used during the boot process.
# echo "kernel.shmmax=2147483648" >> /etc/sysctl.conf

Setting SHMMNI Parameter

This parameter sets the system wide maximum number of shared memory segments.
Oracle recommends SHMMNI to be at least 4096 for Oracle 10g. For Oracle 9i on x86 the recommended minimum setting is lower. Since these recommendations are minimum settings, it's best to set it always to at least 4096 for 9i and 10g databases on x86 and x86-64 platforms.

To determine the system wide maximum number of shared memory segments, run:
# cat /proc/sys/kernel/shmmni
4096
The default shared memory limit for SHMMNI can be changed in the proc file system without reboot:
# echo 4096 > /proc/sys/kernel/shmmni
Alternatively, you can use sysctl(8) to change it:
# sysctl -w kernel.shmmni=4096
To make a change permanent, add the following line to the file /etc/sysctl.conf. This file is used during the boot process.
# echo "kernel.shmmni=4096" >> /etc/sysctl.conf

Setting SHMALL Parameter

This parameter sets the total amount of shared memory pages that can be used system wide. Hence, SHMALL should always be at least
ceil(shmmax/PAGE_SIZE).
If you are not sure what the default PAGE_SIZE is on your Linux system, you can run the following command:
$ getconf PAGE_SIZE
4096
To determine the system wide maximum number of shared memory pages, run:
# cat /proc/sys/kernel/shmall
2097152
The default shared memory limit for SHMALL can be changed in the proc file system without reboot:
# echo 2097152 > /proc/sys/kernel/shmall
Alternatively, you can use sysctl(8) to change it:
# sysctl -w kernel.shmall=2097152
To make a change permanent, add the following line to the file /etc/sysctl.conf. This file is used during the boot process.
# echo "kernel.shmall=2097152" >> /etc/sysctl.conf

Removing Shared Memory

Sometimes after an instance crash you may have to remove Oracle's shared memory segment(s) manually.

To see all shared memory segments that are allocated on the system, execute:
$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x8f6e2129 98305      oracle    600        77694523   0
0x2f629238 65536      oracle    640        2736783360 35
0x00000000 32768      oracle    640        2736783360 0           dest

In this example you can see that three shared memory segments have been allocated. The output also shows that shmid 32768 is an abandoned shared memory segment from a past ungraceful Oracle shutdown. Status "dest" means that this memory segment is marked to be destroyed. To find out more about this shared memory segment you can run:
$ ipcs -m -i 32768
Shared memory Segment shmid=32768
uid=500 gid=501 cuid=500 cgid=501
mode=0640 access_perms=0640
bytes=2736783360 lpid=3688 cpid=3652 nattch=0
att_time=Sat Oct 29 13:36:52 2005
det_time=Sat Oct 29 13:36:52 2005
change_time=Sat Oct 29 11:21:06 2005

To remove the shared memory segment, you could copy/paste shmid and execute:
$ ipcrm shm 32768

Another approach to remove shared memory is to use Oracle's sysresv utility. Here are a few self explanatory examples on how to use sysresv:

Checking Oracle's IPC resources:
$ sysresv

IPC Resources for ORACLE_SID "orcl" :
Shared Memory
ID              KEY
No shared memory segments used
Semaphores:
ID              KEY
No semaphore resources used
Oracle Instance not alive for sid "orcl"
$

Instance is up and running:
$ sysresv -i

IPC Resources for ORACLE_SID "orcl" :
Shared Memory:
ID              KEY
2818058         0xdc70f4e4
Semaphores:
ID              KEY
688128          0xb11a5934
Oracle Instance alive for sid "orcl"
SYSRESV-005: Warning
        Instance maybe alive - aborting remove for sid "orcl"
$

Instance has crashed and resources were not released:
$ sysresv -i

IPC Resources for ORACLE_SID "orcl" :
Shared Memory:
ID              KEY
32768           0xdc70f4e4
Semaphores:
ID              KEY
98304           0xb11a5934
Oracle Instance not alive for sid "orcl"
Remove ipc resources for sid "orcl" (y/n)?y
Done removing ipc resources for sid "orcl"
$

Semaphores can be described as counters which are used to provide synchronization between processes or between threads within a process for shared resources like shared memories. System V semaphores support semaphore sets where each one is a counting semaphore. So when an application requests semaphores, the kernel releases them in sets. The number of semaphores per set can be defined through the kernel parameter SEMMSL.

To see all semaphore settings, run:
ipcs -ls
The SEMMSL Parameter
This parameter defines the maximum number of semaphores per semaphore set.
NOTE:
If a database gets thousands of concurrent connections where the ora.init parameter
PROCESSES is very large, then SEMMSL should be larger as well. Note what Metalink Note:187405.1 and Note:184821.1 have to say regarding SEMMSL: "The SEMMSL setting should be 10 plus the largest PROCESSES parameter of any Oracle database on the system". Even though these notes talk about 9i databases this SEMMSL rule also applies to 10g databases. I've seen low SEMMSL settings to be an issue for 10g RAC databases where Oracle recommended to increase SEMMSL and to calculate it according to the rule mentioned in these notes.
The SEMMNI Parameter

This parameter defines the maximum number of semaphore sets for the entire Linux system.
The SEMMNS Parameter

This parameter defines the total number of semaphores (not semaphore sets) for the entire Linux system. A semaphore set can have more than one semaphore, and as the semget(2) man page explains, values greater than SEMMSL * SEMMNI makes it irrelevant. The maximum number of semaphores that can be allocated on a Linux system will be the lesser of: SEMMNS or (SEMMSL * SEMMNI).
The SEMOPM Parameter

This parameter defines the maximum number of semaphore operations that can be performed per
semop(2) system call (semaphore call). The semop(2) function provides the ability to do operations for multiple semaphores with one semop(2) system call. Since a semaphore set can have the maximum number of SEMMSL semaphores per semaphore set, it is often recommended to set SEMOPM equal to SEMMSL.
Setting Semaphore Parameters

To determine the values of the four described semaphore parameters, run:
# cat /proc/sys/kernel/sem
250     32000   32      128
These values represent SEMMSL, SEMMNS, SEMOPM, and SEMMNI.

Alternatively, you can run:
# ipcs -ls
All four described semaphore parameters can be changed in the proc file system without reboot:
# echo 250 32000 100 128 > /proc/sys/kernel/sem
Alternatively, you can use sysctl(8) to change it:
sysctl -w kernel.sem="250 32000 100 128"
To make the change permanent, add or change the following line in the file /etc/sysctl.conf. This file is used during the boot process.
echo "kernel.sem=250 32000 100 128" >> /etc/sysctl.conf

The maximum number of file handles specifies the maximum number of open files on a Linux system.

Oracle recommends that the file handles for the entire system is set to at least 65536 for 9i R2 and 10g R1/2 for x86 and x86-64 platforms.

To determine the maximum number of file handles for the entire system, run:
cat /proc/sys/fs/file-max
To determine the current usage of file handles, run:
$ cat /proc/sys/fs/file-nr
1154    133     8192
The file-nr file displays three parameters:
  - Total allocated file handles
  - Currently number of used file handles (2.4 kernel); Currently number of unused file handles (2.6 kernel)
  - Maximum file handles that can be allocated (see also
/proc/sys/fs/file-max)

The kernel dynamically allocates file handles whenever a file handle is requested by an application but the kernel does not free these file handles when they are released by the application. The kernel recycles these file handles instead. This means that over time the total number of allocated file handles will increase even though the number of currently used file handles may be low.

The maximum number of file handles can be changed in the proc file system without reboot:
# echo 65536 > /proc/sys/fs/file-max
Alternatively, you can use sysctl(8) to change it:
# sysctl -w fs.file-max=65536
To make the change permanent, add or change the following line in the file /etc/sysctl.conf. This file is used during the boot process.
# echo "fs.file-max=65536" >> /etc/sysctl.conf

Changing Network Kernel Settings

Oracle now uses UDP as the default protocol on Linux for interprocess communication, such as cache fusion buffer transfers between the instances. But starting with Oracle 10g network settings should be adjusted for standalone databases as well.

Oracle recommends the default and maximum send buffer size (
SO_SNDBUF socket option) and receive buffer size (SO_RCVBUF socket option) to be set to 256 KB. The receive buffers are used by TCP and UDP to hold the received data for the application until it's read. This buffer cannot overflow because the sending party is not allowed to send data beyond the buffer size window. This means that datagrams will be discarded if they don't fit in the receive buffer. This could cause the sender to overwhelm the receiver

The default and maximum window size can be changed in the proc file system without reboot:
# sysctl -w net.core.rmem_default=262144  # Default setting in bytes of the socket receive buffer
# sysctl -w net.core.wmem_default=262144  # Default setting in bytes of the socket send buffer
# sysctl -w net.core.rmem_max=262144      # Maximum socket receive buffer size which may be set by using the SO_RCVBUF socket option
# sysctl -w net.core.wmem_max=262144      # Maximum socket send buffer size which may be set by using the SO_SNDBUF socket option
To make the change permanent, add the following lines to the /etc/sysctl.conf file, which is used during the boot process:
net.core.rmem_default=262144
net.core.wmem_default=262144
net.core.rmem_max=262144
net.core.wmem_max=262144
To improve failover performance in a RAC cluster, consider changing the following IP kernel parameters as well:
net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_retries2
net.ipv4.tcp_syn_retries
Changing these settings may be highly dependent on your system, network, and other applications. For suggestions, see Metalink Note:249213.1 and Note:265194.1.
Most shells like Bash provide control over various resources like the maximum allowable number of open file descriptors or the maximum number of processes available to a user.

To see all shell limits, run:
ulimit -a
For more information on ulimit for the Bash shell, see man bash and search for ulimit.
Limiting Maximum Number of Open File Descriptors for the Oracle User

After
/proc/sys/fs/file-max has been changed, see Setting File Handles, there is still a per user limit of maximum open file descriptors:
$ su - oracle
$ ulimit -n
1024
$
To change this limit, edit the /etc/security/limits.conf file as root and make the following changes or add the following lines, respectively:
oracle           soft    nofile          4096
oracle           hard    nofile          63536
The "soft limit" in the first line defines the number of file handles or open files that the Oracle user will have after login. If the Oracle user gets error messages about running out of file handles, then the Oracle user can increase the number of file handles like in this example up to 63536 ("hard limit") by executing the following command:

ulimit -n 63536
You can set the "soft" and "hard" limits higher if necessary.
To see the current limit of the maximum number of processes for the oracle user, run:
$ su - oracle
$ ulimit -u
Note the ulimit options are different for other shells.

To change the "soft" and "hard" limits for the maximum number of processes for the
oracle user, add the following lines to the /etc/security/limits.conf file:
oracle           soft    nproc          2047
oracle           hard    nproc          16384


Checking Memory Usage

To determine the size and usage of memory, you can enter the following command:
grep MemTotal /proc/meminfo

Alternatively, you can use the free(1) command to check the memory:
$ free
             total       used       free     shared    buffers     cached
Mem:       4040360    4012200      28160          0     176628    3571348
-/+ buffers/cache:     264224    3776136
Swap:      4200956      12184    4188772
$
In this example the total amount of available memory is 4040360 KB. 264224 KB are used by processes and 3776136 KB are free for other applications. Don't get confused by the first line which shows that 28160KB are free! If you look at the usage figures you can see that most of the memory use is for buffers and cache since Linux always tries to use RAM to the fullest extent to speed up disk operations. Using available memory for buffers (file system metadata) and cache (pages with actual contents of files or block devices) helps the system to run faster because disk information is already in memory which saves I/O. If space is needed by programs or applications like Oracle, then Linux will free up the buffers and cache to yield memory for the applications. So if your system runs for a while you will usually see a small number under the field "free" on the first line.


Tuning Page Cache

Page Cache is a disk cache which holds data of files and executable programs, i.e. pages with actual contents of files or block devices. Page Cache (disk cache) is used to reduce the number of disk reads. To control the percentage of total memory used for page cache in RHEL 3, the following kernel parameter can be changed:
# cat /proc/sys/vm/pagecache
1       15      30
The above three values are usually good for database systems. It is not recommended to set the third value very high like 100 as it used to be with older RHEL 3 kernels. This can cause significant performance problems for database systems. If you upgrade to a newer kernel like 2.4.21-37, then these values will automatically change to "1 15 30" unless it's set to different values in /etc/sysctl.conf. For information on tuning the pagecache kernel parameter, I recommend reading the excellent article Understanding Virtual Memory. Note this kernel parameter does not exist in RHEL 4.

The pagecache parameters can be changed in the proc file system without reboot:
# echo "1 15 30" > /proc/sys/vm/pagecache
Alternatively, you can use sysctl(8) to change it:
# sysctl -w vm.pagecache="1 15 30"
To make the change permanent, add the following line to the file /etc/sysctl.conf. This file is used during the boot process.
# echo "vm.pagecache=1 15 30" >> /etc/sysctl.conf

General

In some cases it's good for the swap partition to be used. For example, long running processes often access only a subset of the page frames they obtained. This means that the swap partition can safely be used even if memory is available because system memory could be better served for disk cache to improve overall system performance. In fact, in the 2.6 kernel, i.e. RHEL 4, you can define a threshold when processes should be swapped out in favor of I/O caching. This can be tuned with the
/proc/sys/vm/swappiness kernel parameter. The default value of /proc/sys/vm/swappiness is 60 which means that applications and programs that have not done a lot lately can be swapped out. Higher values will provide more I/O cache and lower values will wait longer to swap out idle applications.

Depending on the system profile you may see that swap usage slowly increases with system uptime. To display swap usage you can run the
free(1) command or you can check the /proc/meminfo file. When the system uses swap space it will sometimes not decrease afterward. This saves I/O if memory is needed and pages don't have to be swapped out again when the pages are already in the swap space. However, if swap usage gets close to 80% - 100% (your threshold may be lower if you use a large swap space), then a closer look should be taken at the system, see also Checking Swap Space Size and Usage. Depending on the size of your swap space, you may want to check swap activity with vmstat or sar if swap allocation is lower than 80%. But these numbers really depend on the size of the swap space. The actual numbers of swapped pages per timeframe from vmstat or sar are the important numbers. Constant swapping should be avoided at all cost.

Note, never add a permanent swap file to the system due to the performance impact of the filesystem layer.
<!--[if !supportLineBreakNewLine]-->
<!--[endif]-->
Checking Swap Space Size and Usage

You can check the size and current usage of swap space by running one of the following two commands:
grep SwapTotal /proc/meminfo
cat /proc/swaps
free

Swap usage may slowly increase as shown above but should stop at some point. If swap usage continues to grow steadily or is already large, then one of the following choices may need to be considered:
- Add more RAM or reduce the size of the SGA
- Increase the size of the swap space

If you see constant swapping, then you need to either add more RAM or reduce the size of the SGA. Constant swapping should be avoided at all cost. You can check current swap activity using the following commands:
$ vmstat 3 100
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 972488   7148  20848    0    0   856     6  138    53  0  0 99  0
 0  1      0 962204   9388  20848    0    0   747     0 4389  8859 23 24 11 41
 0  1      0 959500  10728  20848    0    0   440   313 1496  2345  4  7  0 89
 0  1      0 956912  12216  20848    0    0   496     0 2294  4224 10 13  0 77
 1  1      0 951600  15228  20848    0    0   997   264 2241  3945  6 13  0 81
 0  1      0 947860  17188  20848    0    0   647   280 2386  3985  9  9  1 80
 0  1      0 944932  19304  20848    0    0   705     0 1501  2580  4  9  0 87
The fields si and so show the amount of memory paged in from disk and paged out to disk, respectively. If the server shows continuous swap activity then more memory should be added or the SGA size should be reduced. To check the history of swap activity, you can use the sar command.
For example, to check swap activity from Oct 12th:
# ls -al /var/log/sa | grep "Oct 12"
-rw-r--r--    1 root     root      2333308 Oct 12 23:55 sa12
-rw-r--r--    1 root     root      4354749 Oct 12 23:53 sar12
# sar -W -f /var/log/sa/sa12
Linux 2.4.21-32.0.1.ELhugemem (rac01prd)       10/12/2005

12:00:00 AM  pswpin/s pswpout/s
12:05:00 AM      0.00      0.00
12:10:00 AM      0.00      0.00
12:15:00 AM      0.00      0.00
12:20:00 AM      0.00      0.00
12:25:00 AM      0.00      0.00
12:30:00 AM      0.00      0.00
...
The fields pswpin and pswpout show the total number of pages brought in and out per second, respectively.

If the server shows sporadic swap activity or swap activity for a short period time at certain invervals, then you can either add more swap space or RAM. If swap usage is already very large (don't confuse it with constant swapping), then I would add more RAM.

1 comment:

  1. Hi, you are using wrong values: net.core.wmem_max = 1048586
    it should be "1048576"!
    You have probably copied from somewhere else and keep using this wrong value

    ReplyDelete

Share your knowledge it really improves, don't show off...