Oracle Database In Our Hand: Configuring the Instance Recovery Performance

Hi Guys,

This article will explain how instance recovery will effect the database performance.

Understanding Instance Recovery

Instance and crash recovery are the automatic application of redo log records to Oracle data blocks after a crash or system failure. During normal operation, if an instance is shutdown cleanly as when using a SHUTDOWN IMMEDIATE statement, rather than terminated abnormally, then the in-memory changes that have not already been written to the datafiles on disk are written to disk as part of the checkpoint performed during shutdown.

However, if a single instance database crashes or if all instances of an Oracle Real Application Cluster configuration crash, then Oracle performs crash recovery at the next startup. If one or more instances of an Oracle Real Application Cluster configuration crash, then a surviving instance performs instance recovery automatically. Instance and crash recovery occur in two steps: cache recovery followed by transaction recovery.

Cache Recovery (Rolling Forward)

During the cache recovery step, Oracle applies all committed and uncommitted changes in the redo log files to the affected data blocks. The work required for cache recovery processing is proportional to the rate of change to the database (update transactions each second) and the time between checkpoints.

Transaction Recovery (Rolling Back)

To make the database consistent, the changes that were not committed at the time of the crash must be undone (in other words, rolled back). During the transaction recovery step, Oracle applies the rollback segments to undo the uncommitted changes. The work required to do transaction recovery is proportional to the number and size of uncommitted transactions when the system fault occurred.

Checkpointing and Cache Recovery

Periodically, Oracle records a checkpoint. A checkpoint is the highest system change number (SCN) such that all data blocks less than or equal to that SCN are known to be written out to the data files. If a failure occurs, then only the redo records containing changes at SCNs higher than the checkpoint need to be applied during recovery. The duration of cache recovery processing is determined by two factors: the number of data blocks that have changes at SCNs higher than the SCN of the checkpoint, and the number of log blocks that need to be read to find those changes.

How Checkpoints Affect Performance

Frequent checkpointing writes dirty buffers to the datafiles more often than otherwise, and so reduces cache recovery time in the event of an instance failure. If checkpointing is frequent, then applying the redo records in the redo log between the current checkpoint position and the end of the log involves processing relatively few data blocks. This means that the cache recovery phase of recovery is fairly short. However, in a high-update system, frequent checkpointing can reduce runtime performance, because checkpointing causes DBWn processes to perform writes.

Fast Instance Recovery Trade-offs

To minimize the duration of instance recovery, you must force Oracle to checkpoint often, thus keeping the number of redo log records to be applied during recovery to a minimum. However, in a high-update system, frequent checkpointing increases the overhead for normal database operations.

If daily operational efficiency is more important than minimizing recovery time, then decrease the frequency of writes to data files due to checkpoints. This should improve operational efficiency, but also increase instance recovery time.

Reducing Checkpoint Frequency to Optimize Runtime Performance

To reduce the checkpoint frequency and optimize runtime performance, you can do the following:

Size your online redo log files according to the amount of redo your system generates. A rough guide is to switch logs at most once every twenty minutes. Small log files can increase checkpoint activity and reduce performance. You should make all logs the same size.
Set the value of the LOG_CHECKPOINT_INTERVAL initialization parameter (in multiples of physical block size) to zero. This value eliminates interval checkpoints.
Set the value of the LOG_CHECKPOINT_TIMEOUT initialization parameter to zero. This value eliminates time-based checkpoints.
Set the value of FAST_START_MTTR_TARGET (and FAST_START_IO_TARGET) to zero to disable fast-start checkpointing.

These are the parameters which plays roles in instance recovery.

Parameter	Purpose
FAST_START_MTTR_TARGET	Lets you specify in seconds the expected mean time to recover (MTTR), which is the expected amount of time Oracle takes to perform recovery and startup the instance.
FAST_START_IO_TARGET	This initialization parameter has been deprecated in favour of FAST_START_MTTR_TARGET. This parameter specifies the upper limit on the number of dirty buffers.
LOG_CHECKPOINT_TIMEOUT	Limits the number of seconds between the most recent redo record and the checkpoint.
LOG_CHECKPOINT_INTERVAL	Limits the number of redo blocks generated between the most recent redo record and the checkpoint.
RECOVERY_PARALLELISM	Specifies the number of concurrent recovery processes to be used in instance or crash recovery.
LOG_PARALLELISM	Specifies the level of concurrency for redo allocation within Oracle.

You should disable or remove the FAST_START_IO_TARGET, LOG_CHECKPOINT_INTERVAL, and LOG_CHECKPOINT_TIMEOUT initialization parameters when using FAST_START_MTTR_TARGET. Setting these parameters to active values interferes with FAST_START_MTTR_TARGET, resulting in a different than expected value in the TARGET_MTTR column of the V$INSTANCE_RECOVERY view.

The FAST_START_MTTR_TARGET initialization parameter simplifies the configuration of recovery time from instance or system failure. This parameter lets you specify the number of seconds crash or instance recovery is expected to take. The FAST_START_MTTR_TARGET is internally converted to a set of parameters that modify the operation of Oracle such that recovery time is as close to this estimate as possible.

The foundation of fast-start recovery is the fast-start checkpointing architecture. Instead of the conventional event driven (that is, log switching) checkpointing, which does bulk writes, fast-start checkpointing occurs incrementally. Each DBWn process periodically writes buffers to disk to advance the checkpoint position. The oldest modified blocks are written first to ensure that every write lets the checkpoint advance. Fast-start checkpointing eliminates bulk writes and the resultant I/O spikes that occur with conventional checkpointing.

The maximum value for FAST_START_MTTR_TARGET is 3600, or one hour. If you set the value to more than 3600, then Oracle rounds it to 3600. There is no minimum value for FAST_START_MTTR_TARGET.

Set LOG_CHECKPOINT_TIMEOUT to Influence the Amount of Redo

Set the initialization parameter LOG_CHECKPOINT_TIMEOUT to an integer value n to require that the latest checkpoint position follow the most recent redo block by no more than n seconds. In other words, at most, n seconds worth of logging activity can occur between the most recent checkpoint position and the last block written to the redo log. This forces the checkpoint position to keep pace with the most recent redo block.

Set LOG_CHECKPOINT_INTERVAL to Influence the Amount of Redo

Set the initialization parameter LOG_CHECKPOINT_INTERVAL to a value n (where n is an integer) to require that the checkpoint position never follow the most recent redo block by more than n blocks. In other words, at most n redo blocks can exist between the checkpoint position and the last block written to the redo log. In effect, you are limiting the amount of redo blocks that can exist between the checkpoint and the end of the log.

Parallel Recovery to Speed up Redo Application

Use parallel recovery to tune the cache recovery phase of recovery. Parallel recovery uses a division of labor approach to allocate different processes to different data blocks during the cache recovery phase of recovery.

Use the RECOVERY_PARALLELISM initialization parameter to specify the number of concurrent recovery processes for instance or crash recovery. To use parallel processing, the value of RECOVERY_PARALLELISM must be greater than 1 and cannot exceed the value of the PARALLEL_MAX_SERVERS initialization parameter.

Monitoring Cache Recovery

Use the V$INSTANCE_RECOVERY view to see the current recovery parameter settings.

How to enable MTTR Advisory

Make sure that STATISTICS_LEVEL is set to TYPICAL or ALL.To enable MTTR advisory, set the initialization parameter FAST_START_MTTR_TARGET to a nonzero value. If FAST_START_MTTR_TARGET is not specified, then MTTR advisory will be OFF.

If MTTR advisory has been turned on, V$MTTR_TARGET_ADVICE shows the advisory information collected.

Tuning Transaction Recovery

During the second phase of instance recovery, Oracle rolls back uncommitted transactions. Oracle uses two features, fast-start on-demand rollback and fast-start parallel rollback, to increase the efficiency of this recovery phase.

Using Fast-Start On-Demand Rollback

Using the fast-start on-demand rollback feature, Oracle automatically allows new transactions to begin as soon as the database opens, which is usually a very short time after cache recovery completes. If a user attempts to access a row that is locked by a terminated transaction, Oracle rolls back only those changes necessary to complete the transaction; in other words, it rolls them back on demand.

For both the cases , CPU_Count parameter value effects the performance.

Monitor the progress of fast-start parallel rollback by examining the V$FAST_START_SERVERS and V$FAST_START_TRANSACTIONS views. V$FAST_START_SERVERS provides information about all recovery processes performing fast-start parallel rollback. V$FAST_START_TRANSACTIONS contains data about the progress of the transactions.

Oracle Database In Our Hand

Friday, March 9, 2012

Configuring the Instance Recovery Performance