Wednesday, February 27, 2013

Concurrent Statistics

‎Gathering optimizer statistics is one of life's necessary evils even if it can take an extremely long time to complete. In this blog post, we discuss one remedy to improve the efficiency of statistics gathering.


Oracle Database 11g Release 2 ( introduces a new statistics gathering mode, 'concurrent statistics gathering'. The goal of this new mode is to enable a user to gather statistics on multiple tables in a schema (or database), and multiple (sub)partitions within a table concurrently. Gathering statistics on multiple tables and (sub)partitions concurrently can reduce the overall time it takes to gather statistics by allowing Oracle to fully utilize a multi-processor environment.

Concurrent statistics gathering is controlled by a global preference, CONCURRENT, in the DBMS_STATS package. The CONCURRENT preference is of type boolean, so it can be set to TRUE or FALSE. By default it is set to FALSE. When CONCURRENT is set to TRUE, Oracle employs Oracle Job Scheduler and Advanced Queuing components to create and manage multiple statistics gathering jobs concurrently.

If you call dbms_stats.gather_table_stats on a partitioned table when CONCURRENT is set to true, Oracle will create a separate statistics gathering job for each (sub)partition in the table. The Job Scheduler will decide how many of these jobs will execute concurrently, and how many will be queued based on available system resources. As the currently running jobs complete, more jobs will be dequeued and executed until all (sub)partitions have had their statistics gathered on them.

If you gather statistics using dbms_stats.gather_database_stats, dbms_stats.gather_schema_stats , or dbms_stats.gather_dictionary_stats, then Oracle will create a separate statistics gathering job for each non-partitioned table, and each (sub)partition for the partitioned tables. Each partitioned table will also have a coordinator job that manages its (sub)partition jobs. The database will then run as many concurrent jobs as possible, and queue the remaining jobs until the executing jobs complete. However, multiple partitioned tables are not allowed to be processed simultaneously to prevent possible deadlock cases. Hence, if there are some jobs running for a partitioned table, other partitioned tables in a schema (or database or dictionary) will be queued until the current one completes. There is no such restriction for non-partitioned tables. The maximum number of concurrent statistics gathering jobs is bounded by the job_queue_processes initialization parameter (per node on a RAC environment) and the available system resources.

The following figure illustrates the creation of jobs at different levels, where Table 3 is a partitioned table, while other tables are non-partitioned tables. Job 3 acts as a coordinator job for Table 3, and creates a job for each partition in that table, as well as a separate job for the global statistics of Table 3.

As another example, assume that the parameter job_queue_processes is set to 32, and you issued a dbms_stats.gather_schema_stats on the SH schema. Oracle would create a statistics gathering job (Level 1 in Figure 1) for each of the non-partitioned tables;

And, a coordinator job for each partitioned table, i.e., SALES and COSTS, in turn creates a statistics gathering job for each of partition in SALES and COSTS tables, respectively (Level 2 in Figure 1). Then, the Oracle Job Scheduler would allow 32 statistics gathering jobs to start, and would queue the rest (assuming that there are sufficient resources for 32 concurrent jobs). Suppose that 29 jobs (one for each partition + a coordinator job) for the SALES table get started, then three non-partitioned table statistics gathering jobs would also be started. The statistics gathering jobs for the COSTS table will be automatically queued, because only for one partitioned table can be processed at any one time. As each job finishes, another job will be dequeued and started, until all 63 jobs have been completed.
Configuration and Settings
In Oracle Database, the concurrency setting for statistics gathering is turned off by default. It can be turned on using the following command.
You will also need some additional privileges above and beyond the regular privileges required to gather statistics. The user must have the following Job Scheduler and AQ privileges:
The SYSAUX tablespace should be online, as the Job Scheduler stores its internal tables and views in SYSAUX tablespace.
Finally the job_queue_processes parameter should be set to at least 4. If you want to fully utilize all of the system resources during the statistics gathering process but you don't plan to use parallel execution you should set the job_queue_processes to 2* total number of CPU cores (this is a per node parameter in a RAC environment).Please make sure that you set this parameter system-wise (i.e., ALTER SYSTEM ... or in init.ora file) rather than at the session level (i.e., ALTER SESSION).

Using Concurrent Statistics Gathering with Parallel Execution

When using concurrent statistics gathering it is still possible to have each individual statistics gather job execute in parallel. This combination is normally used when the objects (tables or (sub)partitions) being analyzed are large. If you plan to execute the concurrent statistics gathering jobs in parallel you should disable the parallel adaptive multi-user initialization parameter. That is;
Alter system set parallel_adaptive_multi_user=false;

It is also recommended that you enable parallel statement queuing. This requires Resource Manager to be activated (if not already), and the creation of a temporary resource plan where the consumer group "OTHER_GROUPS" should have queuing enabled. By default, Resource Manager is activated only during the maintenance windows. The following script illustrates one way of creating a temporary resource plan (pqq_test), and enabling the Resource Manager with this plan.
-- connect as a user with dba privileges
dbms_resource_manager.create_plan('pqq_test', 'pqq_test');
'OTHER_GROUPS directive for pqq',
parallel_target_percentage => 90);
Monitoring Concurrent Statistics Gathering Jobs

You can use the standard database scheduler views to monitor the concurrent statistics gathering jobs. The comments field of a job in dba_scheduler_jobs shows the target object for that statistics gathering job in the form of owner.table.(sub)partition. All statistics collection job names start with 'ST$' for easy identification. The jobs whose name start with ST$T###_### are created by a coordinator job for a partitioned table, and works on a partition or subpartition of the table.

The jobs whose name begin with ST$SD###_### are created for a table in a schema or database, and either works as a coordinator for its partition level jobs (if the table is partitioned), or directly performs the statistics collection for the table (if the table is not partitioned).

Finally, those with ST$D###_### in their naming are created for dictionary level tasks (when gather_dictionary_stats is used), and jobs does similar tasks as those with SD in their names.

Using the following query you can see all of the concurrent statistics gathering jobs that have been created:

select job_name, state, comments
from dba_scheduler_jobs
where job_class like 'CONC%'

To only see the currently running jobs, filter by the job state:
select job_name, state, comments
from dba_scheduler_jobs
where job_class like 'CONC%'
and state = 'RUNNING';

Similarly, to see the scheduled (i.e., waiting to run) jobs you just need to change the state:
select job_name, state, comments
from dba_scheduler_jobs
where job_class like 'CONC%'
and state = 'SCHEDULED';

It is also possible to see the elapse time for the currently running statistics gathering jobs:
select job_name, elapsed_time
from dba_scheduler_running_jobs
where job_name like 'ST$%';

Known Limitations

Concurrent statistics gathering does not use any automated way to determine the maximum number of jobs that can concurrently run based on the system load and resources. Concurrent statistics gathering solely relies on the Job Scheduler and the value of the job_queue_processes parameter. Session-wide parameter settings and events are not  transferred to the newly created jobs (scheduler limitation). Indexes are not gathered concurrently.

No comments:

Post a Comment

Share your knowledge it really improves, don't show off...