Gathering
optimizer statistics is one of life's necessary evils even if it can take an
extremely long time to complete. In this blog post, we discuss one remedy to
improve the efficiency of statistics gathering.
Introduction
Oracle Database 11g Release 2 (11.2.0.2) introduces a new statistics gathering mode, 'concurrent statistics gathering'. The goal of this new mode is to enable a user to gather statistics on multiple tables in a schema (or database), and multiple (sub)partitions within a table concurrently. Gathering statistics on multiple tables and (sub)partitions concurrently can reduce the overall time it takes to gather statistics by allowing Oracle to fully utilize a multi-processor environment.
Concurrent
statistics gathering is controlled by a global preference, CONCURRENT, in the
DBMS_STATS package. The CONCURRENT preference is of type boolean, so it
can be set to TRUE or FALSE. By default it is set to FALSE. When CONCURRENT is
set to TRUE, Oracle employs Oracle Job Scheduler and Advanced Queuing
components to create and manage multiple statistics gathering jobs
concurrently.
If you call
dbms_stats.gather_table_stats on a partitioned table when CONCURRENT is set to
true, Oracle
will create a separate statistics gathering job for each (sub)partition in the
table. The Job Scheduler will decide how many of these jobs will execute
concurrently, and how many will be queued based on available system resources.
As the currently running jobs complete, more jobs will be dequeued and executed
until all (sub)partitions have had their statistics gathered on them.
If you
gather statistics using dbms_stats.gather_database_stats, dbms_stats.gather_schema_stats
, or dbms_stats.gather_dictionary_stats, then Oracle will create a separate
statistics gathering job for each non-partitioned table, and each
(sub)partition for the partitioned tables. Each partitioned table will also
have a coordinator job that manages its (sub)partition jobs. The database will
then run as many concurrent jobs as possible, and queue the remaining jobs
until the executing jobs complete. However, multiple partitioned tables are not
allowed to be processed simultaneously to prevent possible deadlock cases.
Hence, if there are some jobs running for a partitioned table, other
partitioned tables in a schema (or database or dictionary) will be queued until
the current one completes. There is no such restriction for non-partitioned
tables. The maximum number of concurrent
statistics gathering jobs is bounded by the job_queue_processes initialization
parameter (per node on a RAC environment) and the available system
resources.
The
following figure illustrates the creation of jobs at different levels, where
Table 3 is a partitioned table, while other tables are non-partitioned tables.
Job 3 acts as a coordinator job for Table 3, and creates a job for each
partition in that table, as well as a separate job for the global statistics of
Table 3.
Using Concurrent Statistics Gathering with Parallel Execution
When using concurrent statistics gathering it is still possible to have each individual statistics gather job execute in parallel. This combination is normally used when the objects (tables or (sub)partitions) being analyzed are large. If you plan to execute the concurrent statistics gathering jobs in parallel you should disable the parallel adaptive multi-user initialization parameter. That is;
It is also recommended that you enable parallel statement queuing. This requires Resource Manager to be activated (if not already), and the creation of a temporary resource plan where the consumer group "OTHER_GROUPS" should have queuing enabled. By default, Resource Manager is activated only during the maintenance windows. The following script illustrates one way of creating a temporary resource plan (pqq_test), and enabling the Resource Manager with this plan.
You can use the standard database scheduler views to monitor the concurrent statistics gathering jobs. The comments field of a job in dba_scheduler_jobs shows the target object for that statistics gathering job in the form of owner.table.(sub)partition. All statistics collection job names start with 'ST$' for easy identification. The jobs whose name start with ST$T###_### are created by a coordinator job for a partitioned table, and works on a partition or subpartition of the table.
The jobs whose name begin with ST$SD###_### are created for a table in a schema or database, and either works as a coordinator for its partition level jobs (if the table is partitioned), or directly performs the statistics collection for the table (if the table is not partitioned).
Finally, those with ST$D###_### in their naming are created for dictionary level tasks (when gather_dictionary_stats is used), and jobs does similar tasks as those with SD in their names.
Using the following query you can see all of the concurrent statistics gathering jobs that have been created:
Concurrent statistics gathering does not use any automated way to determine the maximum number of jobs that can concurrently run based on the system load and resources. Concurrent statistics gathering solely relies on the Job Scheduler and the value of the job_queue_processes parameter. Session-wide parameter settings and events are not transferred to the newly created jobs (scheduler limitation). Indexes are not gathered concurrently.
As another example, assume that
the parameter job_queue_processes is set to 32, and you issued a
dbms_stats.gather_schema_stats on the SH schema. Oracle would create a
statistics gathering job (Level 1 in Figure 1) for each of the non-partitioned
tables;
- SUPPLEMENTARY_DEMOGRAPHICS
- COUNTRIES
- CUSTOMERS
- PROMOTIONS
- CHANNELS
- PRODUCTS
- TIMES
And, a
coordinator job for each partitioned table, i.e., SALES and COSTS, in turn
creates a statistics gathering job for each of partition in SALES and COSTS
tables, respectively (Level 2 in Figure 1). Then, the Oracle Job Scheduler
would allow 32 statistics gathering jobs to start, and would queue the rest
(assuming that there are sufficient resources for 32 concurrent jobs). Suppose
that 29 jobs (one for each partition + a coordinator job) for the SALES table
get started, then three non-partitioned table statistics gathering jobs would
also be started. The statistics gathering jobs for the COSTS table will be
automatically queued, because only for one partitioned table can be processed
at any one time. As each job finishes, another job will be dequeued and
started, until all 63 jobs have been completed.
Configuration
and Settings
In Oracle
Database 11.2.0.2, the concurrency setting for statistics gathering is turned
off by default. It can be turned on using the following command.
Begin
DBMS_STATS.SET_GLOBAL_PREFS('CONCURRENT','TRUE');
End;
/
You will
also need some additional privileges above and beyond the regular privileges
required to gather statistics. The user must have the following Job Scheduler
and AQ privileges:
- CREATE
JOB
- MANAGE
SCHEDULER
- MANAGE
ANY QUEUE
The
SYSAUX tablespace should be online, as the Job Scheduler stores its internal
tables and views in SYSAUX tablespace.
Finally
the job_queue_processes parameter should be set to at least 4. If you want to fully
utilize all of the system resources during the statistics gathering process but
you don't plan to use parallel execution you should set the job_queue_processes
to 2* total number of
CPU cores (this is a per node parameter in a RAC environment).Please make sure that you set
this parameter system-wise (i.e., ALTER SYSTEM ... or in init.ora file) rather
than at the session level (i.e., ALTER SESSION).
Using Concurrent Statistics Gathering with Parallel Execution
When using concurrent statistics gathering it is still possible to have each individual statistics gather job execute in parallel. This combination is normally used when the objects (tables or (sub)partitions) being analyzed are large. If you plan to execute the concurrent statistics gathering jobs in parallel you should disable the parallel adaptive multi-user initialization parameter. That is;
Alter
system set parallel_adaptive_multi_user=false;
It is also recommended that you enable parallel statement queuing. This requires Resource Manager to be activated (if not already), and the creation of a temporary resource plan where the consumer group "OTHER_GROUPS" should have queuing enabled. By default, Resource Manager is activated only during the maintenance windows. The following script illustrates one way of creating a temporary resource plan (pqq_test), and enabling the Resource Manager with this plan.
-- connect as a user with dba privileges
begin
dbms_resource_manager.create_pending_area();
dbms_resource_manager.create_plan('pqq_test', 'pqq_test');
dbms_resource_manager.create_plan_directive(
'pqq_test',
'OTHER_GROUPS',
'OTHER_GROUPS directive for pqq',
parallel_target_percentage => 90);
dbms_resource_manager.submit_pending_area();
end;
/
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'pqq_test' SID='*';
begin
dbms_resource_manager.create_pending_area();
dbms_resource_manager.create_plan('pqq_test', 'pqq_test');
dbms_resource_manager.create_plan_directive(
'pqq_test',
'OTHER_GROUPS',
'OTHER_GROUPS directive for pqq',
parallel_target_percentage => 90);
dbms_resource_manager.submit_pending_area();
end;
/
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = 'pqq_test' SID='*';
Monitoring
Concurrent Statistics Gathering Jobs
You can use the standard database scheduler views to monitor the concurrent statistics gathering jobs. The comments field of a job in dba_scheduler_jobs shows the target object for that statistics gathering job in the form of owner.table.(sub)partition. All statistics collection job names start with 'ST$' for easy identification. The jobs whose name start with ST$T###_### are created by a coordinator job for a partitioned table, and works on a partition or subpartition of the table.
The jobs whose name begin with ST$SD###_### are created for a table in a schema or database, and either works as a coordinator for its partition level jobs (if the table is partitioned), or directly performs the statistics collection for the table (if the table is not partitioned).
Finally, those with ST$D###_### in their naming are created for dictionary level tasks (when gather_dictionary_stats is used), and jobs does similar tasks as those with SD in their names.
Using the following query you can see all of the concurrent statistics gathering jobs that have been created:
select
job_name, state, comments
from
dba_scheduler_jobs
where
job_class like 'CONC%'
|
To only see the currently running jobs, filter by
the job state:
select job_name, state, comments
from dba_scheduler_jobs
where job_class like 'CONC%'
and state = 'RUNNING';
|
Similarly,
to see the scheduled (i.e., waiting to run) jobs you just need to change the
state:
select job_name, state, comments
from dba_scheduler_jobs
where job_class like 'CONC%'
and state = 'SCHEDULED';
|
It is
also possible to see the elapse time for the currently running statistics
gathering jobs:
select job_name, elapsed_time
from dba_scheduler_running_jobs
where job_name like 'ST$%';
|
Known
Limitations
Concurrent statistics gathering does not use any automated way to determine the maximum number of jobs that can concurrently run based on the system load and resources. Concurrent statistics gathering solely relies on the Job Scheduler and the value of the job_queue_processes parameter. Session-wide parameter settings and events are not transferred to the newly created jobs (scheduler limitation). Indexes are not gathered concurrently.
No comments:
Post a Comment
Share your knowledge it really improves, don't show off...