Oracle 11g Extension Statistics and Expression Statistics(原创)

Overview Extended Statistics
In this release, Oracle has introduced major new capabilities in statistics gathering, which are referred to as extended statistics, to make the optimizer statistics reflect the true selectivity of the data. There are two types of extended statistics: multi- column statistics, which involve collecting statistics for column groups, and expression statistics. Extended statistics include the statistics collected for both column groups and expressions and use the following new procedures:
CREATE_EXTENDED_STATS function
DROP_EXTENDED_STATS procedure
SHOW_EXTENDED_STATS_NAME function
Multicolumn Statistics (Column groups)
The selectivity of a column is a crucial optimizer statistic, playing a key role in the execution plan that the cost optimizer creates for a SQL statement. Currently, Oracle collects statistics by computing the selectivity of each of a table’s columns separately, and ignores the relationship between the columns. However, the relationship between certain columns may be so strong that it can affect the combined selectivity of the two columns. In most cases, the optimizer assumes that the values of the different columns in a complex predicate are independent. Based on this assumption, the optimizer simply multiplies the selectivity of individual predicates to arrive at the selectivity of a conjunctive predicate, which usually leads to an underestimation of the selectivity. In Oracle Database 10g, when figuring out the selectivity of multiple predicates, the query optimizer took into account the correlation between related columns only under a limited set of circumstances, as I summarize here:

• The optimizer used the number of distinct keys in an index to estimate selectivity provided all columns of a conjunctive predicate match all columns of a concatenated index key. In addition, the predicates must be equalities used in equijoins.
• If you set DYNAMIC_SAMPLING to level 4, the optimizer used dynamic sampling to estimate the selectivity of predicates involving multiple columns from a table.Because the sampling size is quite small, the results are dubious in most cases.

With the exception of the two cases presented here, the optimizer always assumed that the values of all columns in a table that were used in a complex predicate were independent of each other. Based on this naïve assumption, the optimizer simply multiplied single column selectivity estimates to arrive at the selectivity of a conjunctive predicate involving multiple columns. The end result of this strategy was a severe underestimation of the real selectivity of those types of predicates in a SQL statement. Oracle Database 11g attempts to alleviate this major problem by letting you collect the following types of statistics on multiple columns in a table, which it refers to as a group of columns:

• Number of distinct values
• Density
• Number of nulls
• Frequency histograms

The idea behind the capturing of statistics for a group of columns as a single entity is to capture the underlying functional dependency between related columns in a table. The database collects the number of distinct values, the number of null values, frequency histograms, and density for groups of columns. Let’s use an example from the CUSTOMERS table in the SH schema to drive home this point. In this table, the two columns CUST_STATE_PROVINCE and COUNTRY_ID are strongly correlated. The CUST_STATE_PROVINCE column determines the value of the COUNTRY_ID column for a customer. The following query using California as the value for the CUST_STATE_PROVINCE column shows this:

SQL> select count(*)  from sh.customers
where cust_state_province = 'CA';
COUNT(*)
----------
3341
The query returns the value 3341. That is, there are a total of 3341 customers in the customers table who are from the state of California. Of course, if you issue the following query, which asks how many customers are from the state of California and the U.S. (country_id=52790), you get the same result as before:
SQL> select count(*)  from customers
where cust_state_province = 'CA'
and country_id=52790;
COUNT(*)
----------
3341
But it is clear that if you repeat this query for any COUNTRY_ID other than the U.S., the result would be, in all likelihood, zero because California is a state in the U.S. but not in the other countries. In cases such as these, it makes sense for the optimizer to rely not merely on the selectivity of the individual columns, but on the selectivity for the group of related columns as well. Oracle Database 11g lets you do precisely that— you can now gather statistics on related columns as a group, called a column group. The optimizer uses the statistics on column groups to account for the correlation between two columns. If, for example, your query has the predicates c1=1 and c2=1 and if you collect statistics on (c1, c2) as a single group, the optimizer will use the column group statistics for estimating the combined selectivity of the two predicates.

Manage extention statistics

Oracle creates column groups for related columns based on its analysis of the database workload. You can, however, create a column group yourself using the DBMS_STATS package.

-- Create a columnn group based on EMP(JOB,DEPTNO).
SET SERVEROUTPUT ON
DECLARE
l_cg_name VARCHAR2(30);
BEGIN
l_cg_name := DBMS_STATS.create_extended_stats(ownname   => 'SCOTT',
tabname   => 'EMP',
extension => '(JOB,DEPTNO)');
DBMS_OUTPUT.put_line('l_cg_name=' || l_cg_name);
END;
/
l_cg_name=SYS_STU3VG629OEYG6FN0EKTGV_HQ6

PL/SQL procedure successfully completed.
The column group name is returned using the SHOW_EXTENDED_STATS_NAME function.
-- Display the name of the columnn group.
SELECT DBMS_STATS.show_extended_stats_name(ownname   => 'SCOTT',
tabname   => 'EMP',
extension => '(JOB,DEPTNO)') AS ame
FROM dual;
CG_NAME
------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6
1 row selected.
Manually created column groups can be deleted using the DROP_EXTENDED_STATS procedure.
-- Drop the columnn group.
BEGIN
dbms_stats.drop_extended_stats(ownname   => 'SCOTT',
tabname   => 'EMP',
extension => '(JOB,DEPTNO)');
END;
/
PL/SQL procedure successfully completed.
Setting the METHOD_OPT parameter to "FOR ALL COLUMNS SIZE AUTO" allows the GATHER_% procedures to gather statistics on all existing column groups for the specified object.
BEGIN
DBMS_STATS.gather_table_stats(
'SCOTT',
'EMP',
method_opt => 'for all columns size auto');
END;
/
Alternatively, set the METHOD_OPT parameter to "FOR COLUMNS (column-list)" and the group will automatically be created during the statistics gathering.
BEGIN
DBMS_STATS.gather_table_stats(
'SCOTT',
'EMP',
method_opt => 'for columns (job,mgr)');
END;
/
The [DBA|ALL|USER]_STAT_EXTENSIONS views display information about the multi-column statistics.
COLUMN extension FORMAT A30
SELECT extension_name, extension
FROM   dba_stat_extensions
WHERE  table_name = 'EMP';
EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6 ("JOB","DEPTNO")
2 rows selected.
COLUMN col_group FORMAT A30
SELECT e.extension col_group,
t.num_distinct,
t.histogram
FROM   dba_stat_extensions e
JOIN dba_tab_col_statistics t ON e.extension_name=t.column_name
AND    t.table_name = 'EMP';
COL_GROUP                      NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
("JOB","DEPTNO")                          9 FREQUENCY
("JOB","MGR")                             8 FREQUENCY
2 rows selected.

Expression Statistics

In Oracle Database 10g, the optimizer can collect expression statistics on some types of expressions on columns, thus deriving more accurate selectivity estimates. This functionality applies only to certain special cases where a function preserves the data distribution characteristics of the original column, as is the case when you use an expression such as TO_NUMBER. In addition, the database in the previous release used dynamic sampling to get better estimates of built-in functions on columns. In Oracle Database 11g, the database uses expression statistics that include user-defined functions as well as function-based indexes. The new feature relies on the virtual column infrastructure to create expression statistics, that is, statistics on predicates involving expressions on columns.

The optimizer has no idea what the affect of applying a function to column has on the selectivity of the column. Using a similar method to multi-column statistics, we can gather expression statistics to provide more information. Expression statistics can be created explicitly using the CREATE_EXTENDED_STATS procedure, or implicitly by specifying the expression in the METHOD_OPT parameter of the GATHER_% procedures when gathering statistics.
DECLARE
l_cg_name VARCHAR2(30);
BEGIN
-- Explicitly created.
l_cg_name := DBMS_STATS.create_extended_stats(ownname   => 'SCOTT',
tabname   => 'EMP',
extension => '(LOWER(ENAME))');

-- Implicitly created.
DBMS_STATS.gather_table_stats(
'SCOTT',
'EMP',
method_opt => 'for columns (upper(ename))');
END;
/
Setting the METHOD_OPT parameter to "FOR ALL COLUMNS SIZE AUTO" allows the GATHER_% procedures to gather existing expression statistics.
BEGIN
DBMS_STATS.gather_table_stats(
'SCOTT',
'EMP',
method_opt => 'for all columns size auto');
END;
/
The [DBA|ALL|USER]_STAT_EXTENSIONS views display information about the expression statistics, as well as the multi-column statistics.
COLUMN extension FORMAT A30
SELECT extension_name, extension
FROM   dba_stat_extensions
WHERE  table_name = 'EMP';
EXTENSION_NAME                 EXTENSION
------------------------------ ------------------------------
SYS_STU3VG629OEYG6FN0EKTGV_HQ6 ("JOB","DEPTNO")
SYS_STU2JLSDWQAFJHQST7\$QK81_YB (LOWER("ENAME"))
SYS_STUOK75YSL165W#_X8GUYL0A1X (UPPER("ENAME"))
4 rows selected.
COLUMN col_group FORMAT A30
SELECT e.extension col_group,
t.num_distinct,
t.histogram
FROM   dba_stat_extensions e
JOIN dba_tab_col_statistics t ON e.extension_name=t.column_name
AND    t.table_name = 'EMP';
COL_GROUP                      NUM_DISTINCT HISTOGRAM
------------------------------ ------------ ---------------
("JOB","DEPTNO")                          9 NONE
("JOB","MGR")                             8 NONE
(LOWER("ENAME"))                         14 NONE
(UPPER("ENAME"))                         14 NONE
4 rows selected.
Expression statistics are dropped using the DROP_EXTENDED_STATS procedure.
-- Drop the columnn group.
BEGIN
dbms_stats.drop_extended_stats(ownname   => 'SCOTT',
tabname   => 'EMP',
extension => '(UPPER(ENAME))');
END;
/
PL/SQL procedure successfully completed.

Oracle 11g Extension Statistics and Expression Statistics(原创)

• 0

开心

• 0

板砖

• 0

感动

• 0

有用

• 0

疑问

• 0

难过

• 0

无聊

• 0

震惊