DOC

oracle10g new features-histograms and dbms_stats

By Jane Price,2014-06-30 09:39
10 views 0
oracle10g new features-histograms and dbms_stats

    New Features in Oracle10g

    White Paper on

    Histograms and DBMS_STATS

    Features in Oracle 10 g

    Submitted By

    Anupam Bansal

    July 2004

Oracle10g Features Histograms and DBMS_STATS Package

    1. Introduction

The purpose of this document is to explore the new enhancements in the following features of

    Oracle 10g:

    ? HISTOGRAMS

    ? DBMS_STATS

Histogram tells the optimizer how the data is distributed for a column. This information is used in

    determining the selectivity of the column for a given query and arriving at an optimal execution

    plan.Histograms are stored in dictionary and there is always space and maintenance cost for

    using histograms and therefore histograms should be used only for columns that have highly

    Skewed distribution .Histogram information can be found in following dictionary views in

    database like DBA_HISTOGRAMS,DBA_PART_HISTOGRAMS,DBA_SUBPPART_HISTOGRAMS

    DBMS_STATS package helps Oracle in gathering statistics. Through the new procedures available in it, we can lock/unlock statistics, restore old statistics, and purge the statistics. Oracle10g

    provides new arguments available for dbms_stats package subprograms .These parameters are

    as follows

    ? Granularity

    ? Degree

    DBMS_STATS package can be used to export and import the statistics in database as well restore the old statistics with the help of restore_table_stats procedure and dba_optstats_operations

    table . DBMS_STATS is a better method of collecting statistics than and analyze command or

    DBMS_UTILITY

Detailed information about these topics is provided below.

    Page 2 of 22 By: Anupam Bansal

Oracle10g Features Histograms and DBMS_STATS Package

    2. Histograms

    2.1. Overview

Histograms provide accurate estimation of distribution of data, and provide improved selectivity

    estimates in case of skewed data with non-uniform data distribution that helps the optimizer

    arrive at an optimal execution plan.

    Histograms can be used effectively only when:

    ? A table column is referenced in one or more queries, or

    ? Significant skewing exists in the distribution of a column's data values

    2.2. When to Use Histograms

Histograms are used to predict the cardinality and number of rows returned to query.

    Create Column histograms only when there are highly skewed values in it. Histograms affect

    performance and should be used only when required, for faster execution plan. They incur

    additional overhead during parsing of SQL query.

    Histograms are not useful for columns with the following characteristics:

    ? all predicates on the column use bind variables

    ? the column data is uniformly distributed

    ? the column is not used in WHERE clauses of queries

    ? the column is unique and is used only with equality predicates

    2.3. Buckets in Histograms

Histogram statistics are stored in the form of buckets. Buckets represent the partitioning of data

    values, depending on the range. Requirements for buckets for a column will depend on the

    occurrences of distinct values. The default number of buckets is appropriate, but one can

    experiment with various bucket sizes to find the most suitable size.

    DBMS_STATS package has the ability to look for columns that should have histograms and implement it with method_opt parameter. This parameter can have three values:

    1. Method_opt=>'for all columns size auto'

    2 Method_opt=>'for all columns size repeat'

    3 Method_opt=>'for all columns size skewonly'

    2.3.1. Auto Option

This option is used to automate the updating of statistics as tables are updated. This option can

    be used when monitoring is enabled, i.e., when STATISTICS_LEVEL is set to TYPICAL or

    ALL.. When this option is enabled for a table, Oracle monitors the DML changes (including

    truncates) being done on the table and maintains the details in the SGA. Every three hours (or

    after a shutdown), the SMON process incorporates the information collected in the SGA into the

    data dictionary. Histograms are automatically created based upon data distribution as determined

    by monitoring:

     Execute dbms_stats.gather_schema_stats(ownname =>

    'GPMOWNER',estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,method_opt

    => 'for all columns size auto ',degree => DBMS_STATS.DEFAULT_DEGREE);

    Page 3 of 22 By: Anupam Bansal

Oracle10g Features Histograms and DBMS_STATS Package

    2.3.2. Repeat Option

With REPEAT option histograms are collected only on the columns that already have histograms.

    In this option, you must refresh the histograms whenever data distribution changes.

Execute dbms_stats.gather_schema_stats(ownname =>

    'GPMOWNER',estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,method_opt => 'for all columns size repeat',degree => DBMS_STATS.DEFAULT_DEGREE);

    2.3.3. Skew Only option

This option helps in building the histograms on those column values in an index whose values are

    distributed unevenly.

     Execute dbms_stats.gather_schema_stats(ownname =>

    ‘GPMOWNER',estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,method_opt

    => 'for all columns size skewonly',

     Degree => DBMS_STATS.DEFAULT_DEGREE);

    2.4. Types of Histogram

Two types of histograms are available in Oracle10g:

    1. Frequency or Horizontal Histogram

     2 Height Or Vertical Histogram

    1 Frequency or Horizontal Histograms In this type of histogram, data is

    divided into fixed number of equal width ranges, and each bucket contains the number of

    occurrences of that single value.

     How to create a Frequency Histogram

    When the number of distinct values in a column is less than or equal to the number of

    buckets specified, e.g., in GPMOWNER schema MSG_QUEUEDMESSAGES table is created

    with data distribution as specified below:

    select status,count(*) from MSG_QUEUEDMESSAGES

    group by status;

     STATUS COUNT(*)

    1 2012

    2 75577

    3 14499

    4 1197

    5 207

    Since there is non-distribution of data on Status column in MSG_QUEUEDMESSAGES, a

    frequency histogram is created on Status column.

    Page 4 of 22 By: Anupam Bansal

Oracle10g Features Histograms and DBMS_STATS Package

    BEGIN

    DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'GPMOWNER',TABNAME=>

    'MSG_QUEUEDMESSAGES',METHOD_OPT=>'FOR COLUMNS SIZE 10

    STATUS');

    END;

    /

    This successfully completes the PL/SQL procedure.

    In method_opt, number of buckets specified is 10, which is more than the number of

    distinct values, therefore, it will automatically create a Frequency histogram:

    You can confirm that the created histogram is height-based, executing the following

    query:

    select column_name,num_distinct,num_buckets,histogram

    from user_tab_col_statistics where

    table_name='MSG_QUEUEDMESSAGES' AND COLUMN_NAME='STATUS';

    COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM

    ------------------------------ ------------ ----------- -----

    STATUS 5 5 FREQUENCY

    Histogram column in user_tab_col_statistics table is new in Oracle10g, and can have any

    of the two values - Height or Frequency.

    How data is distributed in buckets

     You can check data distribution in buckets by executing the following query :

    select endpoint_number,endpoint_value from

    user_histograms where table_name='MSG_QUEUEDMESSAGES' and

    column_name='STATUS' order by endpoint_number ;

     ENDPOINT_NUMBER ENDPOINT_VALUE

     --------- -------------------------

     234 1

     10271 2

     12269 3

     12438 4

     12463 5

     In user histograms

    endpoint_number - End point number

    endpoint_value - Normalized end point value for the buckets.

    Page 5 of 22 By: Anupam Bansal

    Oracle10g Features Histograms and DBMS_STATS Package

    In the above example if we consider difference between two endpoint_number such as

    12269 10271=1998 . This indicates 1998 values are represented in the bucket

    containing the endpoint 10271

An example is illustrated here which shows the behaviour of an explain plan before and

    after creating histogram on Status column of table MSG_QUEUEDMESSAGES, which is

    skewed.

    CASE-1 Before creating histogram on Status column

Table MSG_QUEUEDMESSAGES has the following indexes:

    ? MSG_QUEUEMESSAGES_NU1

    ? MSG_QUEUEMESSAGES_NU2

    MSG_QUEUEDMESSAGES_NU2 is on columns DESTINATION, STATUS

Explain plan before creating histogram:

    explain plan for SELECT /*+ INDEX("A1","MSG_QUEUEDMESSAGES_NU2")

    */ "A1".ROWID,

    "A1"."DESTINATION","A1"."QUEUE_SUB_TYPE","A1"."MESSAGE_TYPE",

    "A1"."KEY_PARAM_1","A1"."KEY_PARAM_2","A1"."KEY_PARAM_3","A1"."K

    EY_PARAM_4", "A1"."KEY_PARAM_5","A1"."QUEUE_SEQUENCE_NO" FROM

    "GPMOWNER"."MSG_QUEUEDMESSAGES" "A1" WHERE

    "A1"."DESTINATION"=RTRIM(:B0) AND "A1"."STATUS"=1 ORDER BY

    "A1"."QUEUE_SEQUENCE_NO"

    explained

    @xplan.sql

    Plan Table:

    | OPERATION | NAME

     | ROWS | BYTES| COST | PSTART| PSTOP |

    | SELECT STATEMENT |

     | 35 | 10K| 285 | | | | SORT ORDER BY |

     | 35 | 10K| 285 | | | | TABLE ACCESS BY INDEX ROWID

    |MSG_QUEUEDMESSAGES

     | 35 | 10K| 284 | | | | INDEX RANGE SCAN |MSG_QUEUEDMESSAGES

    _NU2 | 1 | | 50 | | | -----------------------------------------------------------------

    ---------------

    Page 6 of 22 By: Anupam Bansal

    Oracle10g Features Histograms and DBMS_STATS Package

    The above query is picking up index MSG_QUEUEDMESSAGES_NU2, which is based on two columns - Destination and Status. As Status column is skewed we created frequency

    histogram on Status column shown below.

    CASE-2 After creating Histogram on Status Column

Frequency histogram is created on Status column

    BEGIN

    DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'GPMOWNER',TABNAME

    =>'MSG_QUEUEDMESSAGES',METHOD_OPT=>'FOR COLUMNS SIZE 10

    STATUS');

    END;

    /

An Index MSG_QUEUEDMESSAGES_TMP is created on Status column

    Create index MSG_QUEUEDMESSAGES_TMP on MSG_QUEUEDMESSAGES(status) Tablespace I_GPM;

Analyze index MSG_QUEUEDMESSAGES_TMP compute statistics;

explain plan for SELECT "A1".ROWID,

    "A1"."DESTINATION","A1"."QUEUE_SUB_TYPE","A1"."MESSAGE_TYPE",

    A1"."KEY_PARAM_1","A1"."KEY_PARAM_2","A1"."KEY_PARAM_3","A1"."KE

    Y_PARAM_4", "A1"."KEY_PARAM_5","A1"."QUEUE_SEQUENCE_NO" FROM

    "GPMOWNER"."MSG_QUEUEDMESSAGES" "A1" WHERE

    “A1"."DESTINATION"=16040000000 AND "A1"."STATUS"=1 ORDER BY

    "A1"."QUEUE_SEQUENCE_NO";

explained

@xplan.sql

Plan Table

    | OPERATION |

    NAME

     | ROWS | BYTES| COST | PSTART| PSTOP |

    --------------------------------------------------------------

    ------------------

    | SELECT STATEMENT |

     | 1K| 73K| 46 | | |

    | SORT ORDER BY |

     | 1K| 73K| 46 | | |

    | TABLE ACCESS BY INDEX ROWID

    |MSG_QUEUEDMESSAGES

     | 1K| 73K| 45 | | |

    | INDEX RANGE SCAN

    |MSG_QUEUEDMESSAGES

    _TMP | 2K| | 5 | | |

    -----------------------------------------------------------------------

    Page 7 of 22 By: Anupam Bansal

    Oracle10g Features Histograms and DBMS_STATS Package

From the above example, we can see that the explain plan has changed in the way it is

    picking the index now.The MSG_QUEUEDMESSAGES_TMP index is created on Status column, which is skewed .In the previous explain plan it is picking the index

    MSG_QUEUEDMESSAGES_NU2, which is a composite index based on two columns -

    Status and Destination. From analysis of the above query, performance gains are better

    if the query is able to pick the index on Status column, which is possible by making the

    histogram on Status column.

Height Based Or Vertical Histograms In this type of Histogram approximately same

    number of values fall into same range (means same height) so that endpoint of the

    range are determined by how many values are in that range .Endpoints for the buckets

    would be determined by the density of Distinct values in the column

    How Height-Based Histogram is created When number of distinct values in column is greater than number of buckets specified

    BEGIN

    DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>'GPMOWNER',TABNAME=>

    'MSG_QUEUEDMESSAGES',METHOD_OPT=>'FOR COLUMNS SIZE 4

    STATUS');

    END;

    /

You can confirm that the created histogram is height-based, executing the following

    query:

    select column_name,num_distinct,num_buckets,histogram

    from user_tab_col_statistics where

    table_name='MSG_QUEUEDMESSAGES' AND COLUMN_NAME='STATUS';

    COLUMN_NAME NUM_DISTINCT NUM_BUCKETS HISTOGRAM

    STATUS 5 4 Height Balanced

     How data is distributed in buckets

    You can check data distribution in buckets by executing the following query :

select endpoint_number,endpoint_value from

    user_histograms where table_name='MSG_QUEUEDMESSAGES' and

    column_name='STATUS' order by endpoint_number ;

    ENDPOINT_NUMBER ENDPOINT_VALUE

    --------------- ----------------------------------------------------

     0 1

     3 2

     4 5

    Page 8 of 22 By: Anupam Bansal

    Oracle10g Features Histograms and DBMS_STATS Package

Oracle creates requested number of buckets but put the same number of values in each

    bucket ENDPOINT_NUMBER is the actual bucket number and ENDPOINT_VALUE is

    endpoint_value of bucket determined by column_value

1---------2----------2-----------2-----------5

Distribution of data will be as shown above

Bucket 0 has low value of 1 , Bucket 1 has endpoint 2, Bucket 2 has endpoint 2,Bucket3

    has endpoint 2,Bucket4 has endpoint 5 .

There are about 92,000 records in MSG_QUEUEDMESSAGES out of which status column

    which has value2 has 75,000 records .Each bucket has near about 23,000 records

Now if we assume data is uniformly distributed

For Bucket1Value of 2 is 50%

    For Bucket2Value of 2 is 100%

    For Bucket3Value of 2 is 100%

    For Bucket4---Value of 2 is 0%

    Overall value of 2 is 12000+23000+23000+0

Near about 60,000 records

Therefore we can see creating height based histogram has increased the selectivity of a

    column Generally it is preferred to create height based histograms on columns that have

    may distinct values compared to number of buckets and whose column data is skewed .

One more example to illustrate the advantage of height based histograms is explained

    below

     In GPMOWNER schema table TEMP_TEST is created

create table temp_test (id number(5),name varchar2(10),address varchar2(30),age

    number(3),remarks varchar2(40)) tablespace t_gpm;

In temp_test table 80,000 records are inserted in the way that for id values 16 records

    are near about 60,000 and for remaining values of ids which have 99 distinct values

    records are uniformly distributed i. Near about 200

    Page 9 of 22 By: Anupam Bansal

    Oracle10g Features Histograms and DBMS_STATS Package

    Declare

    Temp_count number(6) ; i number;j number;k number; n number := 0;

    begin

    temp_count := 0 ;

    for I in 1..99

    loop

     if I <> 16 THEN

     if mod(I,2) = 0 THEN

     n := 203;

     else

     n := 204;

     end if;

     IF I = 99 THEN

     n := 500;

     END IF;

     For j in 1..n

     Loop

    If temp_count = 80000 THEN

     Dbms_output.put_line('value of i--'||i||' n--'||n||'

    temp_count--'||temp_count);

     EXIT;

     END IF;

     Insert into temp_test

     Values

     ( I ,

     'abc'||i||j ,

     'abc'||i||(j-1) ,

     j,

     'Def'||I||(j+34));

     temp_count := temp_count + 1;

     end loop;

     commit;

    else

     for k in i..60000

     loop

     --dbms_output.put_line('i--'||i);

     insert into temp_test

     values

     ( I ,

     'abc'||i||k ,

     'abc'||i||(k-1) ,

     k/100,

     'Def'||i||(k+34)); temp_count := temp_count + 1;

     end loop;

     commit;

     end if;

    end loop;

    end;

    /

80,000 records are inserted through this procedure

    Page 10 of 22 By: Anupam Bansal

Report this document

For any questions or suggestions please email
cust-service@docsford.com