Oracle9i Database Performance Guide and Reference Release 1 (9.0.1) Part Number A87503-02 |
|
This chapter discusses SQL processing, optimization methods, and how the optimizer chooses to execute SQL statements.
This chapter contains the following sections:
The SQL processing architecture contains the following main components:
Figure 1-1 illustrates the SQL processing architecture:
The parser, the optimizer, and the row source generator form the SQL Compiler. This compiles the SQL statements into a shared cursor. Associated with the shared cursor is the execution plan.
The parser performs two functions:
The optimizer uses internal rules or costing methods to determine the most efficient way of producing the result of the query. The output from the optimizer is a plan that describes an optimum method of execution. The Oracle server provides two methods of optimization: cost-based optimizer (CBO) and rule-based optimizer (RBO).
The row source generator receives the optimal plan from the optimizer. It outputs the execution plan for the SQL statement. The execution plan is a collection of row sources structured in the form of a tree. Each row source returns a set of rows for that step.
SQL execution is the component that operates on the execution plan associated with a SQL statement. It then produces the results of the query. Each row source produced by the row source generator is executed by the SQL execution engine.
The optimizer determines the most efficient way to execute a SQL statement. This is an important step in the processing of any SQL statement. Often, there are many different ways to execute a SQL statement; for example, by varying the order in which tables or indexes are accessed. The procedure Oracle uses to execute a statement can greatly affect how quickly the statement executes.
The optimizer considers many factors related to the objects referenced and the conditions specified in the query. It can use either a cost-based or a rule-based approach.
You can influence the optimizer's choices by setting the optimizer approach and goal, and by gathering representative statistics for the CBO. Sometimes, the application designer, who has more information about a particular application's data than is available to the optimizer, can choose a more effective way to execute a SQL statement. The application designer can use hints in SQL statements to specify how the statement should be executed.
See Also:
|
For any SQL statement processed by Oracle, the optimizer does the following:
1 |
Evaluation of expressions and conditions |
The optimizer first evaluates expressions and conditions containing constants as fully as possible. (See "How the Optimizer Performs Operations".) |
2 |
Statement transformation |
For complex statements involving, for example, correlated subqueries or views, the optimizer might transform the original statement into an equivalent join statement. (See "How the Optimizer Transforms SQL Statements".) |
3 |
Choice of optimizer approaches |
The optimizer chooses either a cost-based or rule-based approach and determines the goal of optimization. (See "Understanding Joins".) |
4 |
Choice of access paths |
For each table accessed by the statement, the optimizer chooses one or more of the available access paths to obtain the table's data. (See "Understanding Access Paths for the CBO".) |
5 |
Choice of join orders |
For a join statement that joins more than two tables, the optimizer chooses which pair of tables is joined first, and then which table is joined to the result, and so on. |
6 |
Choice of join methods |
For any join statement, the optimizer chooses an operation to use to perform the join. |
To execute a DML statement, Oracle might need to perform many steps. Each of these steps either retrieves rows of data physically from the database or prepares them in some way for the user issuing the statement. The combination of the steps Oracle uses to execute a statement is called an execution plan. An execution plan includes an access path for each table that the statement accesses and an ordering of the tables (the join order) with the appropriate join method.
You can examine the execution plan chosen by the optimizer for a SQL statement by using the EXPLAIN
PLAN
statement. This causes the optimizer to choose the execution plan and then insert data describing the plan into a database table. Simply issue the EXPLAIN
PLAN
statement and then query the output table.
The following explains a SQL statement that selects the name, job, salary, and department name for all employees whose salaries do not fall into a recommended salary range:
EXPLAIN PLAN FOR SELECT ename, job, sal, dname FROM emp, dept WHERE emp.deptno = dept.deptno AND NOT EXISTS(SELECT * FROM salgrade WHERE emp.sal BETWEEN losal AND hisal);
The following output table describes the statement explained in the previous section:
ID OPERATION OPTIONS OBJECT_NAME ------------------------------------------------------------ 0 SELECT STATEMENT 1 FILTER 2 NESTED LOOPS 3 TABLE ACCESS FULL EMP 4 TABLE ACCESS BY ROWID DEPT 5 INDEX UNIQUE SCAN PK_DEPTNO 6 TABLE ACCESS FULL SALGRADE
Figure 1-2 shows a graphical representation of the execution plan for this SQL statement.
Each box in Figure 1-2 and each row in the output table corresponds to a single step in the execution plan. For each row in the listing, the value in the ID column is the value shown in the corresponding box in Figure 1-2.
Each step of the execution plan returns a set of rows that either are used by the next step or, in the last step, are returned to the user or application issuing the SQL statement. A set of rows returned by a step is called a row set.
Figure 1-2 is a hierarchical diagram showing the flow of row sources from one step to another. The numbering of the steps reflects the order in which they are displayed in response to the EXPLAIN
PLAN
statement. Generally, this is not the order in which the steps are executed.
Each step of the execution plan either retrieves rows from the database or accepts rows from one or more row sources as input:
emp
and salgrade
tables, respectively.
deptno
value in the pk_deptno
index returned by step 3. There it finds the rowids of the associated rows in the dept
table.
dept
table.
See Also:
|
The steps of the execution plan are not performed in the order in which they are numbered. Rather, Oracle first performs the steps that appear as leaf nodes in the tree-structured graphical representation of the execution plan (steps 3, 5, and 6 in Figure 1-2). The rows returned by each step become the row sources of its parent step. Then, Oracle performs the parent steps.
For example, Oracle performs the following steps to execute the statement in Figure 1-2:
Oracle performs steps 5, 4, 2, 6, and 1 once for each row returned by step 3. If a parent step requires only a single row from its child step before it can be executed, then Oracle performs the parent step as soon as a single row has been returned from the child step. If the parent of that parent step also can be activated by the return of a single row, then it is executed as well.
Thus, the execution can cascade up the tree, possibly to encompass the rest of the execution plan. Oracle performs the parent step and all cascaded steps once for each row in turn retrieved by the child step. The parent steps that are triggered for each row returned by a child step include table accesses, index accesses, nested loops joins, and filters.
If a parent step requires all rows from its child step before it can be executed, then Oracle cannot perform the parent step until all rows have been returned from the child step. Such parent steps include sorts, sort merge joins, and aggregate functions.
By default, the goal of the CBO is the best throughput. This means that it chooses the least amount of resources necessary to process all rows accessed by the statement.
Also, Oracle can optimize a statement with the goal of best response time. This means that it uses the least amount of resources necessary to process the first row accessed by a SQL statement.
The execution plan produced by the optimizer can vary depending on the optimizer's goal. Optimizing for best throughput is more likely to result in a full table scan rather than an index scan, or a sort merge join rather than a nested loops join. Optimizing for best response time, however, more likely results in an index scan or a nested loops join.
For example, suppose you have a join statement that is executable with either a nested loops operation or a sort-merge operation. The sort-merge operation might return the entire query result faster, while the nested loops operation might return the first row faster. If your goal is to improve throughput, then the optimizer is more likely to choose a sort merge join. If your goal is to improve response time, then the optimizer is more likely to choose a nested loops join.
Choose a goal for the optimizer based on the needs of your application:
The optimizer's behavior when choosing an optimization approach and goal for a SQL statement is affected by the following factors:
The OPTIMIZER_MODE
initialization parameter establishes the default behavior for choosing an optimization approach for the instance. It can have the following values:
For example: The following statement changes the goal of the CBO for the session to best response time:
ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS_10;
If the optimizer uses the cost-based approach for a SQL statement, and if some tables accessed by the statement have no statistics, then the optimizer uses internal information (such as the number of data blocks allocated to these tables) to estimate other statistics for these tables.
The OPTIMIZER_GOAL
parameter of the ALTER
SESSION
statement can override the optimizer approach and goal established by the OPTIMIZER_MODE
initialization parameter for an individual session.
The value of this parameter affects the optimization of SQL statements issued by the user, including those issued by stored procedures and functions called during the session. It does not affect the optimization of internal recursive SQL statements that Oracle issues during the session for operations such as space management and data dictionary operations.
The OPTIMIZER_GOAL
parameter can have the same values as the OPTIMIZER_MODE
initialization parameter
A FIRST_ROWS(
n), FIRST_ROWS
, ALL_ROWS
, CHOOSE
, or RULE
hint in an individual SQL statement can override the effects of both the OPTIMIZER_MODE
initialization parameter and the OPTIMIZER_GOAL
parameter of the ALTER
SESSION
statement.
By default, the cost-based approach optimizes for best throughput. You can change the goal of the CBO in the following ways:
ALTER
SESSION
SET
OPTIMIZER_MODE
statement with the ALL_ROWS
, FIRST_ROWS
, or FIRST_ROWS_
n (where n = 1, 10, 100, or 1000) clause.
ALL_ROWS
, FIRST_ROWS(
n) (where n = any positive integer), or FIRST_ROWS
hint.
The statistics used by the CBO are stored in the data dictionary. You can collect exact or estimated statistics about physical storage characteristics and data distribution in these schema objects by using the DBMS_STATS
package or the ANALYZE
statement.
To maintain the effectiveness of the CBO, you must have statistics that are representative of the data. It is possible to gather statistics on objects using either of the following:
DBMS_STATS
package.
ANALYZE
statement.
For table columns that contain skewed data (in other words, values with large variations in number of duplicates), you should collect histograms.
The resulting statistics provide the CBO with information about data uniqueness and distribution. Using this information, the CBO is able to compute plan costs with a high degree of accuracy. This enables the CBO to choose the best execution plan based on the least cost.
See Also:
Chapter 3, "Gathering Optimizer Statistics" for detailed information on gathering statistics |
The CBO can optimize a SQL statement either for throughput or for fast response. Fast response optimization is used when the parameter OPTIMIZER_MODE
is set to FIRST_ROWS_
n (where n is 1, 10, 100, or 1000) or FIRST_ROWS
. A hint FIRST_ROWS
(n) (where n is any positive integer) or FIRST_ROWS
can be used to optimize an individual SQL statement for fast response. Fast response optimization is suitable for online users, such as those using Oracle Forms or Web access. Typically, online users are interested in seeing the first few rows, and they seldom are interested in seeing the entire query result, especially when the result size is large. For such users, it makes sense to optimize the query to produce the first few rows as fast as possible, even if the time to produce the entire query result is not minimized.
With fast response optimization, the CBO generates a plan with lowest cost to produce the first row or the first few rows. The CBO employs two different fast response optimizations. The old method is used with the FIRST_ROWS
hint or parameter value. With the old method, the CBO uses a mixture of costs and rules to produce a plan. It is retained for backward compatibility reasons.
The new fast response optimization method is used when the FIRST_ROWS_
n (where n can be 1, 10, 100, or 1000) parameter value is used or when the FIRST_ROWS
(n) (where n can be any positive integer) hint is used. The new method is totally based on costs, and it is sensitive to the value of n. With small values of n, the CBO tends to generate plans that consist of nested loops joins with index lookups. With large values of n, the CBO tends to generate plans that consist of hash joins and full table scans.
The value of n should be chosen based on the online user requirement. It depends on how the result is displayed to the user. Generally, Oracle Forms users see the result one row at a time, and they typically are interested in seeing first few screens. Other online users see the result one group of rows at a time.
With the fast response method, the CBO explores different plans and, for each, computes the cost to produce the first n rows. It picks the plan that produces the first N rows with the lowest cost. Remember that with fast response optimization, a plan that produces the first n rows with the lowest cost might not be the optimal plan to produce the entire result. If the requirement is to obtain the entire result of a query, then fast response optimization should not be used. Instead use the ALL_ROWS
parameter value or hint.
The use of any of the following features requires the use of the CBO:
SAMPLE
clauses in a SELECT
statement
Note:
Even if the parameter |
In general, use the cost-based approach. Oracle Corporation is continually improving the CBO, and new features work only with the CBO. The rule-based approach is available for backward compatibility with legacy applications.
The CBO determines which execution plan is most efficient by considering available access paths and by factoring in information based on statistics for the schema objects (tables or indexes) accessed by the SQL statement. The CBO also considers hints, which are optimization suggestions placed in a comment in the statement.
The CBO performs the following steps:
The cost is an estimated value proportional to the expected resource use needed to execute the statement with a particular plan. The optimizer calculates the cost of access paths and join orders based on the estimated computer resources, including I/O, CPU, and memory.
Serial plans with higher costs take more time to execute than those with smaller costs. When using a parallel plan, however, resource use is not directly related to elapsed time.
The CBO consists of the following three main components:
The CBO architecture is illustrated in Figure 1-3.
The input to query transformer is a parsed query, which is represented by a set of query blocks. The query blocks are nested or interrelated to each other. The form of the query determines how the query blocks are interrelated to each other. The main objective of the query transformer is to determine if it is advantageous to change the form of the query so that it enables generation of a better query plan. Four different query transformation techniques are employed by the query transformer: view merging, predicate pushing, subquery unnesting, and query rewrite using materialized views. Any combination of these transformations might be applied to a given query.
Each view referenced in a query is expanded by the parser into a separate query block. The query block essentially represents the view definition, and therefore the result of a view. One option for the optimizer is to optimize the view query block separately and generate a subplan. Then, optimize the rest of the query by using the view subplan in the generation of overall query plan. Doing so usually leads to a suboptimal query plan, because the view is optimized separately from rest of the query.
The query transformer removes the potential suboptimal plan by merging the view query block into the query block that contains the view. Most types of views are merged. When a view is merged, the query block representing the view is merged into the containing query block. It is no longer necessary to generate a subplan, because view query block is eliminated.
For those views that are not merged, the query transformer can push the relevant predicates from the containing query block into the view query block. Doing so improves the subplan of the nonmerged view, because the pushed-in predicates can be used either to access indexes or can act as filters.
Like a view, a subquery is represented by a separate query block. Because a subquery is nested within the main query or another subquery, this constrains the plan generator in trying out different possible plans before it finds a plan with the lowest cost. For this reason, the query plan produced might not be the optimal one. The restrictions due to the nesting of subqueries can be removed by unnesting the subqueries and converting them into joins. Most subqueries are unnested by the query transformer. For those subqueries that are not unnested, separate subplans are generated. To improve the execution speed of the overall query plan, the subplans are ordered in an efficient manner.
A materialized view is like a query whose result is materialized and stored in a table. When a user query is found compatible with the query associated with a materialized view, the user query can be rewritten in terms of the materialized view. Doing so improves the execution of the user query, because most of the query result has been precomputed. The query transformer looks for any materialized views that are compatible with the user query, and selects one or more materialized views to rewrite the user query. The use of materialized views to rewrite a query is cost-based. That is, the query is not rewritten if the plan generated without the materialized views has lower cost than the plan generated with the materialized views.
See Also:
|
The estimator generates three different types of measures: selectivity, cardinality, and cost. These measures are related to each other, and one is derived from another. The end goal of the estimator is to estimate the overall cost of a given plan. If statistics are available, then the estimator uses them to compute the measures. The statistics improve the degree of accuracy of the measures.
The first type of measure is the selectivity, which represents a fraction of rows from a row set. The row set can be a base table, a view, or the result of a join or a GROUP
BY
operator. The selectivity is tied to a query predicate, such as last_name
= 'Smith
', or a combination of predicates, such as last_name
= 'Smith
' AND
job_type
= 'Clerk
'. A predicate acts as a filter that filters certain number of rows from a row set. Therefore, the selectivity of a predicate indicates how many rows from a row set will pass the predicate test. The selectivity lies in the value range 0.0 to 1.0. A selectivity of 0.0 means that no rows will be selected from a row set, and a selectivity of 1.0 means that all rows will be selected.
The estimator uses an internal default value for the selectivity if no statistics are available. Different internal defaults are used depending on the predicate type. For example, the internal default for an equality predicate (last_name
= 'Smith
') is lower than the internal default for a range predicate (last_name
> 'Smith
'). The estimator makes this assumption because an equality predicate is expected to return a smaller fraction of rows than a range predicate.
When statistics are available, the estimator estimates selectivity based on statistics. For example, for an equality predicate (last_name
= 'Smith
') the selectivity is set to the reciprocal of the number of distinct values of last_name
, because the query selects rows that all contain one out of N distinct values. If a histogram is available on the last_name
column, then the estimator uses it instead of the number of distinct values statistic. The histogram captures the distribution of different values in a column, so its use yields better selectivity estimates. Therefore, having histograms on columns that contain skewed data (in other words, values with large variations in number of duplicates) greatly helps the CBO generate good selectivity estimates.
Cardinality represents the number of rows in a row set. Here, the row set can be a base table, a view, or the result of a join or GROUP
BY
operator. The base cardinality is the number of rows in a base table. The base cardinality can be captured by analyzing the table. If table statistics are not available, then the estimator uses the number of extents occupied by the table to estimate the base cardinality.
The effective cardinality is the number of rows that will be selected from a base table. The effective cardinality is dependent on the predicates specified on different columns of a base table. This is because each predicate acts as a successive filter on the rows of a base table. The effective cardinality is computed as the product of base cardinality and combined selectivity of all predicates specified on a table. When there is no predicate on a table, its effective cardinality equals its base cardinality.
The join cardinality is the number of rows produced when two row sets are joined together. A join is a Cartesian product of two row sets with the join predicate applied as a filter to the result. Therefore, the join cardinality is the product of the cardinalities of two row sets, multiplied by the selectivity of the join predicate.
A distinct cardinality is the number of distinct values in a column of a row set. The distinct cardinality of a row set is based on the data in the column. For example, in a row set of 100 rows, if distinct column values are found in 20 rows, then the distinct cardinality is 20.
The group cardinality is the number of rows produced from a row set after the GROUP
BY
operator is applied. The effect of the GROUP
BY
operator is to decrease the number of rows in a row set. The group cardinality depends on the distinct cardinality of each of the grouping columns and the number of rows in the row set.
For example, if a row set of 100 rows is grouped by colx, whose distinct cardinality is 30, then the group cardinality is 30.
If the same row set of 100 rows is grouped by colx and coly, and the distinct cardinalities of colx and coly are 30 and 60 respectively, then the group cardinality lies between the maximum of the distinct cardinalities of colx and coly, and the lower of the product of the distinct cardinalities of colx and coly, and the number of rows in the row set.
This can be represented by the following formula:
group cardinality lies between max ( dist. card. colx , dist. card. coly ) and min ( (dist. card. colx * dist. card. coly) , num rows in row set )
Substituting the numbers from the example, the group cardinality is between the maximum of (30 and 60) and the minimum of (30*60 and 100). In other words, the group cardinality is between 60 and 100.
The cost represents units of work or resource used. The CBO uses disk I/O, CPU usage, and memory usage as units of work. So, the cost used by the CBO represents an estimate of the number of disk I/Os and the amount of CPU and memory used in performing an operation. The operation can be scanning a table, accessing rows from a table using an index, joining two tables together, or sorting a row set. The cost of a query plan is the number of work units that are expected to be incurred when the query is executed and its result produced.
The access path represents the number of units of work done in accessing data from a base table. The access path can be a table scan, a fast full index scan, or an index scan. During table scan or fast full index scan, multiple blocks are read from the disk in a single I/O operation. Therefore, the cost of a table scan or a fast full index scan depends on the number of blocks to scan and the multiblock read count value. The cost for an index scan depends on the levels in the B-tree, the number of index leaf blocks to scan, and the number of rows to fetch using the rowid in the index keys. The cost to fetch rows using rowids depends on the index clustering factor.
Although the clustering factor is a property of the index, the clustering factor actually relates to the spread of similar indexed column values within data blocks in the table. A lower clustering factor indicates that the individual rows are concentrated within fewer blocks in the table. Conversely, a high clustering factor indicates that the individual rows are scattered more randomly across blocks in the table. Therefore, a high clustering factor means that it costs more to fetch rows by rowid using a range scan, because more blocks in the table need to be visited to return the data.
For example, assume the following:
col1
(in tab1
).
c1
column currently stores the values A
, B
and C
.
Example 1: The index clustering factor is low for the rows as they are arranged in the diagram below. This is because the rows that have the same indexed column values for c1
are co-located within the same physical blocks in the table. This means that the cost of returning all of the rows that have the value A
via a range scan is low, because only one block in the table needs to be read.
Block 1 Block 2 Block 3 ------- ------- -------- A A A B B B C C C
Example 2: If the same rows in the table are rearranged so that the index values are scattered across the table blocks (rather than colocated), then the index clustering factor is higher, because in order to retrieve all rows with the value A
in col1
, all three blocks in the table must be read.
Block 1 Block 2 Block 3 ------- ------- -------- A B C A B C A B C
The join cost represents the combination of the individual access costs of the two row sets being joined. In a join, one row set is called inner, and the other is called outer. In a nested loops join, for every row in the outer row set, the inner row set is accessed to find all matching rows to join. Therefore, in a nested loops join, the inner row set is accessed as many times as the number of rows in the outer row set. The cost of nested loops join = outer access cost + (inner access cost * outer cardinality).
In sort merge join, the two row sets being joined are sorted by the join keys if they are not already in key order. The cost of sort merge join = outer access cost + inner access cost + sort costs (if sort used).
In hash join, the inner row set is hashed into memory, and a hash table is built using the join key. Then, each row from the outer row set is hashed, and the hash table is probed to join all matching rows. If the inner row set is very large, then only a portion of it is hashed into memory. This is called a hash partition.
Each row from the outer row set is hashed to probe matching rows in the hash partition. After this, the next portion of the inner row set is hashed into memory, followed by a probe from the outer row set. This process is repeated until all partitions of the inner row set are exhausted. The cost of hash join = (outer access cost * # of hash partitions) + inner access cost.
The main function of the plan generator is to try out different possible plans for a given query and pick the one that has the lowest cost. Many different plans are possible because of the various combinations of different access paths, join methods, and join orders that can be used to access and process data in different ways and produce the same result.
A join order is the order in which different join items (such as tables) are accessed and joined together. For example, in a join order of t1
, t2
, and t3
, table t1
is accessed first. This is followed by access of t2
, whose data is joined to t1
data to produce a join of t1
and t2
. Finally, t3
is accessed, and its data is joined to the result of join between t1
and t2
.
The plan for a query is established by first generating subplans for each of the nested subqueries and nonmerged views. Each nested subquery or nonmerged view is represented by a separate query block. The query blocks are optimized separately in a bottom-up order. That is, the innermost query block is optimized first, and a subplan is generated for it. The outermost query block, which represents the entire query, is optimized last.
The plan generator explores different plans for a query block by trying out different access paths, join methods, and join orders. The number of possible plans for a query block is proportional to the number of join items in the FROM
clause. This number rises exponentially with the number of join items.
Because of this reason, the plan generator uses an internal cutoff to reduce the number of plans it tries when finding the one with the lowest cost. The cutoff is based on the cost of the current best plan. If current best cost is large, then the plan generator tries harder (in other words, explores more alternate plans) to find a better plan with lower cost. If current best cost is small, then the plan generator ends the search swiftly, because further cost improvement will not be significant.
The cutoff works very well if the plan generator starts with an initial join order that produces a plan with cost close to optimal. Finding a good initial join order is a difficult problem. The plan generator uses a simple heuristic for the initial join order. It orders the join items by their effective cardinalities. The join item with the smallest effective cardinality goes first, and the join item with the largest effective cardinality goes last.
Access paths are ways in which data is retrieved from the database. For any row in any table accessed by a SQL statement, there are three common ways by which that row can be located and retrieved:
ROWID
. ROWID
access is the fastest way to retrieve a row, because this specifies the exact location of the row in the database.
In general, index access paths should be used for statements that retrieve a small subset of the table's rows, while full scans are more efficient when accessing a large portion of the table. OLTP systems, which consist of short-running SQL statements with high selectivity, often are characterized by the use of index access paths. Decision support systems, on the other hand, often use partitioned tables and perform full scans of the relevant partitions.
See Also:
"Understanding Access Paths for the RBO" for the a list of the access paths that are available for the RBO, as well as their ranking |
This section describes the following data access paths:
During a full table scan, all blocks in the table below the high water mark are scanned. Each row is examined to determine whether it satisfies the statement's WHERE
clause.
When Oracle performs a full tables scan, the blocks are read sequentially. Because the blocks are adjacent, I/O calls larger than a single block can be used to speed up the process. The size of the read calls range from one block to the number of blocks indicated by the initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT
. Using multiblock reads means a full table scan can be performed very efficiently. Each block is read only once.
For example:
SELECT last_name, first_name FROM per_people_f WHERE UPPER(full_name) LIKE :b1 Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS FULL PER_ALL_PEOPLE_F
The above example does a case-independent search for a name on a table containing all employees. Although the table might have several thousand employees, the number of rows for a given name ranges from 1 to 20. So, it might be better to use an index to access the desired rows. There is an index on the full_name
in per_people_f_n54
.
However, the optimizer is unable to use the index, because there is a function on the indexed column. Because it does not have any other access path, it uses a full table scan. If you need to use the index for case-independent searches, then function-based indexes created on search columns or mixed case data should not be allowed in the search columns.
The optimizer uses a full table scan if there is any of the following:
If the query is unable to use any existing indexes, then it uses a full table scan.
If the optimizer thinks that the query will access a fraction of blocks in the table, then it uses a full table scan, even though there might be indexes available.
If a table contains less than DB_FILE_MULTIBLOCK_READ_COUNT
blocks below the high water mark, which can be read in a single I/O call, then it could be cheaper to do a full table scan rather than an index range scan, regardless of the fraction of tables being accessed or indexes present.
If the table has not been analyzed since it was created, and if it has less than DB_FILE_MULTIBLOCK_READ_COUNT
blocks below the high water mark, then the optimizer thinks that the table is small and uses a full table scan. Look at the LAST_ANALYZED
and BLOCKS
columns in the ALL_TABLES
table to see what the statistics reflect.
A high degree of parallelism for the table skews the optimizer towards full table scans over range scans. Examine the DEGREE
column in ALL_TABLES
for the table to determine the degree of parallelism.
Use the hint FULL
(table alias).
Example before using the hint:
SELECT last_name, first_name FROM per_people_f WHERE full_name LIKE :b1; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY INDEX ROWID PER_PEOPLE_F INDEX RANGE SCAN PER_PEOPLE_F_N54
Example after using the hint:
SELECT /*+ FULL(f) */ last_name, first_name FROM per_people_f f WHERE full_name LIKE :b1;
Plan
-------------------------------------------------
SELECT STATEMENT
TABLE ACCESS FULL PER_PEOPLE_F
Full table scans are cheaper than index range scans when accessing a large fraction of the blocks in the table. This is because full table scans can use larger I/O calls, and fewer large I/O calls are cheaper than many small I/O calls.
The elapsed time for an I/O call consists of two components:
In a typical I/O operation, setup costs consume most of the time. Data transfer time for an eight K buffer is less than one ms (out of the total time of ten ms). This means that you can transfer 128KB in about 20 ms with a single 128 KB call, opposed to 160 ms with 16 eight KB calls.
In the following example, 20% of the blocks in a 10,000 block table are accessed. The following are true:
DB_FILE_MULTIBLOCK_READ_COUNT
= 16
DB_BLOCK_SIZE
=8k
Assume that each 8K I/O takes 10 ms, and about 20 seconds are spent doing single block I/O for the table blocks. This does not include the additional time to perform the index block I/O. Assuming that each 128K I/O takes 20 ms, about 12.5 seconds are spent waiting for the data blocks, with no wait required for any index blocks.
The total time changes when CPU numbers are added in for crunching all the rows for a full table scan and for processing the index access in the range scan. But, the full table scan comes out faster with such a large fraction of blocks being accessed.
Oracle does I/O by blocks. Therefore, the optimizer's decision to use full table scans is influenced by the percentage of blocks accessed, not rows. This is the index clustering factor. If blocks contain single rows, then rows accessed and blocks accessed are the same.
However, most tables have multiple rows in each block. So, the desired number of rows could be clustered together in a few blocks, or they could be spread out over a larger number of blocks.
Consider a case where each block has 100 rows, but on an average only five rows per block meet the query condition. A query accessing 2% of the rows might need 40% of the blocks.
The data dictionary keeps track of the blocks that have been populated with rows. The high water mark is used as the end marker during a full table scan. The high water mark is stored in DBA_TABLES
.BLOCKS
. It is reset when the table is dropped or truncated.
For example, consider a table that had a large number of rows in the past. Most of the rows have been deleted, and now most of the blocks under the high water mark are empty. A full table scan on this table exhibits poor performance, because all the blocks below the high water mark are scanned.
When a full table scan is required, response time can be improved by using parallel slaves for scanning the table. Parallel query is used generally in low concurrency data warehousing environments due to the potential resource usage.
A sample table scan retrieves a random sample of data from a table. This access path is used when the statement's FROM
clause includes the SAMPLE
clause or the SAMPLE
BLOCK
clause. To perform a sample table scan when sampling by rows (the SAMPLE
clause), Oracle reads a specified percentage of rows in the table. To perform a sample table scan when sampling by blocks (the SAMPLE
BLOCK
clause), Oracle reads a specified percentage of the table's blocks.
Oracle does not support sample table scans when the query involves a join or a remote table. However, you can perform an equivalent operation by using a CREATE
TABLE
AS
SELECT
query to materialize a sample of an underlying table and then rewrite the original query to refer to the newly created table sample. Additional queries can be written to materialize samples for other tables. Sample table scans require the CBO.
For example:
The following statement uses a sample table scan to access 1% of the emp
table, sampling by blocks:
SELECT * FROM emp SAMPLE BLOCK (1);
The EXPLAIN
PLAN
output for this statement might look like this:
OPERATION OPTIONS OBJECT_NAME ----------------------------------------------------- SELECT STATEMENT TABLE ACCESS SAMPLE EMP
The rowid of a row specifies the datafile and data block containing the row and the location of the row in that block. Locating a row by its rowid is the fastest way for Oracle to find a single row.
To access a table by rowid, Oracle first obtains the rowids of the selected rows, either from the statement's WHERE
clause or through an index scan of one or more of the table's indexes. Oracle then locates each selected row in the table based on its rowid.
In the following example, an index range scan is performed. The rowids retrieved are used to return the row data.
SELECT last_name, first_name FROM per_people_f WHERE full_name = :b1; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY INDEX ROWID PER_PEOPLE_F INDEX RANGE SCAN PER_PEOPLE_F_N54
This is generally the second step after retrieving the ROWID
from an index. The table access might be required for any columns in the statement not present in the index.
Access by ROWID
does not need to follow every index scan. If the index contains all the columns needed for the statement, then table access by ROWID
might not happen.
Note:
|
An index scan retrieves data from an index based on the value of one or more columns of the index. To perform an index scan, Oracle searches the index for the indexed column values accessed by the statement. If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table.
The index contains not only the indexed value, but also the rowids of rows in the table having that value. Therefore, if the statement accesses other columns in addition to the indexed columns, then Oracle can find the rows in the table with a table access by rowid or a cluster scan.
An index scan can be one of the following types:
This returns, at most, a single rowid. Oracle performs a unique scan if there is a UNIQUE
or a PRIMARY
KEY
constraint that guarantees that the statement accesses only a single row.
The example below queries the orders table so_headers
for a given order (header_id
):
SELECT attribute2 FROM so_headers WHERE order_number = :b1 AND order_Type_id = :b2; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY ROWID OF SO_HEADERS_ALL INDEX UNIQUE SCAN OF SO_HEADERS_U2
This access path is used when all columns of a unique (B-tree) index are specified with equality conditions.
None of the statements below use a UNIQUE
scan, because the complete unique key is not being used in equality condition.
SELECT attribute2 FROM so_headers WHERE order_number = :b1 AND order_type_id > 0; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY ROWID OF SO_HEADERS_ALL INDEX RANGE SCAN OF SO_HEADERS_U2
See Also:
Oracle9i Database Concepts for more details on index structures and for detailed information on how a B-tree is searched |
In general, you should not need to give a hint to do a unique scan. There might be cases where the table is across a database link and being driven into from a local table, or where the table is small enough for the optimizer to prefer a full table scan.
The hint INDEX
(alias index_name) specifies the index to use, but not a specific access path (range scan or unique scan).
Index range scan is a common operation for accessing selective data. It can be bounded (bounded on both sides) or unbounded (on one or both sides). Data is returned in the ascending order of index columns. Multiple rows with identical values are sorted in ascending order by the ROWID
s.
If data is required to be sorted by order, then use the ORDER
BY
clause, and do not rely on an index. If an index can be used to satisfy an ORDER
BY
clause, then the optimizer uses this and avoids a sort.
In the example below, the order has been imported from a legacy system, and you are querying the order by the number in the legacy system. This should be a highly selective query, and you see the query using the index on the column to retrieve the desired rows. The data returned is sorted in ascending order by the original_system_reference
, ROWID
s. Because the index column original_system_reference
is identical for the selected rows here, the data is sorted by the ROWID
s.
SELECT attribute2, header_id FROM so_headers WHERE original_system_reference = :b1; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY ROWID OF SO_HEADERS_ALL INDEX RANGE SCAN OF SO_HEADERS_N5
The optimizer uses a range scan when the optimizer finds one or more leading columns of an index in conditions, such as the following:
col1 = :b1
col1 < :b1
col1 > :b1
col1
like `%ASD' does not result in a range scan.)
AND
combination of the above for leading columns in the index
Range scans can use unique or nonunique indexes. Range scans avoid sorting when index columns constitute the ORDER
BY
/GROUP
BY
clause.
A hint might be required if the optimizer chooses some other index or uses a full table scan. The hint INDEX
(table_alias index_name) specifies the index to use.
In the example below, the column s2
has a skewed distribution:
S2 Rows Blocks ------ ---------- ---------- 0 769 347 4 1460 881 5 5 4 8 816 590 18 1,028,946 343,043
The column has histograms, so the optimizer knows about the distribution. However, with a bind variable, the optimizer does not know the value and could choose a full table scan. Therefore, there are two options
Example before using the hint:
SELECT l.line_id, l.revenue_amount FROM so_lines_all l WHERE l.s2 = :b1; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS FULL SO_LINES_ALL
Example using literal values:
SELECT l.line_id, l.revenue_amount FROM so_lines_all l WHERE l.s2 = 4; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N7
Example using bind variables and hint:
SELECT /*+ INDEX(l so_lines_n7) */ l.line_id, l.revenue_amount FROM so_lines_all l WHERE l.s2 = :b1; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N7
Index range scan descending is identical to index range scan, except that the data is returned in a descending order. (Indexes, by default, are stored in ascending order.) Usually, this is used when ordering data in a descending order to return the most recent data first, or when seeking a value less than some value.
In the example below, the index used is the two column unique index on order_number
, order_type_id
. So, the data is sorted in descending order by the order_number
, order_type_id
, rowid
of the selected rows. However, because there is only one row per order_number
, order_type_id
(because so_headers_u2
is a unique index on the two columns), the rows are sorted by order_number
, order_type_id
.
SELECT attribute2, header_id FROM so_headers_all a WHERE order_number = :b1 ORDER BY order_number DESC; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY ROWID OF SO_HEADERS_ALL INDEX RANGE SCAN DESCENDING SO_HEADERS_U2
In the next example below, the index used is a single column index on purchase_order_num
. The data is retrieved in descending order by purchase_order_num
, rowid
.
SELECT attribute2, header_id FROM so_headers_all a WHERE purchase_order_num BETWEEN :b1 AND :b2 ORDER BY purchase_order_num DESC; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX RANGE SCAN DESCENDING SO_HEADERS_N3
The optimizer uses index range scan descending when an order by descending clause can be satisfied by an index.
The hint INDEX_DESC
(table_alias index_name) is used for this access path.
Example before using hint:
SELECT a.original_system_reference, a.original_system_source_code, a.header_id FROM so_headers_all a WHERE a.original_system_reference = :b1; Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX RANGE SCAN SO_HEADERS_N5
Example after using hint #1:
SELECT /*+INDEX_DESC(a so_headers_n5) */ a.original_system_reference, a.original_system_source_code, a.header_id FROM so_headers_all a WHERE a.original_system_reference = :b1;
The data with this hint is sorted in descending order by original_system_reference
, original_system_code
, rowid
.
Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX RANGE SCAN DESCENDING SO_HEADERS_N5
Example after using hint #2:
SELECT /*+INDEX_DESC(a so_headers_n9) */ a.original_system_reference, a.original_system_source_code, a.header_id FROM so_headers_all a WHERE a.original_system_reference = :b1;
The data with this hint is sorted in descending order by original_system_reference
, rowid
.
Plan ------------------------------------------------- SELECT STATEMENT TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX RANGE SCAN DESCENDING SO_HEADERS_N9
Index skip scans improve index scans by nonprefix columns. Often, it is faster to scan index blocks than it is to scan table data blocks.
Skip scanning lets a composite index be logically split into smaller subindexes. For example, table emp
(sex
, empno
, address
) with a composite index on (sex
, empno
). The number of logical subindexes is determined by the number of distinct values in the initial column.
Skip scanning is useful when the initial column of the composite index is not specified in the query. In other words, it is "skipped."
For example above, suppose you have the following index data:
(`F',98) (`F',100) (`F',102) (`F',104) (`M',101) (`M',103) (`M',105)
Skip scanning is advantageous if there are few distinct values of the leading column of the composite index and many values of the nonleading key of the index.
The index is split logically in the following two subindexes:
`F'
.
`M'
.
In the following query,
SELECT * FROM emp WHERE empno = 101;
the column sex
is skipped. A complete scan of the index is not performed, but the subindex with the value `F'
is searched first, then the subindex with the value `M'
is searched.
This is available if a predicate references one of the columns in the index. The predicate does not need to be an index driver. Full scan is also available when there is no predicate, if all of the columns in the table referenced in the query are included in the index and at least one of the index columns is not null. This can be used to eliminate a sort operation, because the data is ordered by the index key. It reads the blocks singly.
Fast full index scans are an alternative to a full table scan when the index contains all the columns that are needed for the query, and at least one column in the index key has the NOT
NULL
constraint. Fast full scan accesses the data in the index itself, without accessing the table. It cannot be used to eliminate a sort operation, because the data is not ordered by the index key. It reads the entire index using multiblock reads (unlike a full index scan) and can be parallelized.
Fast full scan is available only with the CBO. You can specify it with the initialization parameter OPTIMIZER_FEATURES_ENABLE
or the INDEX_FFS
hint. Fast full index scans cannot be performed against bitmap indexes.
A fast full scan is faster than a normal full index scan in that it can use multiblock I/O and can be parallelized just like a table scan. The following query and plan illustrate this feature.
SELECT COUNT(*) FROM t1, t2 WHERE t1.c1 > 50 AND t1.c2 = t2.c1; Plan
-------------------------------------------------
SELECT STATEMENT SORT AGGREGATE HASH JOIN TABLE ACCESS t1 FULL INDEX t2_c1_idx FAST FULL SCAN
Because index t2_c1_idx
contains all columns needed from table t2
, the optimizer uses a fast full index scan on that index.
Fast full index scans have the following restrictions:
NOT
NULL
constraint.
Fast full scan has a special index hint, INDEX_FFS
, which has the same format and arguments as the regular INDEX
hint.
This is a hash join of several indexes that together contain all the columns from the table that are referenced in the query. If an index join is used, then no table access is needed, because all the relevant column values can be retrieved from the indexes. An index join cannot be used to eliminate a sort operation. Index join is available only with the CBO.
The following statement uses an index join to access the empno
and sal
columns, both of which are indexed, in the emp
table:
SELECT empno, sal FROM emp WHERE sal > 2000; Plan
---------------------------------------------------------
OPERATION OPTIONS OBJECT_NAME --------------------------------------------------------- SELECT STATEMENT VIEW index$_join$_001 HASH JOIN INDEX RANGE SCAN EMP_SAL INDEX FAST FULL SCAN EMP_EMPNO
You can specify it with the initialization parameter OPTIMIZER_FEATURES_ENABLE
or the INDEX_JOIN
hint.
This uses a bitmap for key values and a mapping function that converts each bit position to a rowid. Bitmaps can efficiently merge indexes that correspond to several conditions in a WHERE
clause, using Boolean operations to resolve AND
and OR
conditions.
Bitmap access is available only with the CBO.
Attention: Bitmap indexes and bitmap join indexes are available only if you have purchased the Oracle9i Enterprise Edition. For more information on bitmap indexes, see Oracle9i Data Warehousing Guide. |
From a table stored in an indexed cluster, a cluster scan retrieves rows that have the same cluster key value. In an indexed cluster, all rows with the same cluster key value are stored in the same data blocks. To perform a cluster scan, Oracle first obtains the rowid of one of the selected rows by scanning the cluster index. Oracle then locates the rows based on this rowid.
Oracle can use a hash scan to locate rows in a hash cluster based on a hash value. In a hash cluster, all rows with the same hash value are stored in the same data blocks. To perform a hash scan, Oracle first obtains the hash value by applying a hash function to a cluster key value specified by the statement. Oracle then scans the data blocks containing rows with that hash value.
The CBO chooses an access path based on the following factors:
To choose an access path, the optimizer first determines which access paths are available by examining the conditions in the statement's WHERE
clause (and its FROM
clause for the SAMPLE
or SAMPLE
BLOCK
clause). The optimizer then generates a set of possible execution plans using available access paths and estimates the cost of each plan using the statistics for the index, columns, and tables accessible to the statement. Finally, optimizer chooses the execution plan with the lowest estimated cost.
The optimizer's choice among available access paths can be overridden with hints, except when the statement's FROM
clause contains SAMPLE
or SAMPLE
BLOCK
.
The following examples illustrate how the optimizer uses selectivity.
Example 1:
The following query uses an equality condition in its WHERE
clause to select all employees named Jackson:
SELECT * FROM emp WHERE ename = 'JACKSON';
If the ename
column is a unique or primary key, then the optimizer determines that there is only one employee named Jackson, and the query returns only one row. In this case, the query is very selective, and the optimizer is most likely to access the table using a unique scan on the index that enforces the unique or primary key.
Example 2:
Consider again the query in the previous example. If the ename
column is not a unique or primary key, then the optimizer can use the following statistics to estimate the query's selectivity:
USER_TAB_COLUMNS
.NUM_DISTINCT
is the number of values for each column in the table.
USER_TABLES
.NUM_ROWS
is the number of rows in each table.
By dividing the number of rows in the emp
table by the number of distinct values in the ename
column, the optimizer estimates what percentage of employees have the same name. By assuming that the ename
values are distributed uniformly, the optimizer uses this percentage as the estimated selectivity of the query.
Example 3:
The following query selects all employees with employee ID numbers less than 7500:
SELECT * FROM emp WHERE empno < 7500;
To estimate the selectivity of the query, the optimizer uses the boundary value of 7500 in the WHERE
clause condition and the values of the HIGH_VALUE
and LOW_VALUE
statistics for the empno
column, if available. These statistics can be found in the USER_TAB_COL_STATISTICS
view (or the USER_TAB_COLUMNS
view). The optimizer assumes that empno
values are distributed evenly in the range between the lowest value and highest value. The optimizer then determines what percentage of this range is less than the value 7500 and uses this value as the estimated selectivity of the query.
Example 4:
The following query uses a bind variable rather than a literal value for the boundary value in the WHERE
clause condition:
SELECT * FROM emp WHERE empno < :e1;
The optimizer does not know the value of the bind variable e1
. Indeed, the value of e1
might be different for each execution of the query. For this reason, the optimizer cannot use the means described in the previous example to determine selectivity of this query. In this case, the optimizer heuristically guesses a small value for the selectivity. This is an internal default value. The optimizer makes this assumption whenever a bind variable is used as a boundary value in a condition with one of the operators <, >, <=, or >=.
The optimizer's treatment of bind variables can cause it to choose different execution plans for SQL statements that differ only in the use of bind variables rather than constants. In one case in which this difference can be especially apparent, the optimizer might choose different execution plans for an embedded SQL statement with a bind variable in an Oracle precompiler program and the same SQL statement with a constant in SQL*Plus.
Example 5:
The following query uses two bind variables as boundary values in the condition with the BETWEEN
operator:
SELECT * FROM emp WHERE empno BETWEEN :low_e AND :high_e;
The optimizer decomposes the BETWEEN
condition into these two conditions:
empno >= :low_e empno <= :high_e
The optimizer heuristically estimates a small selectivity (an internal default value) for indexed columns in order to favor the use of the index.
Example 6:
The following query uses the BETWEEN
operator to select all employees with employee ID numbers between 7500 and 7800:
SELECT * FROM emp WHERE empno BETWEEN 7500 AND 7800;
To determine the selectivity of this query, the optimizer decomposes the WHERE
clause condition into these two conditions:
empno >= 7500 empno <= 7800
The optimizer estimates the individual selectivity of each condition using the means described in a previous example. The optimizer then uses these selectivities (S1 and S2) and the absolute value function (ABS) in this formula to estimate the selectivity (S) of the BETWEEN
condition:
S = ABS( S1 + S2 - 1 )
Joins are statements that retrieve data from more than one table. A join is characterized by multiple tables in the FROM
clause, and the relationship between the tables is defined through the existence of a join condition in the WHERE
clause.
This section discusses how the Oracle optimizer executes SQL statements that contain joins, anti-joins, and semi-joins. It also describes how the optimizer can use bitmap indexes to execute star queries, which join a fact table to multiple dimension tables.
To choose an execution plan for a join statement, the optimizer must make these interrelated decisions:
Access Paths |
As for simple statements, the optimizer must choose an access path to retrieve data from each table in the join statement. (see "Understanding Access Paths for the RBO" and "Understanding Access Paths for the CBO".) |
Join Method |
To join each pair of row sources, Oracle must perform one of these operations: |
Join Order |
To execute a statement that joins more than two tables, Oracle joins two of the tables, and then joins the resulting row source to the next table. This process is continued until all tables are joined into the result. |
The optimizer costs each join method and chooses the method with the least cost. If a join returns many rows, then the optimizer considers the following three factors:
The cost of a nested loops join = access cost of A + (access cost of B * number of rows from A).
The cost of a merge join = access cost of A + access cost of B + (sort cost of A + sort cost of B). An exception is when the data is presorted. In the presorted case, merge join costs = access cost of A + access cost of B where (sort cost of A + sort cost of B) = 0.
Estimated costs to perform a hash join = (access cost of A * number of hash partitions of B) + access cost of B.
Join methods include:
Nested loop (NL) joins are useful for joining small subsets of data and if the join condition is an efficient way of accessing the second table.
It is very important to ensure that the inner table is driven from the outer table. If the inner table's access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop. This can degrade performance considerably. In such cases, hash joins joining the two independent row sources perform better.
For nested loop joins:
NESTED LOOP <Outer Loop> <Inner Loop>
SELECT a.selling_price*a.ordered_quantity FROM so_lines a,so_headers b WHERE b.customer_id = :b1 AND a.header_id = b.header_id Plan ------------------------------------------------- SELECT STATEMENT NESTED LOOPS TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX RANGE SCAN SO_HEADERS_N1 TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N1
From the execution plan, the outer loop and the equivalent statement are
TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX RANGE SCAN SO_HEADERS_N1 (CUSTOMER_ID) SELECT <some columns> FROM so_headers b WHERE b.customer_id = :b1
The execution plan shows that the inner loop being iterated for every row fetched from the outer loop is
TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N1 (HEADER_ID) SELECT <some columns> FROM so_lines a WHERE a.header_id = :b2
Therefore, the statement that retrieves the lines for a customer can be broken down into two loops:
The optimizer uses NL joins when joining small number of rows with a good driving condition between the two tables. This is a join in which you drive from the outer loop to the inner loop, so the order of tables in the execution plan is important.
The outer loop is the driving row source. It produces a set of rows for driving the join condition. The row source can be a table accessed using an index scan or a full table scan. Also, the rows can be produced from any other operation. For example, the output from a nested loop join can be used as a row source for another nested loop join.
The inner loop is iterated for every row returned from the outer loop. This should ideally be an index scan. If the access path for the inner loop is not dependent on the outer loop, then you can end up with a Cartesian product. In such a case, for every iteration of the outer loop, the inner loop produces the same set of rows. Hence, it is better to use other join methods where two independent row sources are joined together.
If the optimizer is choosing to use some other join method, use the USE_NL
(a
b
) hint, where a
and b
are the aliases of the tables being joined.
In the example below, data is small enough for the optimizer to prefer full table scans and use hash joins.
SELECT l.selling_price*l.ordered_quantity FROM so_lines_all l, so_headers_all h WHERE h.customer_id = :b1 AND l.header_id = h.header_id; Plan ------------------------------------------------- SELECT STATEMENT HASH JOIN TABLE ACCESS FULL SO_HEADERS_ALL TABLE ACCESS FULL SO_LINES_ALL
The following gives a hint that changes the join method to nested loop. In this statement, so_headers_all
is accessed via a full table scan and the filter condition customer_id
= :b1
is applied to every row. For every row that meets the filter condition, so_lines_all
is accessed via the index so_lines_n1
(header_id
).
SELECT /*+ USE_NL(l h) */ l.selling_price*l.ordered_quantity FROM so_lines_all l, so_headers_all h WHERE h.customer_id = :b1 AND l.header_id = h.header_id; Plan ------------------------------------------------- SELECT STATEMENT NESTED LOOPS TABLE ACCESS FULL SO_HEADERS_ALL TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N1
Adding another hint here avoids the full table scan on so_headers_all
to get an execution plan similar to that on the larger systems (even though it might not be particularly efficient here).
SELECT /*+ USE_NL(l h) INDEX(h so_headers_n1) */ l.selling_price*l.ordered_ quantity FROM so_lines_all l, so_headers_all h WHERE h.customer_id = :b1 AND l.header_id = h.header_id; Plan ------------------------------------------------- SELECT STATEMENT NESTED LOOPS TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX RANGE SCAN SO_HEADERS_N1 TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N1
Because the outer loop can be a nested loop, you can nest them together to join more tables as needed, where each loop is a data access method.
SELECT STATEMENT NESTED LOOP 3 NESTED LOOP 2 (OUTER LOOP 3.1) NESTED LOOP 1 (OUTER LOOP 2.1) OUTER LOOP 1.1 - #1 INNER LOOP 1.2 - #2 INNER LOOP 2.2 - #3 INNER LOOP 3.2 - #4
This operation is used when an outer join is used between two tables. The outer join returns the outer (preserved) table rows, even when there are no corresponding rows in the inner (optional) table.
In a regular outer join, the optimizer chooses the order of tables (driving and driven) based on the cost. However, in an outer join, the order of tables is determined by the join condition. The outer table (whose rows are being preserved) is used to drive to the inner table.
In the example below, all the customers created in the last 30 days are queried to see how active they are. An outer join is needed so that you don't miss the customers who do not have any orders.
SELECT customer_name, 86(nvl2(h.customer_id,0,1)) "Count" FROM ra_customers c, so_headers_all h WHERE c.creation_date > SYSDATE - 30 AND c.customer_id = h.customer_id(+) GROUP BY customer_name; Plan -------------------------------------------------- SELECT STATEMENT SORT GROUP BY NESTED LOOPS OUTER TABLE ACCESS BY INDEX ROWID RA_CUSTOMERS INDEX RANGE SCAN RA_CUSTOMERS_N2 INDEX RANGE SCAN SO_HEADERS_N1
In this case, the outer join condition is the following
ra_customers.customer_id = so_headers_all.customer_id(+)
This means the following:
ra_customers
.
so_headers_all
.
ra_customers
rows, including those rows without a corresponding row in so_headers_all
.
ra_customers
to so_headers_all
.
The optimizer uses nested loop joins to process an outer join if the following are true:
To have the optimizer to use a nested loop for the join, use the hint USE_NL
.
Hash joins are used for joining large data sets. The optimizer uses the smaller of the two tables/data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
This is best when the smaller table is small enough to fit in available memory. The cost is then limited to a single read pass over the data for the two tables.
However, if the hash table grows too big to fit into the memory, then the optimizer breaks it up into different partitions. As the partitions exceed allocated memory, parts are written to disk to temporary segments.
After the hash table is complete:
When the hash table build completes, it is possible that an entire hash table partition is resident in memory. Then, you do not need to build the corresponding partition for the second table. When table 2 is scanned, rows that hash to the resident hash table partition can be joined and returned immediately.
Now, each hash table partition is read into memory:
This is repeated for the rest of the partitions. The cost can increase to two read passes over the data and one write pass over the data.
There is also the possibility that if the hash table does not fit in the memory, then parts of it must be swapped in and out, depending on the rows retrieved from the second table. Performance for this scenario can be extremely poor.
The optimizer uses a hash join to join two tables if they are joined using an equijoin and if either of the following are true:
SELECT h.customer_id, l.selling_price*l.ordered_quantity FROM so_headers_all h ,so_lines_all l WHERE l.header_id = h.header_id Plan ------------------------------------------------- SELECT STATEMENT HASH JOIN TABLE ACCESS FULL SO_HEADERS_ALL TABLE ACCESS FULL SO_LINES_ALL
In the above example, the table so_headers_all
is used to build the hash table, and so_lines_all
is the larger table, which is scanned later.
Use the USE_HASH
hint to advise the optimizer to use a hash join when joining two tables together. Investigate the values for the parameters HASH_AREA_SIZE
and HASH_JOIN_ENABLED
if you are having trouble getting the optimizer to use hash joins.
SELECT l.promise_date,l.inventory_item_id, SUM(l2.ordered_quantity) FROM so_lines_all l, so_lines_all l2 WHERE l.inventory_item_id = l2.inventory_item_id AND l.warehouse_id = l2.warehouse_id AND l2.promise_date < l.promise_date GROUP BY l.inventory_item_id, l.promise_date, l.header_id Plan -------------------------------------------------- SELECT STATEMENT SORT GROUP BY MERGE JOIN SORT JOIN TABLE ACCESS FULL SO_LINES_ALL FILTER SORT JOIN TABLE ACCESS FULL SO_LINES_ALL
SELECT /*+USE_HASH(l l2) */ l.promise_date,l.inventory_item_id, SUM(l2.ordered_ quantity) FROM so_lines_all l, so_lines_all l2 WHERE l.inventory_item_id = l2.inventory_item_id AND l.warehouse_id = l2.warehouse_id AND l2.promise_date < l.promise_date GROUP BY l.inventory_item_id, l.promise_date, l.header_id Plan -------------------------------------------------- SELECT STATEMENT SORT GROUP BY HASH JOIN TABLE ACCESS FULL SO_LINES_ALL TABLE ACCESS FULL SO_LINES_ALL
This operation is used for outer joins where the optimizer decides that the amount of data is large enough to warrant an hash join, or it is unable to drive from the outer table to the inner table.
Like an outer join, the order of tables is not determined by the cost, but by the join condition. The outer table (whose rows are being preserved) is used to build the hash table, and the inner table is used to probe the hash table.
The example below looks for inactive customers (more than a year old and have no orders). An outer join returns NULL
for the inner table columns along with the outer (preserved) table rows when it does not find any corresponding rows in the inner table. This finds all the ra_customers
rows that do not have any so_headers_all
rows.
SELECT customer_name FROM ra_customers c, so_headers_all h WHERE c.creation_date < SYSDATE - 365 AND h.customer_id(+) = c.customer_id AND h.customer_id IS NULL; Plan -------------------------------------------------- SELECT STATEMENT FILTER HASH JOIN OUTER TABLE ACCESS BY INDEX ROWID RA_CUSTOMERS INDEX RANGE SCAN RA_CUSTOMERS_N2 INDEX FAST FULL SCAN SO_HEADERS_N1
In this case, the outer join condition is the following:
ra_customers.customer_id = so_headers_all.customer_id(+)
This means the following:
ra_customers
.
so_headers_all
.
ra_customers
rows, including those rows without a corresponding row in so_headers_all
.
ra_customers
.
so_headers_all
.
You could have used a NOT
EXISTS
subquery to return the rows, but because you are querying all the rows in the table, the hash join performs better (unless the NOT
EXISTS
subquery is un-nested).
The optimizer uses hash joins for processing an outer join if the data volume is high enough to make hash join method efficient or if it is not possible to drive from the outer table to inner table.
In the following example, the outer join is to a multitable view. The optimizer cannot drive into the view like in a normal join or push the predicates, so it builds the entire row set of the view.
SELECT c.customer_name, sum(revenue) FROM ra_customers c, orders h WHERE c.creation_date > sysdate - 30 AND h.customer_id(+) = c.customer_id GROUP BY c.customer_name; Plan -------------------------------------------------- SELECT STATEMENT SORT GROUP BY HASH JOIN OUTER TABLE ACCESS BY INDEX ROWID RA_CUSTOMERS INDEX RANGE SCAN RA_CUSTOMERS_N2 VIEW ORDERS SORT GROUP BY HASH JOIN TABLE ACCESS FULL SO_HEADERS_ALL TABLE ACCESS FULL SO_LINES_ALL
View Definition
CREATE OR REPLACE view orders AS SELECT h.order_number, SUM(l.revenue_amount) revenue, h.header_id, h.customer_id FROM so_headers_All h, so_lines_all l WHERE h.header_id = l.header_id GROUP BY h.order_number, h.header_id, h.customer_id;
To use hash join for doing the join, use the hint USE_HASH
.
Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. However, sort merge joins can perform better than hash joins if the row sources are sorted already, and if a sort operation does not have to be done. However, if this involves choosing a slower access method (index scan vs. full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) like <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. (You cannot use hash joins unless there is an equality condition).
In a merge join, there is no concept of a driving table.
If the input is sorted already by the join column, then there is not a sort join operation for that row source.
SELECT SUM(l.revenue_amount), l2.creation_date FROM so_lines_all l, so_lines_all l2 WHERE l.creation_date < l2.creation_date AND l.header_id <> l2.header_id GROUP BY l2.creation_date, l2.line_id Plan -------------------------------------------------- SELECT STATEMENT SORT GROUP BY MERGE JOIN SORT JOIN TABLE ACCESS FULL SO_LINES_ALL FILTER SORT JOIN TABLE ACCESS FULL SO_LINES_ALL
In the above example tracking revenue generation, there is no equi-join operation. Therefore, you can use either nested loops or sort merge to join the data. With the data volumes, sort merge (as chosen by the optimizer) is the better option.
The optimizer can choose sort merge join over hash join for joining large amounts of data if any of the following are true:
OPTIMIZER_MODE
is set to RULE
HASH_JOIN_ENABLED
is false
HASH_AREA_SIZE
and SORT_AREA_SIZE
In this following example tracking inventory consumption, the optimizer avoids a sort for the GROUP
BY
if it chooses the sort merge operation.
SELECT msi.inventory_item_id, SUM(l.ordered_quantity) FROM mtl_system_items msi, so_lines_all l WHERE msi.inventory_item_id = l.inventory_item_id AND msi.inventory_item_id BETWEEN :b1 AND :b2 GROUP BY msi.inventory_item_id Plan -------------------------------------------------- SELECT STATEMENT SORT GROUP BY NOSORT MERGE JOIN TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N5 SORT JOIN INDEX RANGE SCAN MTL_SYSTEM_ITEMS_U1
Use the USE_MERGE
hint to advise the optimizer to use a merge join when joining the two tables together. In addition, it might be necessary to give hints to force an access path. For example:
SELECT h.customer_id, l.selling_price*l.ordered_quantity FROM so_headers_all h ,so_lines_all l WHERE l.header_id = h.header_id Plan ------------------------------------------------- SELECT STATEMENT NESTED LOOPS TABLE ACCESS FULL SO_LINES_ALL TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX UNIQUE SCAN SO_HEADERS_U1
The optimizer chooses a full scan on so_lines_all
, thus avoiding a sort. However, there is an increased cost because now a large table is accessed via an index and single block reads opposed to faster access via full table scan.
SELECT /*+USE_MERGE(h l) */ h.customer_id, l.selling_price*l.ordered_quantity FROM so_headers_all h ,so_lines_all l WHERE l.header_id = h.header_id Plan ------------------------------------------------- SELECT STATEMENT MERGE JOIN TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX FULL SCAN SO_LINES_N1 SORT JOIN TABLE ACCESS FULL SO_HEADERS_ALL
Here, there is a full scan on both tables and the resulting inputs are sorted before merging them:
SELECT /*+USE_MERGE(h l) FULL(l) */ h.customer_id, l.selling_price*l.ordered_ quantity FROM so_headers_all h ,so_lines_all l WHERE l.header_id = h.header_id Plan ------------------------------------------------- SELECT STATEMENT MERGE JOIN SORT JOIN TABLE ACCESS FULL SO_HEADERS_ALL SORT JOIN TABLE ACCESS FULL SO_LINES_ALL
When an outer join cannot drive from the outer (preserved) table to the inner (optional) table, it cannot use hash join or nested loop joins. Then it uses the sort merge join for executing the join operation.
The optimizer uses sort merge for an outer join if a nested loop join is inefficient. A nested loop join can be inefficient because of data volumes, or because of sorts already required by other operations, the optimizer finds it is cheaper to use a sort merge over a hash join.
In the following example of an inventory usage report, the optimizer avoids a sort for the GROUP
BY
operation by using the sort merge operation.
SELECT msi.inventory_item_id, SUM(l.ordered_quantity) FROM mtl_system_items msi, so_lines_all l WHERE msi.inventory_item_id = l.inventory_item_id(+) GROUP BY msi.inventory_item_id; Plan -------------------------------------------------- SELECT STATEMENT SORT GROUP BY NOSORT MERGE JOIN OUTER SORT JOIN INDEX FAST FULL SCAN MTL_SYSTEM_ITEMS_U1 SORT JOIN TABLE ACCESS FULL SO_LINES_ALL
To use sort merge for doing the join, use the hint USE_MERGE
.
A Cartesian join happens when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source (Cartesian product of the two sets).
In some cases, there might be a common filter condition between the two tables that could be picked up by the optimizer as a possible join condition. This is even more dangerous, because the joins are not flagged in the execution plan as being a Cartesian product.
Cartesian joins generally result from poorly-written SQL. The example below has three tables in the FROM
clause, but only one join condition joining the two tables.
Although this is a simplified version with three tables, this can happen when someone is writing a query involving large number of tables and an extra table gets into the FROM
clause but not into the WHERE
clause. With such queries, a DISTINCT
clause can weed out multiple rows.
SELECT DISTINCT h.header_id, l.line_id, l.ordered_quantity FROM so_lines_all l, so_headers_all h, so_lines_all l2 WHERE h.customer_id = :b1 AND l.header_id = h.header_id; Plan ------------------------------------------------- SELECT STATEMENT SORT UNIQUE MERGE JOIN CARTESIAN NESTED LOOPS TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX RANGE SCAN SO_HEADERS_N1 TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N1 SORT JOIN INDEX FAST FULL SCAN SO_LINES_N1
The optimizer gets the ROWID
s for so_lines_all
from the index so_lines_n1
. This is because header_id
is a NOT
NULL
column, so the index has all the ROWID
s. It sorts the results by the join key.
The optimizer joins L and H using a nested loop join, and for every row returned, it returns all the rows from the entire so_lines_all
table. The execution plan shows that a merge join is used for the Cartesian product.
The following example illustrates a nested loop with an implicit Cartesian product. If the inner table of a nested loop operation is not driven from the outer table, but from an independent row source, then the rows accessed can be the same as in a Cartesian product. Because the join condition is present but is applied after accessing the table, it is not a Cartesian product. However, the cost of accessing the table (rows accessed) is about the same.
SELECT a.attribute4,b.full_name FROM per_all_people_f b, so_lines_all a WHERE a.header_id = :b1 AND b.employee_number = a.created_by AND b.business_group_id = :b2; Plan ------------------------------------------------- SELECT STATEMENT NESTED LOOPS TABLE ACCESS BY INDEX ROWID PER_ALL_PEOPLE_F INDEX RANGE SCAN PER_PEOPLE_F_FK1 TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N1
In examining the statement, do the following:
so_lines_all
using header_id
(so_line_n1
).
per_all_people_f
using employee_number
(per_peopl_f_n51
) for every row returned from so_lines_all
.
business_group_id
filter to per_people_f
.
However, the execution plan shows something quite different. The datatype for employee_number
is VARCHAR2
, while created_by
is NUMBER
, so there is an implicit type conversion on employee_number
disabling the index. The result is two independent sources:
For every row in #1, every row in #2 is accessed, but returns only those rows that meet the following condition:
TO_NUMBER(b.employee_number) = a.created_by
Because there is a join condition, it does not show up as a Cartesian product. However, the number of data accesses are the same.
The optimizer uses Cartesian joins when it is asked to join two tables with no join conditions.
The hint ORDERED
can cause a Cartesian join. By specifying a table before its join table is specified, the optimizer does a Cartesian join.
Example before using hint:
SELECT h.purchase_order_num, sum(l.revenue_amount) FROM ra_customers c, so_lines_all l, so_headers_all h WHERE c.customer_name = :b1 AND h.customer_id = c.customer_id AND h.header_id = l.header_id GROUP BY h.purchase_order_num; Plan ------------------------------------------------- SELECT STATEMENT SORT GROUP BY NESTED LOOPS NESTED LOOPS TABLE ACCESS BY INDEX ROWID RA_CUSTOMERS INDEX RANGE SCAN RA_CUSTOMERS_N1 TABLE ACCESS BY INDEX ROWID SO_HEADERS_ALL INDEX RANGE SCAN SO_HEADERS_N1 TABLE ACCESS BY INDEX ROWID SO_LINES_ALL INDEX RANGE SCAN SO_LINES_N1
The following is necessary:
customer_name
.
headers
using customer_id
.
lines
using line_id
.
This is exactly what the optimizer is doing.
Example after using hint:
SELECT /*+ORDERED */ h.purchase_order_num, sum(l.revenue_amount) FROM ra_customers c, so_lines_all l, so_headers_all h WHERE c.customer_name = :b1 AND h.customer_id = c.customer_id AND h.header_id = l.header_id GROUP BY h.purchase_order_num; Plan ------------------------------------------------- SELECT STATEMENT SORT GROUP BY MERGE JOIN SORT JOIN MERGE JOIN CARTESIAN TABLE ACCESS BY INDEX ROWID RA_CUSTOMERS INDEX RANGE SCAN RA_CUSTOMERS_N1 SORT JOIN TABLE ACCESS FULL SO_LINES_ALL SORT JOIN TABLE ACCESS FULL SO_HEADERS_ALL
The optimizer now does the following:
customers
using customer_name
lines
, because it is the next column in the FROM
clause
customers
get multiplied several millionfold rows in lines
customer_id
, header_id
)
headers
sorted by (customer_id
, header_id
)
With a nonselective filter condition between lines
and customers
SELECT /*+ORDERED */ h.purchase_order_num, SUM(l.revenue_amount) FROM ra_customers c, so_lines_all l, so_headers_all h WHERE c.customer_name = :b1 AND h.customer_id = c.customer_id AND h.header_id = l.header_id AND c.price_list_id = l.price_list_id GROUP BY h.purchase_order_num; Plan ------------------------------------------------- SELECT STATEMENT SORT GROUP BY MERGE JOIN SORT JOIN NESTED LOOPS TABLE ACCESS BY INDEX ROWID RA_CUSTOMERS INDEX RANGE SCAN RA_CUSTOMERS_N1 TABLE ACCESS FULL SO_LINES_ALL SORT JOIN TABLE ACCESS FULL SO_HEADERS_ALL
The execution plan is similar, but now, because the optimizer finds a join condition between the two tables, it removes the Cartesian. Even though an actual Cartesian product is not built, the plan is equally bad in terms of block accesses, if not worse. This is because nested loop joins do not perform well with large data sets.
A full outer join acts like a combination of the left and right outer joins. In addition to the inner join, rows from both tables that have not been returned in the result of the inner join are preserved and extended with nulls.
In other words, full outer joins let you join tables together, yet still show rows which do not have corresponding rows in tables joined-to.
The following example shows all departments and all employees in each department, but also includes:
SELECT d.department_id, e.employee_id FROM employees e FULL OUTER JOIN departments d ON e.department_id = d.department_id ORDER BY d.department_id;
This is the output:
DEPARTMENT_ID EMPLOYEE_ID ------------- ----------- 10 255200 20 255202 20 255201 30 40 255203 255199 255204 255206 255205
With the CBO, the optimizer generates a set of execution plans based on the possible join orders, join methods, and available access paths. The optimizer then estimates the cost of each plan and chooses the one with the lowest cost. The optimizer estimates costs in these ways:
SORT_AREA_SIZE
.
DB_FILE_MULTIBLOCK_READ_COUNT
.
With the CBO, the optimizer's choice of join orders can be overridden with the ORDERED
hint. If the ORDERED
hint specifies a join order that violates the rule for outer join, then the optimizer ignores the hint and chooses the order. Also, you can override the optimizer's choice of join method with hints.
An anti-join returns rows from the left side of the predicate for which there is no corresponding row on the right side of the predicate. That is, it returns rows that fail to match (NOT
IN
) the subquery on the right side. For example, an anti-join can select a list of employees who are not in a particular set of departments:
SELECT * FROM emp WHERE deptno NOT IN (SELECT deptno FROM dept WHERE loc = 'HEADQUARTERS');
The optimizer uses a nested-loops algorithm for NOT
IN
subqueries by default, unless the MERGE_AJ
, HASH_AJ
, or NL_AJ
hint is used and various required conditions are met, which allow the transformation of the NOT
IN
uncorrelated subquery into a sort-merge or hash anti-join.
A semi-join returns rows that match an EXISTS
subquery without duplicating rows from the left side of the predicate when multiple rows on the right side satisfy the criteria of the subquery. For example:
SELECT * FROM dept WHERE EXISTS (SELECT * FROM emp WHERE dept.ename = emp.ename AND emp.bonus > 5000);
In this query, only one row needs to be returned from dept
even though many rows in emp
might match the subquery. If there is no index on the bonus
column in emp
, then a semi-join can be used to improve query performance.
The optimizer uses a nested loops algorithm by default for IN
or EXISTS
subqueries that cannot be merged with the containing query, unless the MERGE_SJ
, HASH_SJ
, or NL_SJ
hint is used and various required conditions are met, which allow the transformation of the subquery into a sort-merge or hash semi-join.
Some data warehouses are designed around a star schema, which includes a large fact table and several small dimension (lookup) tables. The fact table stores primary information. Each dimension table stores information about an attribute in the fact table.
A star query is a join between a fact table and a number of lookup tables. Each lookup table is joined to the fact table using a primary-key to foreign-key join, but the lookup tables are not joined to each other.
The CBO recognizes star queries and generates efficient execution plans for them. (Star queries are not recognized by the RBO.)
A typical fact table contains keys and measures. For example, a simple fact table might contain the measure Sales, and keys Time, Product, and Market. In this case there would be corresponding dimension tables for Time, Product, and Market. The Product dimension table, for example, typically contains information about each product number that appears in the fact table.
A star join is a primary-key to foreign-key join of the dimension tables to a fact table. The fact table normally has a concatenated index on the key columns to facilitate this type of join or a separate bitmap index on each key column.
This section contains some, but not all, of the parameters specific to the optimizer. The following sections are useful especially when tuning Oracle Applications.
The OPTIMIZER_FEATURES_ENABLE
parameter acts as an umbrella parameter for the CBO. This parameter can be used to enable a series of CBO-related features depending on the release. It accepts one of a list of valid string values corresponding to the release numbers, such as 8.0.4, 8.1.5, and so on.
The following statement enables the use of the optimizer features in Oracle8i release 8.1.6.
OPTIMIZER_FEATURES_ENABLE=8.1.6;
The above statement causes the optimizer features available in release 8.1.6 to be used in generating query plans. For example, one of the changes in release 8.1.6 is the use of ALL_ROWS
or FIRST_ROWS
optimizer mode for recursive user SQL generated by PL/SQL procedures. Prior to Oracle8i release 8.1.6, only RULE
or CHOOSE
optimizer mode was used for such recursive SQL, and when the user explicitly set the OPTIMIZER_MODE
parameter to FIRST_ROWS
or ALL_ROWS
, a CHOOSE
mode was used instead.
The OPTIMIZER_FEATURES_ENABLE
parameter was introduced in Oracle8 release 8.0.4. The main goal was to allow customers the ability to upgrade the Oracle server, yet preserve the old behavior of the CBO after the upgrade. For example, when you upgrade the Oracle server from release 8.1.5 to release 8.1.6, the default value of the OPTIMIZER_FEATURES_ENABLE
parameter changes from 8.1.5 to 8.1.6. This results in the cost--based optimizer enabling optimization features based on 8.1.6, as opposed to 8.1.5. For plan stability or backward compatibility reasons, you might not want the query plans to change based on the new optimizer features of release 8.1.6. In such a case, set the OPTIMIZER_FEATURES_ENABLE
parameter to an earlier version, such as 8.1.5. To disable all new optimizer features in release 8.1.6, set the parameter as follows:
OPTIMIZER_FEATURES_ENABLE=8.1.5;
To preserve an even older behavior of the CBO, such as release 8.0.4, set the parameter as follows:
OPTIMIZER_FEATURES_ENABLE=8.0.4;
The above statement disables all new optimizer features that were added in releases following 8.0.4.
Oracle Corporation does not recommend explicitly setting the OPTIMIZER_FEATURES_ENABLE
parameter to an earlier release. Instead, execution plan or query performance issues should be resolved on a case by case basis.
The table below describes some of the optimizer features that are enabled when you set the OPTIMIZER_FEATURES_ENABLE
parameter to each of the following release values.
The CBO peeks at the values of user-defined bind variables on the first invocation of a cursor. This lets the optimizer determine the selectivity of any WHERE
clause condition, as well as if literals had been used instead of bind variables. On subsequent invocations of the cursor, no peeking takes place, and the cursor is shared based on the standard cursor-sharing criteria, even if subsequent invocations use different bind values.
When bind variables are used in a statement, it is assumed that cursor sharing is intended and that different invocations are supposed to use the same execution plan. If different invocations of the cursor would significantly benefit from different execution plans, then bind variables may have been used inappropriately in the SQL statement.
The following table lists initialization parameters that can be used to control the behavior of the CBO. These parameters can be used to enable various optimizer features in order to improve the performance of SQL execution.
The extensible optimizer is part of the CBO. It allows the authors of user-defined functions and domain indexes to control the three main components that the CBO uses to select an execution plan: statistics, selectivity, and cost evaluation.
The extensible optimizer lets you:
ANALYZE
statement to invoke user-defined statistics collection and deletion functions.
You can define statistics collection functions for domain indexes, individual columns of a table, and user-defined datatypes.
Whenever a domain index is analyzed to gather statistics, Oracle calls the associated statistics collection function. Whenever a column of a table is analyzed, Oracle collects the standard statistics for that column and calls any associated statistics collection function. If a statistics collection function exists for a datatype, then Oracle calls it for each column that has that datatype in the table being analyzed.
The selectivity of a predicate in a SQL statement is used to estimate the cost of a particular access path; it is also used to determine the optimal join order. The optimizer cannot compute an accurate selectivity for predicates that contain user-defined operators, because it does not have any information about these operators.
You can define selectivity functions for predicates containing user-defined operators, standalone functions, package functions, or type methods. The optimizer calls the user-defined selectivity function whenever it encounters a predicate that contains the operator, function, or method in one of the following relations with a constant: <, <=, =, >=, >, or LIKE
.
The optimizer cannot compute an accurate estimate of the cost of a domain index because it does not know the internal storage structure of the index. Also, the optimizer might underestimate the cost of a user-defined function that invokes PL/SQL, uses recursive SQL, accesses a BFILE
, or is CPU-intensive.
You can define costs for domain indexes and user-defined standalone functions, package functions, and type methods. These user-defined costs can be in the form of default costs that the optimizer simply looks up or they can be full-fledged cost functions that the optimizer calls to compute the cost.
|
![]() Copyright © 1996-2001, Oracle Corporation. All Rights Reserved. | | Ad Choices. |
|