经常看到有人提出这样的疑问,我在同一张表上建立了多个索引,为什么Oracle每次都选择一个,而不能同时利用多个索引呢。一般来说,常见的访问同一张表的两个以上索引,存在三种情况,AND-EQUAL、INDEX HASH JOIN和BITMAP INDEX AND/OR。
此外,还有一个设计上的疑问,如果有A、B、C三个字段,都可能作为查询条件,是建立多个复合索引好,还是建立三个单列的索引。这个问题之所以不好回答是因为和业务或者说和查询的模式有很大的关系,不过如果理解了Oracle什么时候会选择一个以上的索引来访问表,就会对于理解如何设计合理的索引有很大的帮助。
简单介绍一下INDEX HASH JOIN执行计划。
首先建立一个测试表:
SQL> DROP TABLE T_DOUBLE_IND PURGE;
Table dropped.
SQL> CREATE TABLE T_DOUBLE_IND
2 (ID NUMBER,
3 NAME VARCHAR2(30),
4 TYPE VARCHAR2(30),
5 CONTENTS VARCHAR2(4000));
Table created.
SQL> INSERT INTO T_DOUBLE_IND
2 SELECT ROWNUM,
3 OBJECT_NAME,
4 OBJECT_TYPE,
5 LPAD('A', 1000, 'A')
6 FROM DBA_OBJECTS;
76058 rows created.
SQL> CREATE INDEX IND_DOUBLE_NAME
2 ON T_DOUBLE_IND (NAME);
Index created.
SQL> CREATE INDEX IND_DOUBLE_TYPE
2 ON T_DOUBLE_IND (TYPE);
Index created.
现在建立了一个测试表和两个索引,下面看看如何利用索引哈希连接:
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T_DOUBLE_IND')
PL/SQL procedure successfully completed.
SQL> SET AUTOT ON EXP
SQL> SELECT NAME, TYPE
2 FROM T_DOUBLE_IND
3 WHERE NAME = 'T_DOUBLE_IND'
4 AND TYPE = 'TABLE';
NAME TYPE
------------------------------ ------------------------------
T_DOUBLE_IND TABLE
Execution Plan
----------------------------------------------------------
Plan hash value: 3887138334
--------------------------------------------------------------------------------------------
| Id| Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 3 (0)| 00:00:01 |
|*1 | TABLE ACCESS BY INDEX ROWID| T_DOUBLE_IND | 1 | 32 | 3 (0)| 00:00:01 |
|*2 | INDEX RANGE SCAN | IND_DOUBLE_NAME | 2 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TYPE"='TABLE')
2 - access("NAME"='T_DOUBLE_IND')
SQL> SELECT /*+ INDEX_JOIN(A IND_DOUBLE_NAME IND_DOUBLE_TYPE) */ NAME, TYPE
2 FROM T_DOUBLE_IND A
3 WHERE NAME = 'T_DOUBLE_IND'
4 AND TYPE = 'TABLE';
NAME TYPE
------------------------------ ------------------------------
T_DOUBLE_IND TABLE
Execution Plan
----------------------------------------------------------
Plan hash value: 1096369674
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 6 (17)| 00:00:01 |
|* 1 | VIEW | index$_join$_001 | 1 | 32 | 6 (17)| 00:00:01 |
|* 2 | HASH JOIN | | | | | |
|* 3 | INDEX RANGE SCAN| IND_DOUBLE_NAME | 1 | 32 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| IND_DOUBLE_TYPE | 1 | 32 | 4 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME"='T_DOUBLE_IND' AND "TYPE"='TABLE')
2 - access(ROWID=ROWID)
3 - access("NAME"='T_DOUBLE_IND')
4 - access("TYPE"='TABLE')
利用INDEX_JOIN提示,可以使得Oracle选择INDEX HASH JOIN连接方式,从而避免对表进行访问。Oracle选择两个索引进行范围扫描,然后执行HASH JOIN,而HASH JOIN的连接列是ROWID,最后通过一个内部视图执行查询条件的过滤。
不过这种执行计划的前提是,查询所选择的列必须能够通过索引完全的提供,如果包含了索引中不存在的列,Oracle是不会再次选择扫描表的。
SQL> SELECT /*+ INDEX_JOIN(A IND_DOUBLE_NAME IND_DOUBLE_TYPE) */ ID, NAME, TYPE
2 FROM T_DOUBLE_IND A
3 WHERE NAME = 'T_DOUBLE_IND'
4 AND TYPE = 'TABLE';
ID NAME TYPE
---------- ------------------------------ ------------------------------
70924 T_DOUBLE_IND TABLE
Execution Plan
----------------------------------------------------------
Plan hash value: 3887138334
--------------------------------------------------------------------------------------------
| Id| Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 37 | 3 (0)| 00:00:01 |
|*1 | TABLE ACCESS BY INDEX ROWID| T_DOUBLE_IND | 1 | 37 | 3 (0)| 00:00:01 |
|*2 | INDEX RANGE SCAN | IND_DOUBLE_NAME | 2 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TYPE"='TABLE')
2 - access("NAME"='T_DOUBLE_IND')
不仅如此,就是选择了索引中包含的ROWID信息,Oracle仍然会不会选择INDEX HASH JOIN:
SQL> SELECT /*+ INDEX_JOIN(A IND_DOUBLE_NAME IND_DOUBLE_TYPE) */ ROWID, NAME, TYPE
2 FROM T_DOUBLE_IND A
3 WHERE NAME = 'T_DOUBLE_IND'
4 AND TYPE = 'TABLE';
ROWID NAME TYPE
------------------ ------------------------------ ------------------------------
AAAjc5AAVAAAqQUAAD T_DOUBLE_IND TABLE
Execution Plan
----------------------------------------------------------
Plan hash value: 3887138334
--------------------------------------------------------------------------------------------
| Id| Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 44 | 3 (0)| 00:00:01 |
|*1 | TABLE ACCESS BY INDEX ROWID| T_DOUBLE_IND | 1 | 44 | 3 (0)| 00:00:01 |
|*2 | INDEX RANGE SCAN | IND_DOUBLE_NAME | 2 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TYPE"='TABLE')
2 - access("NAME"='T_DOUBLE_IND')
其实这是没有道理的,因为索引中本身就包括ROWID信息,而且两个索引做HASH JOIN的时候,连接列就是ROWID,因此这个查询本来应该可以通过INDEX HASH JOIN来实现的,可能CBO优化器在处理这个算法的时候忽略了这个问题。
当然,查询COUNT(*)的操作也是可以利用INDEX HASH JOIN的:
SQL> SELECT /*+ INDEX_JOIN(A IND_DOUBLE_NAME IND_DOUBLE_TYPE) */ COUNT(*)
2 FROM T_DOUBLE_IND A
3 WHERE NAME = 'T_DOUBLE_IND'
4 AND TYPE = 'TABLE';
COUNT(*)
----------
1
Execution Plan
----------------------------------------------------------
Plan hash value: 2001662014
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 32 | 6 (17)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 32 | | |
|* 2 | VIEW | index$_join$_001 | 1 | 32 | 6 (17)| 00:00:01 |
|* 3 | HASH JOIN | | | | | |
|* 4 | INDEX RANGE SCAN| IND_DOUBLE_NAME | 1 | 32 | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN| IND_DOUBLE_TYPE | 1 | 32 | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("NAME"='T_DOUBLE_IND' AND "TYPE"='TABLE')
3 - access(ROWID=ROWID)
4 - access("NAME"='T_DOUBLE_IND')
5 - access("TYPE"='TABLE')
Oracle会自动根据统计信息权衡是否选择INDEX HASH JOIN,而不需要通过HINT来强制执行:
SQL> SET AUTOT OFF
SQL> UPDATE T_DOUBLE_IND
2 SET NAME = 'T_DOUBLE_IND'
3 WHERE ID <= 2000;
2000 rows updated.
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T_DOUBLE_IND')
PL/SQL procedure successfully completed.
SQL> SET AUTOT TRACE EXP
SQL> SELECT NAME, TYPE
2 FROM T_DOUBLE_IND A
3 WHERE NAME = 'T_DOUBLE_IND'
4 AND TYPE = 'TABLE';
Execution Plan
----------------------------------------------------------
Plan hash value: 1096369674
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 259 | 8288 | 19 (6)| 00:00:01 |
|* 1 | VIEW | index$_join$_001 | 259 | 8288 | 19 (6)| 00:00:01 |
|* 2 | HASH JOIN | | | | | |
|* 3 | INDEX RANGE SCAN| IND_DOUBLE_NAME | 259 | 8288 | 5 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN| IND_DOUBLE_TYPE | 259 | 8288 | 13 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("NAME"='T_DOUBLE_IND' AND "TYPE"='TABLE')
2 - access(ROWID=ROWID)
3 - access("NAME"='T_DOUBLE_IND')
4 - access("TYPE"='TABLE')
但是由于INDEX HASH JOIN的限制,使得这个查询只能满足查询索引列或COUNT(*)的情况,因此使得这个执行计划的使用机会大大降低。
而如果建立复合索引,讲会使得查询代价大为降低,而且可以满足继续扫描表的查询要求:
SQL> CREATE INDEX IND_DOU_NAM_TYP
2 ON T_DOUBLE_IND (NAME, TYPE);
Index created.
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'T_DOUBLE_IND')
PL/SQL procedure successfully completed.
SQL> SELECT NAME, TYPE
2 FROM T_DOUBLE_IND A
3 WHERE NAME = 'T_DOUBLE_IND'
4 AND TYPE = 'TABLE';
Execution Plan
----------------------------------------------------------
Plan hash value: 3922347956
------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 240 | 7680 | 1 (0)| 00:00:01 |
|* 1 | INDEX RANGE SCAN| IND_DOU_NAM_TYP | 240 | 7680 | 1 (0)| 00:00:01 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("NAME"='T_DOUBLE_IND' AND "TYPE"='TABLE')
SQL> SELECT ID, NAME, TYPE
2 FROM T_DOUBLE_IND A
3 WHERE NAME = 'T_DOUBLE_IND'
4 AND TYPE = 'TABLE';
Execution Plan
----------------------------------------------------------
Plan hash value: 1808990274
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 240 | 8880 | 149 (0)| 00:00:03 |
| 1 | TABLE ACCESS BY INDEX ROWID| T_DOUBLE_IND | 240 | 8880 | 149 (0)| 00:00:03 |
|*2 | INDEX RANGE SCAN | IND_DOU_NAM_TYP | 240 | | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("NAME"='T_DOUBLE_IND' AND "TYPE"='TABLE')
简单总结一下,根据这三篇文章介绍的同时访问两个以上索引的三种情况而言,AND-EQUAL执行计划在Oracle已经不在推荐,连AND-EQUAL这个HINT在文档中也不再介绍,只是为了后向兼容性而继续保留。原因已经介绍过,Oracle无法通过统计信息来判断这种执行路径的代价。而INDEX HASH JOIN则由于限制条件的问题很难广泛使用。唯一对于多个单列索引而言的优势在于BITMAP AND/OR操作,而一般这种执行计划在数据仓库类型的查询中更为常见。因此,对于要求快速得到查询结果的OLTP系统中,复合索引应该是更为合理的选择,不要希望通过建立多个单列索引,来完全代替复合索引的创建。
最后,什么索引最适合你的系统仍然要具体情况具体分析,实践才是检验索引的唯一真理。
总机:(010)-88589926,88589826,88587026 QQ讨论群:243729577 182441349 邮箱:cuug_bj@cuug.com
通信地址:北京市海淀区紫竹院路98号北京化工大学科技园609室(CUUG)邮政编码:100089
中国UNIX用户协会 Copyright 2010 ALL Rights Reserved 北京神脑资讯技术有限公司
京ICP备11008061号 京公网安备110108006275号