Thursday, July 7, 2016

Create INDEXES on partitioned tables


Which Index to choose? Global or Local Index for partitioned table !!



Introduction:

We often deal with Indexes while working on Oracle database tables. We do know, there are different type of indexes like B-Tree, Bitmap, Hash-value, etc, etc.. which helps us to improvise the query performance.

However, with respect to Oracle partitioned tables; while creating an Index we have two additional options to choose from namely Local and Global Index.

Local and Global Index type determines the scope or access point of a Oracle Index in a partitioned  table.

In today’s post I am trying to cover the areas to look for before deciding on the Local or Global Index. This article is bases on my understanding and might differ with other.

I welcome all the constructive feedback and suggestions to improvise this article.
Overview:
Local Index:

A Local index is a partitioned index just like a partitioned table and maintains a one to one relationship between the index partitions and the table partitions.

Local Index partitioning helps in easier Index management as each index partition is independent of the other partition in the index and they support higher availability during Index maintenance operations .  Oracle manages the local index partitions automatically, upon creation or deletion of the partitions from the table.




Trying to manually add or drop a partition from Local partitioned index would lead to errors like below
   
alter index EMP_INFO_L_IDX drop partition EMP_INFO_P0;
alter index EMP_INFO_L_IDX drop partition EMP_INFO_P0
*
ERROR at line 1:
ORA-14076: submitted alter index partition/subpartition operation is not valid for local partitioned index





Local Index partitions are the most commonly used type of index partitions
Global Index:

A Global index has a one to many relationship, where a index or index partition can be mapped to multiple table partitions. A Global index can be either partitioned or non partitioned. By default Oracle creates all the indexes as Global Non partitioned index.  In case of partitioned global indexes, each partition in the global index can be mapped to multiple underlying table partitions.

Global index has certain restrictions with respect to index management and maintenance. For instance, dropping a table partition causes the respective global index to become unusable.  It also has certain edge over the local index such as we can define as many partitions as we want for a global index. We can also create a global partitioned index for a non partitioned Oracle table.
A simple excercise to help in choosing the index type for a partitioned table






Lets create a partitioned table for our exercise

CREATE TABLE emp_info
2 (
3 emp_id NUMBER NOT NULL,
4 join_date DATE NOT NULL,
5 email VARCHAR2(100)
6 )
7 PARTITION BY RANGE (join_date)
8 (
9 PARTITION emp_info_p0 VALUES LESS THAN (TO_DATE('01-JAN-2011', 'DD-MON-YYYY')) TABLESPACE APP_DATA,
10 PARTITION emp_info_p1 VALUES LESS THAN (TO_DATE('01-JAN-2012', 'DD-MON-YYYY')) TABLESPACE APP_DATA,
11 PARTITION emp_info_p2 VALUES LESS THAN (TO_DATE('01-JAN-2013', 'DD-MON-YYYY')) TABLESPACE APP_DATA,
12 PARTITION emp_info_p3 VALUES LESS THAN (TO_DATE('01-JAN-2014', 'DD-MON-YYYY')) TABLESPACE APP_DATA,
13 PARTITION emp_info_p4 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')) TABLESPACE APP_DATA
14 );
Table created.

We have created a table EMP_INFO with five partitions to hold records for the year 2010, 2011, 2012, 2013 and 2014 on separate partitions.

declare
2 begin
3 for i in 1..100000
4 loop
5 insert into emp_info values (i,'13-APR-2010','xyz'||i);
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.

declare
2 begin
3 for i in 100001..200000
4 loop
5 insert into emp_info values (i,'13-APR-2011','xyz'||i);
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.

declare
2 begin
3 for i in 200001..300000
4 loop
5 insert into emp_info values (i,'13-APR-2012','xyz'||i);
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.

declare
2 begin
3 for i in 300001..400000
4 loop
5 insert into emp_info values (i,'13-APR-2013','xyz'||i);
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.

declare
2 begin
3 for i in 400001..500000
4 loop
5 insert into emp_info values (i,'13-APR-2014','xyz'||i);
6 end loop;
7 end;
8 /

PL/SQL procedure successfully completed.

commit;

Commit complete.
select count(*) from emp_info;

COUNT(*)
----------
500000

select TABLE_NAME,PARTITION_NAME,NUM_ROWS from user_tab_partitions where table_name='EMP_INFO';

TABLE_NAME      PARTITION_NAME     NUM_ROWS
--------------- ---------------- ----------
EMP_INFO        EMP_INFO_P0          100000
EMP_INFO        EMP_INFO_P1          100000
EMP_INFO        EMP_INFO_P2          100000
EMP_INFO        EMP_INFO_P3          100000
EMP_INFO        EMP_INFO_P4          100000





Now, we have 500000 records distributed evenly across the table partitions. We are now good to start with our analysis

Remember: It is always not necessary to create index for a table. We need to create index based on the query that is operating on the data.

Index is best suited, when we are querying a small section of data out of a large data set.

Okay, now coming back to the exercise. Lets query the table with the partition key (join_date) being as the query predicate




   
explain plan for select * from emp_info where join_date='13-APR-2010';

Explained.

@$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 1181328030

---------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |   187K|    13M|   231   (4)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|          |   187K|    13M|   231   (4)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS FULL    | EMP_INFO |   187K|    13M|   231   (4)| 00:00:01 |     1 |     1 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
2 - filter("JOIN_DATE"=TO_DATE(' 2010-04-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

18 rows selected.



In this example, we are querying all the rows from a single partition and as can bee seen in the explain plan; Oracle performed partition elimination based on the partition key (provided as query predicate). If we are always performing query like this, where it selects a large set of data from a particular partition based on the partition key, we do not need an index at all. Oracle will take advantage of partition elimination by means of partition pruning and will only scan the required partitions for the requested set of records.

However, if we just want to query a small set of a records from a particular partition, lets say the records with EMP_ID less than 100  and with JOIN_DATE  as 13-APR-2010 (i.e. from 1st partition)



explain plan for select * from emp_info where emp_id <100 and join_date='13-APR-10';

Explained.

@$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
Plan hash value: 2926039120

---------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     9 |   666 |   267  (17)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE|          |     9 |   666 |   267  (17)| 00:00:01 |   KEY |   KEY |
|*  2 |   TABLE ACCESS FULL    | EMP_INFO |     9 |   666 |   267  (17)| 00:00:01 |   KEY |   KEY |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
2 - filter("EMP_ID"<100 AND "JOIN_DATE"='13-APR-10')

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

18 rows selected.




An index might be useful here as we are just querying a small section of data from a single partition. We can implement a local index on each partition and then access the small set of records from that partition by the help of the index belonging to that partition.



##############################################################################################   
create index EMP_INFO_IDX_LOCAL on emp_info (emp_id) LOCAL;

Index created.

explain plan for select * from emp_info where emp_id <100 and join_date='13-APR-10';

Explained.

@$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1798911101

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                    |    99 |  7326 |     4   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE                    |                    |    99 |  7326 |     4   (0)| 00:00:01 |   KEY |   KEY |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| EMP_INFO           |    99 |  7326 |     4   (0)| 00:00:01 |   KEY |   KEY |
|*  3 |    INDEX RANGE SCAN                        | EMP_INFO_IDX_LOCAL |    99 |       |     3   (0)| 00:00:01 |   KEY |   KEY |
---------------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("JOIN_DATE"='13-APR-10' AND "EMP_ID"<100)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

19 rows selected.

########################################################################


So, a Local index would be useful if we are querying a small set of partitions (based on the partition key as query predicate) from the table and then selecting a small set of records from each of those partitions.

However, what if we want to query a set of records and we are not using the partition key (as query predicate) in our query. Something like, query records with EMP_ID between 199910 and 200010.

If we use a Local Index here on the EMP_ID, Oracle is going to scan all the partitions (since we are not using partition key) and then going to use the Local index of those partitions to return the requested set of records.




explain plan for select * from emp_info where emp_id between 199910 and 200010;

Explained.

@$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1213506883

---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                    |   101 |  7474 |     7   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL                       |                    |   101 |  7474 |     7   (0)| 00:00:01 |     1 |     5 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| EMP_INFO           |   101 |  7474 |     7   (0)| 00:00:01 |     1 |     5 |
|*  3 |    INDEX RANGE SCAN                        | EMP_INFO_IDX_LOCAL |   101 |       |     6   (0)| 00:00:01 |     1 |     5 |
---------------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("EMP_ID">=199910 AND "EMP_ID"<=200010)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

19 rows selected.

So, there is a overhead of visiting all the partitions and then scanning individual partitions based on their Local index for the requested set of records.

What we can do instead is, implement a Global index on EMP_ID. In this way Oracle will directly access the record from the table based on the Global Index, irrespective of the table partitioning.




create index EMP_INFO_IDX_GLOBAL on EMP_INFO (EMP_ID) GLOBAL;

Index created.

explain plan for select * from emp_info where emp_id between 199910 and 200010;

Explained.

@$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 425037682

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                     |   101 |  7474 |     4   (0)| 00:00:01 |       |       |
|   1 |  TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| EMP_INFO            |   101 |  7474 |     4   (0)| 00:00:01 | ROWID | ROWID |
|*  2 |   INDEX RANGE SCAN                         | EMP_INFO_IDX_GLOBAL |   101 |       |     3   (0)| 00:00:01 |       |       |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------

2 - access("EMP_ID">=199910 AND "EMP_ID"<=200010)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)

18 rows selected.

So, if our queries are  selecting a small subset of data from the entire table, we can implement a Global index on the query predicate to directly access the set of records from that table

I have used non partitioned Global Index for this example. Just to note that, you also have a option to create partitioned Global Index, where you can partition the Global Index to store the index values on separate index partitions and  can also map a single index partition to single or multiple table partition.

For example, if I want to use partitioned Global index for the partitioned table EMP_INFO and want to store index for the 200000 records in a single partition and the remaining records in a different partition. I can use a Global Partitioned Index as below





   
create index EMP_INFO_IDX_GLOBAL on emp_info (emp_id) global
  2  PARTITION by range (emp_id)
  3  (
  4  PARTITION emp_info_idx_pg1 VALUES LESS THAN (200000)  TABLESPACE APP_DATA,
  5  PARTITION emp_info_idx_pg2 VALUES LESS THAN (MAXVALUE) TABLESPACE APP_DATA
  6  );

Index created.

However, Oracle doesn't manage the partitioned Global index automatically and manual intervention is required for adding or dropping partitions.

We can implement partitioned Global Index to avoid scanning the entire Index.

Following is the example, where I am using a global partitioned Index.






explain plan for select * from emp_info where emp_id<100 and join_date='13-APR-2010';

Explained.

@$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3498262492

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                     |    99 |  2178 |     4   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE                     |                     |    99 |  2178 |     4   (0)| 00:00:01 |     1 |     1 |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| EMP_INFO            |    99 |  2178 |     4   (0)| 00:00:01 |     1 |     1 |
|*  3 |    INDEX RANGE SCAN                         | EMP_INFO_IDX_GLOBAL |    20 |       |     3   (0)| 00:00:01 |     1 |     1 |
-----------------------------------------------------------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("JOIN_DATE"=TO_DATE(' 2010-04-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
3 - access("EMP_ID"<100)

16 rows selected.

So, when I use a partitioned Global Index, the query optimizer only scans the partition (s) to which the index value belongs to.

A Global Index is particularly more suitable when direct access to the table records are desired without visiting the underlying table partitions.

Conclusion:

Deciding on the Local or Global index is purely based on the Query requirement as well as the on the expected Index management or maintenance.

With respect to Index maintenance, Local Indexes are easy to manage as they are independent of each other and Oracle itself takes care of the partition creation and deletion for Local Indexes at the event of table partition creation or deletion.

On the other hand Global Indexes are bit complex to manage during index maintenance. For instance, dropping a table partition makes the respective Global Index UNUSABLE and we have to rebuild the whole index.




   
alter table EMP_INFO drop partition emp_info_p0;

Table altered.

select index_name,partition_name,status from dba_ind_partitions where index_name='EMP_INFO_IDX_GLOBAL';

INDEX_NAME                PARTITION_NAME   STATUS
------------------------- ---------------- --------
EMP_INFO_IDX_GLOBAL       EMP_INFO_IDX_PG1 UNUSABLE
EMP_INFO_IDX_GLOBAL       EMP_INFO_IDX_PG2 UNUSABLE

We do have a option to use the "UPDATE GLOBAL INDEXES" clause while dropping a table partition to avoid the Global Index become Unusable. However, it would be a time consuming operation to drop the partition in case of large Global Indexes as it involves rebuilding the Global Index during the process of drop partition operation.






alter table EMP_INFO drop partition EMP_INFO_P1 UPDATE GLOBAL INDEXES;

Table altered.

select index_name,partition_name,status from dba_ind_partitions where index_name='EMP_INFO_IDX_GLOBAL';

INDEX_NAME                PARTITION_NAME   STATUS
------------------------- ---------------- --------
EMP_INFO_IDX_GLOBAL       EMP_INFO_IDX_PG1 USABLE
EMP_INFO_IDX_GLOBAL       EMP_INFO_IDX_PG2 USABLE

Therefore, there are a number of factors to evaluate before deciding on the type of Index (Local or Global) to be used for a partitioned table.







No comments:

Post a Comment