Composite index to evaluate ORDER BY and PREDICATE together
A few weeks back I wrote a story about an indexing strategy on a database table that improved the performance of a query in one of our use cases.
Continuing on the same journey, today I want to summarize another use case experience along similar lines.
Indexing is the most common way developers improve query performance, but for any use case there are usually several possible indexes. All of them may help, yet the size of the improvement can vary from one to another, so it’s important to find the index that gives the largest gain. Confirming that some index is being used is not enough; you need to evaluate the access plan and understand its final cost.
While it’s commonly understood that indexes speed up data lookup for the conditions in the WHERE clause, that is not the only use case. Indexes are also used for sorting (the ORDER BY clause). While separate indexes can be combined to evaluate two conditions through RID scans, that kind of combination may not be possible for evaluating an ORDER BY together with a WHERE clause condition. Take the query below as an example.
SELECT ID, TYPE, ITEM_NUMBER FROM TABLE1 WHERE TYPE=? AND ITEM_NUMBER=? ORDER BY ID DESC FETCH FIRST ROW ONLY
TABLE1 has millions of rows. ID is the primary key and is therefore backed by a unique index by default. TYPE has only 4 distinct values, which repeat across all the records. ITEM_NUMBER is not unique, but each value repeats only a few times (1 to 5 rows per value).
The immediate solution that comes to anyone’s mind is to create an index on the ITEM_NUMBER column. But will it really work?
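For reference, that first attempt would look something like the statement below. The index name IDX_ITEM_NUMBER is made up for this post and is not taken from our actual schema.

```sql
-- Single-column index on the selective predicate column.
-- IDX_ITEM_NUMBER is a hypothetical name used only for illustration.
CREATE INDEX IDX_ITEM_NUMBER ON TABLE1 (ITEM_NUMBER);
```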
We are using DB2 on Cloud as our DB engine. My understanding is that its optimizer generally gives the ORDER BY clause more importance and evaluates it first when choosing indexes. With this in mind, the access plan looks like this.
1. Choose the ID index and sort all the rows in memory.
2. Evaluate the WHERE conditions on the sorted records.
With these steps, all the rows are already fetched into memory for sorting, so the ITEM_NUMBER index is of no use. Since there are millions of rows, bringing them into memory and then scanning the sorted rows to apply the WHERE clause conditions is very costly.
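To check a plan like this yourself, you can ask the database to explain the query instead of running it. The sketch below is one possible way to do that, not necessarily how we captured our plans. It assumes the explain tables are created in the SYSTOOLS schema, which is what the setup call does when no schema is passed, and that your client accepts parameter markers inside EXPLAIN; if it does not, substitute representative literal values.

```sql
-- One-time setup: create the explain tables (assumed to land in SYSTOOLS).
CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C',
     CAST(NULL AS VARCHAR(128)), CAST(NULL AS VARCHAR(128)));

-- Capture the access plan for the query without executing it.
EXPLAIN PLAN FOR
SELECT ID, TYPE, ITEM_NUMBER
FROM TABLE1
WHERE TYPE = ? AND ITEM_NUMBER = ?
ORDER BY ID DESC
FETCH FIRST ROW ONLY;

-- List the operators of the most recently explained statement with their costs.
SELECT OPERATOR_ID, OPERATOR_TYPE, TOTAL_COST
FROM SYSTOOLS.EXPLAIN_OPERATOR
WHERE EXPLAIN_TIME = (SELECT MAX(EXPLAIN_TIME) FROM SYSTOOLS.EXPLAIN_OPERATOR)
ORDER BY OPERATOR_ID;
```

Operator types such as IXSCAN, FETCH and SORT, together with the total cost, show which index was picked and how expensive the plan is. Running the same steps after each index change makes the plans easy to compare.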
So what’s the solution?
Creating a composite index that helps evaluate both the ORDER BY and the ITEM_NUMBER condition in one go can change the optimizer’s decision. While it prioritizes the ORDER BY clause, it will select any index that also helps evaluate another condition alongside. So we created the following index.
CREATE INDEX INDX1 ON TABLE1 (ID, ITEM_NUMBER)
This changed the access plan, which now looks like this.
1. Use an index scan over INDX1, sorting and applying the condition on ITEM_NUMBER in one go.
This access plan is much faster.
Now let’s say we change the query to
SELECT ID, TYPE, ITEM_NUMBER FROM TABLE1 WHERE TYPE=? AND ITEM_NUMBER=? FETCH FIRST ROW ONLY
With this query, INDX1 will no longer be used, and we may need to go back to the original standard solution of an index on ITEM_NUMBER.
Whatever the solution, the last thing to keep in mind is that any unused index should be cleaned up, as it adds overhead to IUD (INSERT, UPDATE, DELETE) statements.
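One possible way to spot cleanup candidates is the LASTUSED column of the SYSCAT.INDEXES catalog view. The sketch below assumes the table is visible under the name TABLE1. Note that LASTUSED is maintained asynchronously, so treat it as a hint rather than an exact timestamp.

```sql
-- Show every index on TABLE1 with the date it was last used.
-- A LASTUSED value of 0001-01-01 means no use has been recorded.
SELECT INDSCHEMA, INDNAME, COLNAMES, LASTUSED
FROM SYSCAT.INDEXES
WHERE TABNAME = 'TABLE1'
ORDER BY LASTUSED;

-- Drop an index only after confirming that no query still depends on it.
DROP INDEX INDX1;
```

Before dropping anything, it is worth re-checking the access plans of the queries that were supposed to use the index.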