Selecting the correct Index for better database query performance

Manoj K Sardana
4 min readNov 4, 2020

Being an SRE lead, my first responsibility is always to make sure that system is performing at optimal level and remain stable. Performance and stability in some cases are complimentary as if one degrades, it can have cascading impact to degrade the other. Slow performing requests/API can result into system not able to cop up with throughput requirements and at some time later start crashing, similarly an unstable application will keep the performance fluctuating and won’t provide a better user experience.

Though I look forward to create a series of stories to give a view of what all various initiatives we are taking to improve on both the sides, today I am looking to give a brief on a quite smaller though important topic of selecting a right index for a better query performance.

Before we go ahead, I like to first summarize on what an index is in DB. In a single English, indexes are nothing but binary trees which save the value of indexed column in a tree structure so that searching for any value is faster. This also save the pointer to the complete record so that first a value is found, reading of full record can be done without doing a table scan but reading from the pointer directly.

In general, many of us just go ahead and create an index based on the predicate being used in the query and assume that the query will use the index for the search. However, this may not be true in every case. A closer look at the access plan of the query is needed to make sure that indexes are really getting used. today I am looking to list down 3 different cases, where such an assumption either is not true or there is a better way to do it.

1. Using a column function — It’s quite normal to use UCASE feature of DBs to have case insensitive behavior on the query. Let’s consider the following predicate

SELECT * FROM T1 WHERE UCASE(A.email_address)=UCASE(?)

We may create the index of A.email_address to improve the performance of this predicate,

CREATE INDEX inx1 ON T1(email_address)

But this is not going to work. The predicate will search for a value which is in upper case in the index binary tree. If the binary tree is not storing the value in upper case, it’s not going to work. You may even think that, If application stores the value in the table in upper case only, this should match, but even that is not true. From DB perspective there is a function UCASE is being applied on the parameter and column. DB is completely unaware of what your application is doing (like storing the email in upper case only). For DB to apply an index on any predicate, the predicate definition should be exactly matching with the index definition. So the following index should work

CREATE INDEX inx1 on T1(UCASE(email_address))

The catch here is that many of us just forget that it’s not mandated that index can only be created on a column, but you can create index on any value which can be created out of single column by applying any function or manipulation.

2. Using INCLUDE clause — This is another powerful feature most of the DB supports but remain unexploited many time. INCLUDE clause allow the query predicate to be fulfilled completely by index without going to the full records following the index pointer.

Let’s take an example of the following query

Select C1, C2 from T1 where UCASE(email_address) =UCASE(?)

Now the index in our point 1 is surely useful here and make the index scan to filter the predicates. Now let’s assume that we have many records (assume 50 here) with the same email_address, the above query along with the index in point 1 will find 50 records and then follow the pointer to get to the records and select C1 and C2 columns to return back to application. How about the below index

CREATE INDEX inx1 on T1(UCASE(email_address)) include (C1, C2)

Now what is this?

The above definition will save the value of C1 and C2 along with the index column, though binary tree will be formed based on the index column only. Now in this case, there is no need to go to the records to get the column values, but values will be found as part of the index tree itself, further improving the performance. More the number of records matching the criteria, more the performance improvement over the previous index.

Also note that the new index continues to improve the query performance discussed in point 1. INCLUDE clause does not impact its usage for the query where selected columns are different. The only difference will be that DB has to follow the pointer to reach to records to read other columns.

3. Count of records — Though index in general improve the performance of the query if correctly created, in some cases if number of records in the tables are not too high and search result in good percentage of records being selected, index search may cause slowness. As you know indexes are separate object in the form of binary tree where searching is faster due to its structural arrangement compared to a searching sequentially in the table, however after the search it has to follow a pointer which may be pointing to a location where you need to do a different I/O on the disk which has its own bottlenecks. Now if the number of records are too less, time take to search in binary tree and then doing a separate I/o to get to new pointer may be on a higher side compare to a sequential search where searched column and selected columns may be in the same memory area. So its always advisable that indexes should be created only when there is significant number of records in the table.

--

--

Manoj K Sardana

A software Engineer, SRE Lead, DB administrator and performance