It’s Time to Get Rid of Those Redundant Indexes

For some reason, which I have never understood, SQL Server allows you to create duplicate indexes on the same object (table or view). You can create as many non-clustered indexes as you like with the exact index keys and included columns as well as the exact index properties. The only difference between the indexes would be the index ID and the index name. This is a very undesirable situation, because there is clearly no benefit from having the same index more than once, but on the other hand there is quite a lot of overhead that each index incurs. The overhead includes the storage consumed by the index, the on-going maintenance during DML statements, the periodical rebuild and/or reorganize operations, and the additional complexity that the optimizer has to deal with when evaluating possible access methods to the relevant table or view.

But the problem is not only with duplicate indexes. There are cases, in which two indexes are not identical, but still one of them is completely redundant, and you only pay the overhead without having any benefit. One simple example is the order of the included columns. If two indexes have the same index keys in the same order, and the same included columns but in a different order, then although the indexes are not identical, they are practically identical for any matter. This is because the order of the included columns has no meaning. Moreover, if one of these indexes included column [a] and [b], and the other index included columns [a], [b] and [c], then the first index would be completely redundant. As long as the second index exists, there’s no point in having the first one as well. The opposite, of course, is not true. If we drop the second index, then a query that needs column [c] wouldn’t be able to retrieve it from the first index.

Another less obvious example for redundant indexes is when two indexes differ in their index keys. If two indexes contain the same index keys, but the order of keys is different, then none of them is redundant (well, it might be redundant in some specific scenarios because the queries that would benefit from them don’t justify the overhead they incur, but we’re talking about the general case here). But if one index contains columns [d] and [e] as index keys, and another index contains columns [d], [e] and [f] as index keys (same order), then assuming they have the same included columns, the first index is redundant, because any query that filters by [d] and [e] or even by [d] alone can use the second index.

I can continue with more examples, but you get the idea. You also have to consider index types – clustered or non-clustered. All the examples mentioned above assume that both indexes are non-clustered. If you find a clustered index to be redundant because there is another non-clustered index that can replace it, then you need to stop right there and rethink about the whole index structure of the relevant table or view. You should also consider the uniqueness of the indexes. If one index is unique and the other is not, then you should examine this case more carefully. And finally, you should also check whether any of the indexes is filtered and compare the filter conditions.

I have seen many databases in my career, and I think that most of them had redundant indexes. It doesn’t necessarily mean that they had poor DBAs. Usually it happens in large databases that are maintained by several people. It’s not easy to keep everything in control all the time, and at some point one DBA will create an index that another DBA has already created 2 years ago. I still don’t understand why Microsoft doesn’t do anything about it. If not about redundant indexes, I at least expect SQL Server to throw an error when you try to create duplicate indexes.

I wrote a query to retrieve all the potential redundant indexes in the current database. An index is considered redundant if there is at least one other index for the same table or view which contains at least the same index keys (in the same order) and at least the same included columns. This query compares only non-clustered indexes, and it does not take into account the uniqueness of the indexes as well as the filter condition of filtered indexes (although it displays this information for each index).

The script is part of our Madeira Toolbox on GitHub, and you can find it here.