I just had an interesting case of performance tuning: a query with multiple predicates on a very large table. Something like this:
SELECT
...
FROM
dbo.LotsOfEvents
WHERE
EventType BETWEEN 1 AND 3
AND
OperatorId = 9876;
1
2
3
4
5
6
7
8
SELECT
...
FROM
dbo.LotsOfEvents
WHERE
EventType BETWEEN 1 AND 3
AND
OperatorId = 9876;
There was a non-clustered index on EventType and OperatorId (in that order).
The number of rows returned by the query was 12, and I would expect a very efficient index seek in this case.
There was indeed an index seek, but the number of logical reads was huge, and it was practically a very large scan.
The reason is that the first index key was “EventType” on which the predicate was BETWEEN.
So the optimizer performed an seek predicate on this key and scanned the entire range of rows between 1 and 3 (approximately 300M rows).
Then it performed a residual predicate on the “OperatorId” column, which resulted in just 12 rows.
One option is to change the order of the keys in the index, but doing that on such a large table has a lot of implications.
Instead, I modified the query to use IN instead of BETWEEN, like this:
SELECT
...
FROM
dbo.LotsOfEvents
WHERE
EventType IN (1,2,3)
AND
OperatorId = 9876;
1
2
3
4
5
6
7
8
SELECT
...
FROM
dbo.LotsOfEvents
WHERE
EventType IN (1,2,3)
AND
OperatorId = 9876;
Now the optimizer performed 3 seek operations, each one with a very efficient seek predicate on both “EventType” and “OperatorId”.
This small change in the query resulted in 9 logical reads instead of 300M, and in almost zero execution time instead of an application timeout.
This doesn’t mean that IN is always better then BETWEEN.
But it was certainly better in this case.
Comments