Contact us

Madeira Data Solutions

Your Data, Our Solutions

Which One is Better? IN or BETWEEN?

Written By: Guy Glantser 09/03/2018

I just had an interesting case of performance tuning: a query with multiple predicates on a very large table. Something like this:

 

There was a non-clustered index on EventType and OperatorId (in that order).

The number of rows returned by the query was 12, and I would expect a very efficient index seek in this case.

There was indeed an index seek, but the number of logical reads was huge, and it was practically a very large scan.

 

The reason is that the first index key was “EventType” on which the predicate was BETWEEN.

So the optimizer performed an seek predicate on this key and scanned the entire range of rows between 1 and 3 (approximately 300M rows).

Then it performed a residual predicate on the “OperatorId” column, which resulted in just 12 rows.

 

One option is to change the order of the keys in the index, but doing that on such a large table has a lot of implications.

Instead, I modified the query to use IN instead of BETWEEN, like this:

 

 

 

Now the optimizer performed 3 seek operations, each one with a very efficient seek predicate on both “EventType” and “OperatorId”.

This small change in the query resulted in 9 logical reads instead of 300M, and in almost zero execution time instead of an application timeout.

 

This doesn’t mean that IN is always better then BETWEEN.

But it was certainly better in this case.

 

7 responses to “Which One is Better? IN or BETWEEN?”

  1. calin oprea says:

    I always show why is better to use >= and <=, not between.

  2. Toby Ovod-Everett says:

    This leads into a discussion of “Index Skip Scans”. It’s a feature that Oracle has, but SQL Server does not. See http://sqlblog.com/blogs/joe_chang/archive/2011/06/13/oracle-index-skip-scan.aspx for a write-up. There was a Connect item for it, but the Connect item is gone. Evidently you’ve found a way to trick SQL Server into doing an Index Skip Scan. I wonder if you could somehow trick it into the same behavior without having to explicitly enumerate the values:
    SELECT

    FROM
    dbo.LotsOfEvents
    WHERE
    EventType IN (SELECT EventType FROM dbo.EventTypes WHERE EventType BETWEEN 1 AND 3)
    AND
    OperatorId = 9876;

    • Toby Ovod-Everett says:

      Actually, re-reading Joe Chang’s post (linked above), I realize that what I suggested is exactly what he did! So, yes, the evidence is you can get skip-scanning behavior without explicit enumeration.

  3. Bob Hovious says:

    Concise and interesting read. Putting this in long-term memory just in case we run into a similar situation at some point. It’s always good to have optional approaches. Thanks for sharing it.

  4. Joe Celko says:

    In other SQL implementations, the IN() predicate can do some special things. If the list is made of constant values (does not include any expressions that have to be computed), then the values can be sorted, any duplicates removed, and then the list can be searched using the usual binary search algorithm. Another product, for mainframes, actually builds a search structure which includes the frequency of each value and arranges the list based on the frequencies.

  5. My guess is this: The query optimizer converts this:

    EventType IN (1,2,3)

    to this:

    (EventType = 1 OR EventType = 2 OR EventType = 3)

    But this:

    EventType BETWEEN 1 AND 3

    Becomes:

    (EventType >= 1 AND EventType <= 3)

    The grouping of 3 distinct values is simpler for the optimizer than a range check, which might be getting a performance boost by an access of table statistics while a range between 1 and 3 could theoretically encompass lots of values depending on the data type of the field (think of a float or real column).

    I'm not an optimizer expert but I've seen cases where statistics make a huge difference in performance when paired against the optimizer. The same query on two servers performed drastically different with the same data set but differing statistics.

  6. Robert Sterbal says:

    Just curious what would the results be for this?

    WHERE
    eventtype >= 1 and eventype <=3

    ?

    is that what the between clause generates to the optimizer?

Leave a Reply

Your email address will not be published. Required fields are marked *