Since SQL Server 2005, Microsoft added the nice addition to the common DML statements UPDATE and DELETE by allowing the use of the TOP keyword. Unfortunately for many of us, SQL Server doesn’t support the use of this keyword in conjunction with the ORDER BY clause. So when we do need to update or delete the top x rows in a table based on a specific order, we’re bound to use all kinds of workarounds.
However, many of the commonly used workarounds are not very efficient as they can be.
Here’s an example of a common method to update the top x rows in a table based on a specific order:
Transact-SQL
UPDATE MyTable SET Col1 = Whatever WHERE ID IN (SELECT TOP (100) ID FROM MyTable ORDER BY Col2 DESC)
1
2
UPDATE MyTable SET Col1 = Whatever
WHERE ID IN (SELECT TOP (100) ID FROM MyTable ORDER BY Col2 DESC)
As you can see, this method uses a subquery to get the list of unique IDs of the top 100 rows that we want to update. The problem with this method is that we’re accessing the table twice and performing a JOIN between the two sets of data (even though we didn’t explicitly write “JOIN” in the statement, that’s what SQL Server does in the background).
There are actually several variations to this method. Here’s another example:
Transact-SQL
UPDATE MyTable SET Col1 = Whatever FROM MyTable JOIN (SELECT TOP (100) ID FROM MyTable ORDER BY Col2 DESC) AS ToDelete ON MyTable.ID = ToDelete.ID
1
2
3
4
5
6
7
UPDATE MyTable SET Col1 = Whatever
FROM
MyTable
JOIN
(SELECT TOP (100) ID FROM MyTable ORDER BY Col2 DESC) AS ToDelete
ON
MyTable.ID = ToDelete.ID
This is an explicit JOIN statement with a subquery. This practically results in the same execution plan as the previous method, except that it allows the use of composite primary keys in the JOIN condition.
Another method is to do the same by first putting the results of the subquery in a temporary table and then doing the UPDATE by joining with that temporary table, but that would result in even worse performance due to the extra overhead.
Another method is to use the SET ROWCOUNT statement which would limit any statement to affect a specific maximum number of rows. However, this method is not recommended because unlike the TOP keyword, SQL Server doesn’t take it into consideration when building the execution plan – which could result in an execution plan optimized for many rows of data even though we put a small number in SET ROWCOUNT.
All of the above is relevant to DELETE statements as well.
So, what’s the best way to do this?
I found that the best way to execute a DML statement on a limited number of rows based on specific ordering is actually by updating the subquery itself. This ability in SQL Server is not trivial to all DBAs, which is a shame. Here’s an example of what I mean:
Transact-SQL
UPDATE TheSubQuery SET Col1 = Col1 + Col3 FROM ( SELECT TOP(100) Col1, Col3 FROM MyTable ORDER BY Col2 DESC ) AS TheSubQuery
1
2
3
4
5
6
UPDATE TheSubQuery SET Col1 = Col1 + Col3
FROM (
SELECT TOP(100) Col1, Col3
FROM MyTable
ORDER BY Col2 DESC
) AS TheSubQuery
Note that I need to select in the subquery all the columns used outside the subquery (such as Col1 and Col3). All the data that you’re using outside the subquery must be available (so, suppose you perform a JOIN with another table, the columns used in the join must also be queried in the sub query).
Notice how I’m accessing the table only once (you can see it in the execution plan), and obviously not doing any self-joins as a result of that.
This method works similarly using CTE:
Transact-SQL
WITH TheCTEQuery AS ( SELECT TOP(100) Col1, Col3 FROM MyTable ORDER BY Col2 DESC ) UPDATE TheCTEQuery SET Col1 = Col1 + Col3 FROM TheCTEQuery
1
2
3
4
5
6
7
8
9
WITH TheCTEQuery
AS
(
SELECT TOP(100) Col1, Col3
FROM MyTable
ORDER BY Col2 DESC
)
UPDATE TheCTEQuery SET Col1 = Col1 + Col3
FROM TheCTEQuery
And even by creating a VIEW instead of the subquery, and updating that view:
Transact-SQL
CREATE VIEW MyView AS SELECT TOP(100) Col1, Col3 FROM MyTable ORDER BY Col2 DESC GO UPDATE MyView SET Col1 = Col1 + Col3 FROM MyView
1
2
3
4
5
6
7
8
CREATE VIEW MyView
AS
SELECT TOP(100) Col1, Col3
FROM MyTable
ORDER BY Col2 DESC
GO
UPDATE MyView SET Col1 = Col1 + Col3
FROM MyView
The result and performance of all three methods are practically the same.
Again, all of the above is relevant to DELETE statements as well.
Conclusion
It is indeed a shame that Microsoft added the TOP keyword to DML statements but castrated it this much, since the combination of TOP and ORDER BY seems quite trivial. So until this ability will be built-in, I hope you can make good use of the above workarounds that I’ve showed you.
Kommentare