Madeira Team

The #1 Enemy of Data Loading in SQL Server

Let’s look at the following script:

Transact-SQL

CREATE TABLE LoadMeFast (
    Id INT NOT NULL,
    InsertDate DATETIME NOT NULL,
    ABunchOfLetters NVARCHAR(100)
)
GO

SET NOCOUNT ON
GO

DECLARE @i INT = 1
WHILE @i <= 100000
BEGIN
    INSERT LoadMeFast SELECT @i, GETDATE(), REPLICATE(N'ABCD', 25)
    SET @i += 1
END
GO

The script runs for more than 20 seconds. What do you think is the problem that causes the script to run that long? Our instinct is to say the reason is we run the script on a row-by-row basis and not as a set-based solution. This is a part of the truth, but not all of it.

Now let’s look at the following script:

Transact-SQL

TRUNCATE TABLE LoadMeFast
GO

BEGIN TRAN

DECLARE @i INT = 1
WHILE @i <= 100000
BEGIN
    INSERT LoadMeFast SELECT @i, GETDATE(), REPLICATE(N'ABCD', 25)
    SET @i += 1
END

COMMIT

The only difference between the two scripts is that the second one runs inside an explicit transaction. Now let’s see how the Log Flushes/Sec performance counter looks:


[Figure: Log Flushes/Sec counter in Performance Monitor during both runs]

The Real Reason:

Each transaction in SQL Server has to be durable on disk before it is considered committed. In other words, each transaction has to be written to the transaction log on disk before the process can move on. When we issue an INSERT/UPDATE/DELETE statement without an explicit transaction, SQL Server wraps it in an auto-commit transaction of its own. This means we have to go to disk for each and every INSERT statement inside the first WHILE loop. That’s why we see so many log flushes in Performance Monitor for the first run.

In the second script, we manage the transaction ourselves. SQL Server goes to disk only on COMMIT, or when one (or more) of its log blocks (up to 60 KB each) fills up. This makes our work with the disk much more efficient, as we can see under the second run’s arrow in Performance Monitor. The #1 enemy of data loading is the transaction log. Remember it and act accordingly.
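For completeness, the same load can also be written as a single set-based INSERT, which gives SQL Server one transaction and very few log flushes. This is a sketch, assuming sys.all_columns cross-joined with itself yields at least 100,000 rows (true on any default instance); the TABLOCK hint is optional and only enables minimal logging under the SIMPLE or BULK_LOGGED recovery model:

```sql
-- One statement, one auto-commit transaction, few log flushes.
-- WITH (TABLOCK) may additionally allow minimal logging into a heap,
-- depending on the database recovery model.
INSERT INTO LoadMeFast WITH (TABLOCK)
SELECT TOP (100000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),  -- Id
    GETDATE(),                                   -- InsertDate
    REPLICATE(N'ABCD', 25)                       -- ABunchOfLetters
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;
```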

Taking it to the Field

Now that we know how it works, we can take this knowledge to the field. We need to remember that a set-based solution is almost always more efficient performance-wise, but not always, and such a solution is not always easy to write.
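When a set-based rewrite isn’t practical, and one giant transaction raises concerns (locking, log growth, a huge rollback on failure), a common middle ground is to commit in batches. A sketch of the loop from above with batched commits; the batch size of 10,000 is illustrative, tune it for your workload:

```sql
-- Commit every @BatchSize rows instead of per-row or all-at-once.
DECLARE @i INT = 1, @BatchSize INT = 10000;

BEGIN TRAN;
WHILE @i <= 100000
BEGIN
    INSERT LoadMeFast SELECT @i, GETDATE(), REPLICATE(N'ABCD', 25);

    IF @i % @BatchSize = 0
    BEGIN
        COMMIT;      -- flush the log and release locks for this batch
        BEGIN TRAN;  -- open the next batch
    END;

    SET @i += 1;
END;
COMMIT;  -- commit whatever remains in the final batch
```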

Recently, while examining a process at a client site, I spotted a cursor that iterated over a 200-row table. Each row triggered a few UPDATE statements that were executed using dynamic SQL, for a total of about 2,000 UPDATE statements. Instead of tryi