In my previous post, I’ve demonstrated how Foreign keys can help the optimizer make better decisions when creating an execution plan. However, I’ve also mentioned that FKs make inserts and updates run slower. In this short post I’ll show the actual impact the FK has on these actions.
Why does the insert run slower ?
A foreign key is basically a constraint. It limits the data in the referencing column to a list of values found in the referenced table. This makes SQL Server check the data before inserting it, to make sure the constraint is not violated. This check takes time, but how much time ?
To demonstrate I’ve created 2 tables: NameList (populated with 100 Ids and names) and DataTable that will reference the NameList table.
At first I’ve created the DataTable without the FK.
Transact-SQL
-- Create the list table CREATE TABLE dbo.NameList ( Id INT NOT NULL , Name VARCHAR(10) NOT NULL , CONSTRAINT pk_NameId_c PRIMARY KEY CLUSTERED (Id) ); INSERT INTO dbo.NameList(Id, Name) SELECT TOP 100 ROW_NUMBER()OVER(ORDER BY T1.name) , 'Name_'+CAST(ROW_NUMBER()OVER(ORDER BY T1.name) AS VARCHAR(4)) FROM sys.all_columns AS T1 CROSS JOIN sys.all_columns AS T2; GO -- Create data table CREATE TABLE dbo.DataTable ( NameId INT NOT NULL ); GO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
-- Create the list table
CREATE TABLE
dbo.NameList
(
Id INT NOT NULL ,
Name VARCHAR(10) NOT NULL ,
CONSTRAINT pk_NameId_c PRIMARY KEY CLUSTERED (Id)
);
INSERT INTO
dbo.NameList(Id, Name)
SELECT TOP 100
ROW_NUMBER()OVER(ORDER BY T1.name) ,
'Name_'+CAST(ROW_NUMBER()OVER(ORDER BY T1.name) AS VARCHAR(4))
FROM
sys.all_columns AS T1
CROSS JOIN
sys.all_columns AS T2;
GO
-- Create data table
CREATE TABLE
dbo.DataTable
(
NameId INT NOT NULL
);
GO
I’ve also created a third table (RandomData) from which data will be selected into the DataTable. This table contains 1,000,000 records that match the values in the list table.
Transact-SQL
-- Create random data SELECT TOP 1000000 ABS(CHECKSUM(NEWID())%100)+1 AS RandId INTO dbo.RandomData FROM sys.all_columns AS T1 CROSS JOIN sys.all_columns AS T2; GO
1
2
3
4
5
6
7
8
9
10
-- Create random data
SELECT TOP 1000000
ABS(CHECKSUM(NEWID())%100)+1 AS RandId
INTO
dbo.RandomData
FROM
sys.all_columns AS T1
CROSS JOIN
sys.all_columns AS T2;
GO
Inserting data without a FK:
First lets see how long it takes to insert 1,000,000 records into a table without a FK. To test this, I’ve run an insert statement 10 times and measured the AVG run time.
Transact-SQL
-- Insert with out a FK and check time IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'dbo.fk_DataList') AND parent_object_id = OBJECT_ID(N'dbo.DataTable')) BEGIN ALTER TABLE dbo.DataTable DROP CONSTRAINT fk_DataList; END; GO DECLARE @StartTime DATETIME2 , @EndTime DATETIME2 , @i INT = 1, @n INT = 10; TRUNCATE TABLE dbo.DataTable; SET @StartTime = SYSDATETIME(); WHILE @i <= @n BEGIN INSERT INTO dbo.DataTable (NameId) SELECT RandId FROM dbo.RandomData; SET @i+=1; END; SET @EndTime = SYSDATETIME(); SELECT Avg_Insert_Time = CAST(CAST(DATEDIFF(MS,@StartTime,@EndTime) AS DECIMAL)/1000/@n AS DECIMAL(7,3)); GO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- Insert with out a FK and check time
IF EXISTS (SELECT * FROM sys.foreign_keys
WHERE object_id = OBJECT_ID(N'dbo.fk_DataList')
AND parent_object_id = OBJECT_ID(N'dbo.DataTable'))
BEGIN
ALTER TABLE dbo.DataTable
DROP CONSTRAINT fk_DataList;
END;
GO
DECLARE
@StartTime DATETIME2 ,
@EndTime DATETIME2 ,
@i INT = 1,
@n INT = 10;
TRUNCATE TABLE dbo.DataTable;
SET @StartTime = SYSDATETIME();
WHILE @i <= @n
BEGIN
INSERT INTO dbo.DataTable (NameId)
SELECT RandId FROM dbo.RandomData;
SET @i+=1;
END;
SET @EndTime = SYSDATETIME();
SELECT Avg_Insert_Time = CAST(CAST(DATEDIFF(MS,@StartTime,@EndTime) AS DECIMAL)/1000/@n AS DECIMAL(7,3));
GO
I ran this test 12 times, and eliminated the best and worst results. The average run time was 4.68 sec.
The execution plan for the insert was very simple.
Inserting with a FK:
I run the same test, only this time I’ve created a FK before.
Transact-SQL
-- Add the FK TRUNCATE TABLE dbo.DataTable; ALTER TABLE dbo.DataTable ADD CONSTRAINT fk_DataList FOREIGN KEY (NameId) REFERENCES dbo.NameList (Id); GO -- Now insert with a FK and check time DECLARE @StartTime DATETIME2 , @EndTime DATETIME2 , @i INT = 1, @n INT = 10; TRUNCATE TABLE dbo.DataTable; SET @StartTime = SYSDATETIME(); WHILE @i <= @n BEGIN INSERT INTO dbo.DataTable (NameId) SELECT RandId FROM dbo.RandomData; SET @i+=1; END; SET @EndTime = SYSDATETIME(); SELECT Avg_Insert_Time = CAST(CAST(DATEDIFF(MS,@StartTime,@EndTime) AS DECIMAL)/1000/@n AS DECIMAL(7,3)); GO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Add the FK
TRUNCATE TABLE dbo.DataTable;
ALTER TABLE dbo.DataTable
ADD CONSTRAINT fk_DataList FOREIGN KEY (NameId)
REFERENCES dbo.NameList (Id);
GO
-- Now insert with a FK and check time
DECLARE
@StartTime DATETIME2 ,
@EndTime DATETIME2 ,
@i INT = 1,
@n INT = 10;
TRUNCATE TABLE dbo.DataTable;
SET @StartTime = SYSDATETIME();
WHILE @i <= @n
BEGIN
INSERT INTO dbo.DataTable (NameId)
SELECT RandId FROM dbo.RandomData;
SET @i+=1;
END;
SET @EndTime = SYSDATETIME();
SELECT Avg_Insert_Time = CAST(CAST(DATEDIFF(MS,@StartTime,@EndTime) AS DECIMAL)/1000/@n AS DECIMAL(7,3));
GO
This time the average run time was 6.30sec. Almost 35% more. I must say I didn’t think the impact is that significant.
The execution plan this time was very different. The data is sorted, and the list table is accessed as well.
Enjoying both worlds:
As we just saw, a FK can have significant impact on performance during inserts. Sometimes it’s a price we are willing to pay, but sometimes our system is overloaded as it is. What can we do to minimize the impact on inserts while still keeping the FK ?
One solution is to drop the FK right before the insert and then rebuild it. This way the insert itself is performed as fast as possible while we still keep our data consistent.
Transact-SQL
-- Dropping the FK before the insert and recrating it after it's finished DECLARE @StartTime DATETIME2 , @EndTime DATETIME2 , @i INT = 1, @n INT = 10; TRUNCATE TABLE dbo.DataTable; SET @StartTime = SYSDATETIME(); WHILE @i <= @n BEGIN ALTER TABLE dbo.DataTable DROP CONSTRAINT fk_DataList; INSERT INTO dbo.DataTable (NameId) SELECT RandId FROM dbo.RandomData; ALTER TABLE dbo.DataTable ADD CONSTRAINT fk_DataList FOREIGN KEY (NameId) REFERENCES dbo.NameList (Id); SET @i+=1; END; SET @EndTime = SYSDATETIME(); SELECT Avg_Insert_Time = CAST(CAST(DATEDIFF(MS,@StartTime,@EndTime) AS DECIMAL)/1000/@n AS DECIMAL(7,3)); GO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Dropping the FK before the insert and recrating it after it's finished
DECLARE
@StartTime DATETIME2 ,
@EndTime DATETIME2 ,
@i INT = 1,
@n INT = 10;
TRUNCATE TABLE dbo.DataTable;
SET @StartTime = SYSDATETIME();
WHILE @i <= @n
BEGIN
ALTER TABLE dbo.DataTable
DROP CONSTRAINT fk_DataList;
INSERT INTO dbo.DataTable (NameId)
SELECT RandId FROM dbo.RandomData;
ALTER TABLE dbo.DataTable
ADD CONSTRAINT fk_DataList FOREIGN KEY (NameId)
REFERENCES dbo.NameList (Id);
SET @i+=1;
END;
SET @EndTime = SYSDATETIME();
SELECT Avg_Insert_Time = CAST(CAST(DATEDIFF(MS,@StartTime,@EndTime) AS DECIMAL)/1000/@n AS DECIMAL(7,3));
GO
This time, the average run time was 5.03 sec, only 2.4% more than without using a FK. The only “danger” is this way there is a possibility that data that violates the FK can be inserted and we won’t be able to rebuild the FK. In order to avoid this we could simply use a try-catch and rollback in case to FK cannot be rebuilt.
Comentários