Partitioning on SQL Server

Madeira Team
Dec 14, 2009
15 min read

Overview

Partition views have been part of SQL Server since SQL Server 7.0, but the maintenance and benefits they provided were limited. In SQL server 2005, Microsoft improve the partitioning and partition management, up to the point of standing the competition against databases of other firms (for example, ORACLE had partitions since version 8i).

So what does partitioning mean? Partitioning is division of one logical database object (table, index) into a number of physical units by using a column as the partition range. But even though the object is physically partitioned into number of units, it still can be used as one logical object.

In this article we will review why and when we should use partitioning, how to create a partitioned table from scratch or how to create partitions on an existing table, how to maintain partitions and which system objects are useful for monitoring or working with partitions.

Why should we use partitions?

Partitions should be used in order to improve performance and to increase availability of data. We can operate on a single partition even with performance critical operation, such as reindexing, without affecting the others. During updating or changing data that affects the entire table, as soon as the partition is available (and all data in it has changed), the data in that partition is available for querying. The biggest goal for partitions is to improve scalability and manageability of large tables or tables that have different access patterns. It’s easy to delete large amount of data from the table when you have partitions, since it’s possible to drop a partition in one second, regardless of how much data in it, unlike in regular table where the engine has to find all the data with WHERE claus0065. We can increase availability by using partitions. If we have each partition in a different filegroup, then during the restoring process (from filegroup backup), a partition that was already restored is available for use (only if its indexes are in the same filesgroup).

When we should use partitions?

We can use partitions:

When we have a large table and we want to divide it into smaller parts that can be saved in a number of filegroups located in number of disks and increase I/O performance.
When we have a system with multiple CPUs SQL engines which executes parallel operations on the partitioned table, in order to achieve better performance.
When we have application log tables, where the data is saved for a month and deleted afterwards. By partitioning this table with a daily range, we can maintain it with a job that will split the next range and delete the oldest partition with no time, while using other partitions, which will be very helpful for users and performance.
When we have a table with data that should be archived once in a while, we can take the oldest partition from that table and save it as the newest partition in the archive table quickly.
When we know the pattern of our queries on large table we can use partitions to achieve better performance.

Terminology

Before we’ll learn how to create and use partitions, let’s review some partition terms.

Partition view – In SQL Server 7.0 there were no partitions, so if we wanted to divide a table into a couple of smaller units – we had to create number separate tables to achieve this goal. So if we wanted to view data from a few tables (if the range of each table was one month and we wanted to check data of the last quarter) then we could use partitioned views. Partition views were created as the union of the tables that were logically connected. The views were combined with constraints to allow the query optimizer to remove irrelevant tables (tables that were not included in WHERE clause) from the query plan and significantly reduce the overall plan cost when partition view accessed multiple tables. In SQL Server 2000, we could also use partition views for data modification statements. SQL Server can direct modifications to the appropriate base table through the view because of the data structure that is like views structure.
Partition function– defines the algorithm to be used in order to direct records. In partition function we define how we want our table to divide, where the partition boundary would be. For example, we want a table to be partitioned by a date column and we want every partition to include records of one month – we define it in partition function.
Partition function range – is the boundary for the partition. There can be left and right ranges. In the left range, the first value will be an upper boundary in the first partition. In right range the first value would be the lower boundary of the second partition. For example, if we have a month partition with a right range defined from 1.1.10, than 31.12.09 and earlier will go to the first partition and 1.1.10 will be the lower boundary of the second partition. If we have a month partition with a left range defined from 1.1.10, than 1.1.10 and earlier will go to the first partition.
Partition key – a single column on which the table is partitioned by.
Partition scheme – maps the object for one or more filegroups. To determine which data goes to which physical location, the partition scheme uses the partition function.
Partition Table – a table designed by using the partition function and the partition scheme so that all rows that have the same partitioning key are placed in a specific location(defined by partition scheme using the boundaries that were defined by partition function).
Index partitioning – is created by using the partition function and scheme; Increases performance if the table is also partitioned and the index’s partitions are located in the same location as the table partitions. An index does not have to participate in the same named partition function to be aligned with its base table. However, the partition function of the index and the base table must be essentially the same (the arguments of the partition functions having the same data type; they define the same number of partitions; they define the same boundary values for partitions). In order to create a partition index, the table does have to be partitioned.
Split – adds new range to the partition function (next month or next year). Before using ‘split’ we have to check that we have enough partitions used by the partition scheme and if we have less partitions in the scheme we have to alter the partition scheme to add more partitions for new ranges. Split is very useful in a dynamic environment that changes (grows) constantly.
Switch – switches out the partition from the table to another table (secondary table must contain the same columns as a partitioned table). To enable partition switching, all indexes on the partitioned table must be aligned. Switch is very useful when we want to move a range to an archive or to delete it. This command will take a moment to switch out the data instead of searching for it with where clause.
Merge – merges ranges of partitions. After moving partitions out, we want to un-allocate the filegroup that was used by them so we merge all ranges that are unused, so that they can maintain the partitions easily. We also change the partition boundaries. For example, if we defined the partition function range to be a day and after a couple of month, we saw that there is a small amount of data in each partition, and performance would not change if we change the day partitions to week partitions and it would be easier for to maintain 53 partitions per year instead of 356.

How do we create a partitioned table?

Before creating a partitioned table we should decide:

If the table really should be partitioned.

If we decided that it should, we have to determine which column should be the partitioning key and how many partitions we need for now (we can always add more).

If we need to add more filegroups or files to increase performance. If we decided that we need more filegroups, we should add now using:

PgSQL

ALTER DATABASE GlobalSales ADD FILEGROUP PartitionFG ALTER DATABASE GlobalSales ADD FILEGROUP PartitionFG2 ALTER DATABASE GlobalSales ADD FILE ( NAME = N'GlobalSales2' , FILENAME = N'C:GlobalSales_FilesGlobalSales_P.ndf' ) TO FILEGROUP PartitionFG ALTER DATABASE GlobalSales ADD FILE ( NAME = N'GlobalSales3' , FILENAME = N'C:GlobalSales_FilesGlobalSales_P2.ndf' ) TO FILEGROUP PartitionFG2

ALTER DATABASE

GlobalSales

ADD FILEGROUP PartitionFG

ALTER DATABASE

GlobalSales

ADD FILEGROUP PartitionFG2

ALTER DATABASE

GlobalSales

ADD FILE

(

NAME = N'GlobalSales2' ,

FILENAME = N'C:GlobalSales_FilesGlobalSales_P.ndf'

)

TO FILEGROUP PartitionFG

ALTER DATABASE

GlobalSales

ADD FILE

(

NAME = N'GlobalSales3' ,

FILENAME = N'C:GlobalSales_FilesGlobalSales_P2.ndf'

)

TO FILEGROUP PartitionFG2

After adding filegroups and files the database will look like this:

Now we have to choose a partitioning key. If our table is Sales we should choose the DateOfSale column as a key, because new data will always go to new partitions and when we select from the table we usually use date range and data that is older than a year goes to the archive. So we’ve chosen DateOfSale column as the partitioning key and the partition range will be a month and we will create 6 partitions for now (and add more by using split).

Now we can create the partition function:

PgSQL

CREATE PARTITION FUNCTION pfMonthlyPartition (datetime) AS RANGE RIGHT FOR VALUES ( '1 jan 2009', '1 feb 2009', '1 mar 2009', '1 apr 2009', '1 may 2009', '1 jun 2009');

CREATE PARTITION FUNCTION pfMonthlyPartition (datetime)

AS RANGE RIGHT FOR VALUES ( '1 jan 2009',

'1 feb 2009',

'1 mar 2009',

'1 apr 2009',

'1 may 2009',

'1 jun 2009');

We’ve chosen the right range for values so that all sales that are earlier than 1.1.09 would be saved in first partition. Second partition will include January, third – February… seventh – June. So indeed, we have seven partitions for six month, this happens because first partition saves the data that is older than the range and the last partition saves data of the last range and after it.

Now when we know that we have seven partitions, we should remember this while we creating the partition scheme.

PgSQL

CREATE PARTITION SCHEME SalesPartitionScheme AS PARTITION pfMonthlyPartition TO ( PartitionFG, PartitionFG, PartitionFG, PartitionFG2, PartitionFG2, PartitionFG2, PartitionFG2);

CREATE PARTITION SCHEME SalesPartitionScheme

AS PARTITION pfMonthlyPartition

TO ( PartitionFG,

PartitionFG,

PartitionFG2,

PartitionFG2);

Now when we’ve created the partition function and scheme we can create the Sales table on the scheme using DateOfSale as the key.

PgSQL

CREATE TABLE dbo.[Sales]( [SalesOrderID] [int] NOT NULL, [CarrierTrackingId] [int] NULL, [ProductID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL , [DateOfSale] [datetime] NOT NULL ) ON [SalesPartitionScheme](DateOfSale)

CREATE TABLE dbo.[Sales](

[SalesOrderID] [int] NOT NULL,

[CarrierTrackingId] [int] NULL,

[ProductID] [int] NOT NULL,

[UnitPrice] [money] NOT NULL,

[UnitPriceDiscount] [money] NOT NULL ,

[DateOfSale] [datetime] NOT NULL

) ON [SalesPartitionScheme](DateOfSale)

Now we have partitioned table so we can start using it.

We can check how many partitions do we have and how many rows in each partition by using:

PgSQL

SELECT * FROM sys.partitions WHERE [object_id] = object_id('Sales')

SELECT * FROM sys.partitions

WHERE [object_id] = object_id('Sales')

We can also see the partition values by using:

PgSQL

SELECT * FROM sys.partition_range_values

We can also create indexes on this table, but we should remember that if we want the index to be partitioned, we should include the DateOfSales column in the index and create the index on the partition scheme. If you create a clustered index, you should not mention filegroup in the CREATE clause, since if you do mention the filegroup, the table will turn to non partitioned, because it physically will be on that filegroup and not partitioned as the scheme.

How to create partitions on an existing table?

If we have a table that we want to partition, we have to decide few things:

Let’s assume that we have database called LocalSales:

We have a table on the Primary filegroup that we want to partition. There are 10,000 rows in the table. For now, table has only one partition that includes all the data:

PgSQL

SELECT * FROM sys.partitions WHERE [object_id] = object_id('Sales')

SELECT * FROM sys.partitions

WHERE [object_id] = object_id('Sales')

If there are indexes on the table, each index will be one row in the partitions table:

PgSQL

SELECT * FROM sys.partitions WHERE [object_id] = object_id('Sales')

SELECT * FROM sys.partitions

WHERE [object_id] = object_id('Sales')

Now we have to decide what column would be the partition key, how many partitions we need and if we need more filegroups for those partitions.

We choose the partition key for the Sales table to be DayOfSale, because there are many selected statements using date in WHERE clause and after a year we remove the oldest month. Let’s say we decided to create a month partitions because it easier to maintain. Now we have to decide if we need more filegroups or should we leave it on Primary. If we have available disks, it is better to locate the filegroups on separate disks. Also in order to increase I/O performance, it is better to add new filegroups with new files. We decided to add 4 filegroups that would include yearly quarters:

PgSQL

ALTER DATABASE LocalSales ADD FILEGROUP PartitionQ1 ALTER DATABASE LocalSales ADD FILEGROUP PartitionQ2 ALTER DATABASE LocalSales ADD FILEGROUP PartitionQ3 ALTER DATABASE LocalSales ADD FILEGROUP PartitionQ4 ALTER DATABASE LocalSales ADD FILE ( NAME = N'LocalSalesQ1' , FILENAME = N'C:LocalSales_FilesLocalSales_Q1.ndf' ) TO FILEGROUP PartitionQ1 ALTER DATABASE LocalSales ADD FILE ( NAME = N'LocalSalesQ2' , FILENAME = N'C:LocalSales_FilesLocalSales_Q2.ndf' ) TO FILEGROUP PartitionQ2 ALTER DATABASE LocalSales ADD FILE ( NAME = N'LocalSalesQ3' , FILENAME = N'C:LocalSales_FilesLocalSales_Q3.ndf' ) TO FILEGROUP PartitionQ3 ALTER DATABASE LocalSales ADD FILE ( NAME = N'LocalSalesQ4' , FILENAME = N'C:LocalSales_FilesLocalSales_Q4.ndf' ) TO FILEGROUP PartitionQ4

ALTER DATABASE

LocalSales

ADD FILEGROUP PartitionQ1

ALTER DATABASE

LocalSales

ADD FILEGROUP PartitionQ2

ALTER DATABASE

LocalSales

ADD FILEGROUP PartitionQ3

ALTER DATABASE

LocalSales

ADD FILEGROUP PartitionQ4

ALTER DATABASE

LocalSales

ADD FILE

(

NAME = N'LocalSalesQ1' ,

FILENAME = N'C:LocalSales_FilesLocalSales_Q1.ndf'

)

TO FILEGROUP PartitionQ1

ALTER DATABASE

LocalSales

ADD FILE

(

NAME = N'LocalSalesQ2' ,

FILENAME = N'C:LocalSales_FilesLocalSales_Q2.ndf'

)

TO FILEGROUP PartitionQ2

ALTER DATABASE

LocalSales

ADD FILE

(

NAME = N'LocalSalesQ3' ,

FILENAME = N'C:LocalSales_FilesLocalSales_Q3.ndf'

)

TO FILEGROUP PartitionQ3

ALTER DATABASE

LocalSales

ADD FILE

(

NAME = N'LocalSalesQ4' ,

FILENAME = N'C:LocalSales_FilesLocalSales_Q4.ndf'

)

TO FILEGROUP PartitionQ4

Now the database looks like this:

We are ready to create the partition function and the partition scheme for the table. We will choose a right range for the function because we want the data that is earlier than 1.1.09 to be in the first partition and January in second and so on.

PgSQL

CREATE PARTITION FUNCTION pfMonthlyPartition (datetime) AS RANGE RIGHT FOR VALUES ( '1 jan 2009', '1 feb 2009', '1 mar 2009', '1 apr 2009', '1 may 2009', '1 jun 2009', '1 jul 2009', '1 aug 2009', '1 sep 2009', '1 oct 2009', '1 nov 2009', '1 dec 2009');

CREATE PARTITION FUNCTION pfMonthlyPartition (datetime)

AS RANGE RIGHT FOR VALUES ( '1 jan 2009',

'1 feb 2009',

'1 mar 2009',

'1 apr 2009',

'1 may 2009',

'1 jun 2009',

'1 jul 2009',

'1 aug 2009',

'1 sep 2009',

'1 oct 2009',

'1 nov 2009',

'1 dec 2009');

We have to add 13 partitions to the scheme, because we need the first partition for the data that is earlier than 1.1.09.

PgSQL

CREATE PARTITION SCHEME SalesPartitionScheme AS PARTITION pfMonthlyPartition TO ( PartitionQ, PartitionQ, PartitionQ, PartitionQ, PartitionQ2, PartitionQ2, PartitionQ2, PartitionQ3, PartitionQ3, PartitionQ3, PartitionQ4, PartitionQ4, PartitionQ4);

CREATE PARTITION SCHEME SalesPartitionScheme

AS PARTITION pfMonthlyPartition

TO ( PartitionQ,

PartitionQ,

PartitionQ2,

PartitionQ3,

PartitionQ4,

PartitionQ4);

Now we have the partition function and partition scheme:

PgSQL

SELECT * FROM sys.partition_functions

PgSQL

SELECT * FROM sys.partition_schemes

PgSQL

SELECT * FROM sys.partition_range_values

Before partitioning the table, we should check if we have clustered indexes on it and remove them if we can, or change to non-clustered. If a clustered index is on the partitioning key, we’ll remove it and create a partitioned clustered index, so the table will be partitioned too. If we have only non-clustered indexes, we don’t have to change them to be partitioned, but it’s better for performance and maintenance to replace them with partitioned non-clustered indexes.

PgSQL

SELECT * FROM sys.indexes WHERE [object_id] = object_id('Sales')

SELECT * FROM sys.indexes

WHERE [object_id] = object_id('Sales')

We can see that we have 3 indexes on the table: Clustered primary key on SaleID; non-clustered index on UnitPrice; and non-clustered index on the DateOfSale column.

There are two ways to make a table partitioned:

1) First we will drop the non-clustered index on the DateOfSale, because it will be partition clustered. Then we will drop a clustered index and set the primary key as non-clustered (if we want the index to be on partition scheme, we have to add the partitioning key to be part f the index). Afterwards, we will create a partition index on the DateOfSale. We can also change the index on the UnitPrice column if we like (we have to add partitioning key to the index).

PgSQL

DROP INDEX [IX_Sales_DateOfSale] ON [dbo].[Sales] WITH ( ONLINE = OFF ) GO ALTER TABLE [dbo].[Sales] DROP CONSTRAINT [PK_Sales] GO CREATE CLUSTERED INDEX [IX_Sales_DateOfSale] ON [dbo].[Sales] ( [DateOfSale] ASC )WITH ( SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF) ON [SalesPartitionScheme]([DateOfSale]) GO ALTER TABLE [dbo].[Sales] ADD CONSTRAINT [PK_Sales] PRIMARY KEY NONCLUSTERED ( [SalesOrderID] ASC, [DateOfSale] desc )WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON SalesPartitionScheme([DateOfSale])

DROP INDEX [IX_Sales_DateOfSale] ON [dbo].[Sales] WITH ( ONLINE = OFF )

ALTER TABLE [dbo].[Sales] DROP CONSTRAINT [PK_Sales]

CREATE CLUSTERED INDEX [IX_Sales_DateOfSale] ON [dbo].[Sales]

(

[DateOfSale] ASC

)WITH ( SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = OFF)

ON [SalesPartitionScheme]([DateOfSale])

ALTER TABLE [dbo].[Sales] ADD CONSTRAINT [PK_Sales] PRIMARY KEY NONCLUSTERED

(

[SalesOrderID] ASC,

[DateOfSale] desc

)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON SalesPartitionScheme([DateOfSale])

Let’s look at what we got:

PgSQL

SELECT * FROM sys.partitions WHERE [object_id] = object_id('Sales')

SELECT * FROM sys.partitions

WHERE [object_id] = object_id('Sales')

We can see that now we have 13 partitions for clustered indexes, 1 partition for the non-clustered index on UnitPrice, and 13 partitions on non-clustered primary key.

2) We can only change the primary key to be non-clustered and move the data to partition scheme. We don’t have to create clustered indexes on the table. But we should drop the non-clustered index on the DateOfSale since it isn’t useful any more (if it’s not partitioned). We can create it as a partitioned non-clustered index.

PgSQL

ALTER TABLE [dbo].[Sales] DROP CONSTRAINT [PK_Sales] WITH (MOVE TO SalesPartitionScheme([DateOfSale]) ) GO

ALTER TABLE [dbo].[Sales] DROP CONSTRAINT [PK_Sales] WITH (MOVE TO SalesPartitionScheme([DateOfSale]) )

Now we can see that the table is partitioned, there are 13 partitions of the table, 2 non-partitioned non-clustered indexes:

PgSQL

SELECT * FROM sys.partitions WHERE [object_id] = object_id('Sales')

SELECT * FROM sys.partitions

WHERE [object_id] = object_id('Sales')

We can add a primary key and drop the non-clustered index on DateOfSale:

PgSQL

ALTER TABLE [dbo].[Sales] ADD CONSTRAINT [PK_Sales] PRIMARY KEY NONCLUSTERED ( [SalesOrderID] ASC, [DateOfSale] desc )WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON SalesPartitionScheme([DateOfSale]) GO DROP INDEX [IX_Sales_DateOfSale] ON [dbo].[Sales] WITH ( ONLINE = OFF )

ALTER TABLE [dbo].[Sales] ADD CONSTRAINT [PK_Sales] PRIMARY KEY NONCLUSTERED

(

[SalesOrderID] ASC,

[DateOfSale] desc

)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON SalesPartitionScheme([DateOfSale])

DROP INDEX [IX_Sales_DateOfSale] ON [dbo].[Sales] WITH ( ONLINE = OFF )

Now there will be 27 partitions: 13 partitons for the table, 13 partitions for the primary non-clustered key and 1 index that is not partitioned. We could add a clustered primary key and the number of partitions would be 14: 13 clustered partitions and 1 non-clustered index.

Partition Maintenance

Once we’ve created the partition table, our work as DBAs doesn’t stop. From now on, we should maintain this partition table. What does it mean?

We should split the partition range to cover new data that will come in July and on. This can be done automatically with the scheduled job.
We should switch out data that is older than a year and merge the released partitions. This can be done automatically with the scheduled job.
We should drop the data that is old enough. This can be done automatically with the scheduled job.
We can also use system tables and views to check the partitions and learn how we can reduce performance by using indexes and benefits of partitions.

Now let’s see how we do it:

To split a partition range we also have to add partitions to the partition scheme. So first we’ll add a filegroup for the use of the next partition.

PgSQL

ALTER PARTITION SCHEME SalesPartitionScheme NEXT USED [PartitionFG]

ALTER PARTITION SCHEME SalesPartitionScheme

NEXT USED [PartitionFG]

Now we can split the partition range using the partition function.

PgSQL

ALTER PARTITION FUNCTION pfMonthlyPartition() SPLIT RANGE ('1 JUL 2009')

ALTER PARTITION FUNCTION pfMonthlyPartition()

SPLIT RANGE ('1 JUL 2009')

To switch out a partition, we have to have a destination table for it. The table should be with same columns as the partitioned table. If we want to drop the partition, we can create a simple table (you have to create the table on the same filegroup with a partitioned table) by using:

PgSQL

SELECT * INTO TempPartition FROM Sales WHERE 1=2

SELECT * INTO TempPartition

FROM Sales

WHERE 1=2

Now we can switch out our partition:

PgSQL

ALTER TABLE Sales SWITCH PARTITION 1 TO TempPartition

ALTER TABLE Sales

SWITCH PARTITION 1 TO TempPartition

Now we can drop the switched out partition:

PgSQL

DROP TABLE Temppartition

If we want to switch out a partition to the archive table, then we will use an alter table clause only. If we have indexes on a partitioned table, we have to have the same indexes on the destination table as well and both of them should be on the same filegroup. If the destination table is partitioned, we have to indicate to which partition we want to switch the data.

PgSQL

ALTER TABLE Sales SWITCH PARTITION 1 TO SalesArchive PARTITION 8

ALTER TABLE Sales

SWITCH PARTITION 1 TO SalesArchive PARTITION 8

We can also use a non-partitioned table as a source, then the destination table has to be partitioned and we have to indicate number of partition in which we want to save the data. We can switch only the entire non-partitioned table to one partition.

After switching out the partition we should merge the empty range in the partition function.

PgSQL

ALTER PARTITION FUNCTION pfMonthlyPartition() MERGE RANGE ('1 JAN 2009')

ALTER PARTITION FUNCTION pfMonthlyPartition()

MERGE RANGE ('1 JAN 2009')

Now we can check partition function values:

PgSQL

SELECT * FROM sys.partition_range_values

There is a very useful system function called $PARTITON that turns the partition number into which a set of partitioning column values, mapped for any specified partition function.

It can be used to get partition number for a set of partitioning column values:

PgSQL

SELECT $PARTITION.pfMonthlyPartition('5 MAR 2009') –- returns 3

It can also be used to return the number of rows in each non-empty partition of a partitioned table or index. But it is better to use sys.partitions (the number of rows is already there)

PgSQL

SELECT $PARTITION.pfMonthlyPartition(DateOfSale) AS Partition, COUNT(*) AS [COUNT] FROM dbo.SALES GROUP BY $PARTITION.pfMonthlyPartition(DateOfSale) ORDER BY Partition