• Madeira Team

Partitioning on SQL Server

Overview

Partition views have been part of SQL Server since SQL Server 7.0, but the maintenance and benefits they provided were limited. In SQL server 2005, Microsoft improve the partitioning and partition management, up to the point of standing the competition against databases of other firms (for example, ORACLE had partitions since version 8i).

So what does partitioning mean? Partitioning is division of one logical database object (table, index) into a number of physical units by using a column as the partition range.  But even though the object is physically partitioned into number of units, it still can be used as one logical object.

In this article we will review why and when we should use partitioning, how to create a partitioned table from scratch or how to create partitions on an existing table, how to maintain partitions and which system objects are useful for monitoring or working with partitions.

Why should we use partitions?

Partitions should be used in order to improve performance and to increase availability of data.  We can operate on a single partition even with performance critical operation, such as reindexing, without affecting the others.  During updating or changing data that affects the entire table, as soon as the partition is available (and all data in it has changed), the data in that partition is available for querying. The biggest goal for partitions is to improve scalability and manageability of large tables or tables that have different access patterns.  It’s easy to delete large amount of data from the table when you have partitions, since it’s possible to drop a partition in one second, regardless of how much data in it, unlike in regular table where the engine has to find all the data with WHERE claus0065. We can increase availability by using partitions. If we have each partition in a different filegroup, then during the restoring process (from filegroup backup), a partition that was already restored is available for use (only if its indexes are in the same filesgroup).

When we should use partitions?

We can use partitions:

  1. When we have a large table and we want to divide it into smaller parts that can be saved in a number of filegroups located in number of disks and increase I/O performance.

  2. When we have a system with multiple CPUs SQL engines which executes parallel operations on the partitioned table, in order to achieve better performance.

  3. When we have application log tables, where the data is saved for a month and deleted afterwards. By partitioning this table with a daily range, we can maintain it with a job that will split the next range and delete the oldest partition with no time, while using other partitions, which will be very helpful for users and performance.

  4. When we have a table with data that should be archived once in a while, we can take the oldest partition from that table and save it as the newest partition in the archive table quickly.

  5. When we know the pattern of our queries on large table we can use partitions to achieve better performance.

Terminology

Before we’ll learn how to create and use partitions, let’s review some partition terms.

  1. Partition view – In SQL Server 7.0 there were no partitions, so if we wanted to divide a table into a couple of smaller units – we had to create number separate tables to achieve this goal. So if we wanted to view data from a few tables (if the range of each table was one month and we wanted to check data of the last quarter) then we could use partitioned views.  Partition views were created as the union of the tables that were logically connected. The views were combined with constraints to allow the query optimizer to remove irrelevant tables (tables that were not included in WHERE clause) from the query plan and significantly reduce the overall plan cost when partition view accessed multiple tables. In SQL Server 2000, we could also use partition views for data modification statements. SQL Server can direct modifications to the appropriate base table through the view because of the data structure that is like views structure.

  2. Partition function– defines the algorithm to be used in order to direct records. In partition function we define how we want our table to divide, where the partition boundary would be. For example, we want a table to be partitioned by a date column and we want every partition to include records of one month – we define it in partition function.

  3. Partition function range – is the boundary for the partition. There can be left and right ranges. In the left range, the first value will be an upper boundary in the first partition. In right range the first value would be the lower boundary of the second partition. For example, if we have a month partition with a right range defined from 1.1.10, than 31.12.09 and earlier will go to the first partition and 1.1.10 will be the lower boundary of the second partition. If we have a month partition with a left range defined from 1.1.10, than 1.1.10 and earlier will go to the first partition.

  4. Partition key – a single column on which the table is partitioned by.

  5. Partition scheme – maps the object for one or more filegroups. To determine which data goes to which physical location, the partition scheme uses the partition function.

  6. Partition Table – a table designed by using the partition function and the partition scheme so that all rows that have the same partitioning key are placed in a specific location(defined by partition scheme using the boundaries that were defined by partition function).

  7. Index partitioning – is created by using the partition function and scheme; Increases performance if the table is also partitioned and the index’s partitions are located in the same location as the table partitions. An index does not have to participate in the same named partition function to be aligned with its base table. However, the partition function of the index and the base table must be essentially the same (the arguments of the partition functions having the same data type; they define the same number of partitions; they define the same boundary values for partitions). In order to create a partition index, the table does have to be partitioned.

  8. Split – adds new range to the partition function (next month or next year). Before using ‘split’ we have to check that we have enough partitions used by the partition scheme and if we have less partitions in the scheme we have to alter the partition scheme to add more partitions for new ranges. Split is very useful in a dynamic environment that changes (grows) constantly.

  9. Switch – switches out the partition from the table to another table (secondary table must contain the same columns as a partitioned table). To enable partition switching, all indexes on the partitioned table must be aligned. Switch is very useful when we want to move a range to an archive or to delete it. This command will take a moment to switch out the data instead of searching for it with where clause.

  10. Merge – merges ranges of partitions.  After moving partitions out, we want to un-allocate the filegroup that was used by them so we merge all ranges that are unused, so that they can maintain the partitions easily. We also change the partition boundaries. For example, if we defined the partition function range to be a day and after a couple of month, we saw that there is a small amount of data in each partition, and performance would not change if we change the day partitions to week partitions and it would be easier for to maintain 53 partitions per year instead of 356.

How do we create a partitioned table?

Before creating a partitioned table we should decide:

If the table really should be partitioned.

If we decided that it should, we have to determine which column should be the partitioning key and how many partitions we need for now (we can always add more).

If we need to add more filegroups or files to increase performance. If we decided that we need more filegroups, we should add now using:

PgSQL

ALTER DATABASE GlobalSales ADD FILEGROUP PartitionFG ALTER DATABASE GlobalSales ADD FILEGROUP PartitionFG2 ALTER DATABASE GlobalSales ADD FILE ( NAME = N'GlobalSales2' , FILENAME = N'C:GlobalSales_FilesGlobalSales_P.ndf' ) TO FILEGROUP PartitionFG ALTER DATABASE GlobalSales ADD FILE ( NAME = N'GlobalSales3' , FILENAME = N'C:GlobalSales_FilesGlobalSales_P2.ndf' ) TO FILEGROUP PartitionFG2

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

ALTER DATABASE

GlobalSales

ADD FILEGROUP PartitionFG

ALTER DATABASE

GlobalSales

ADD FILEGROUP PartitionFG2

ALTER DATABASE

GlobalSales

ADD FILE

(

NAME = N'GlobalSales2' ,

FILENAME = N'C:GlobalSales_FilesGlobalSales_P.ndf'

)

TO FILEGROUP PartitionFG

ALTER DATABASE

GlobalSales

ADD FILE

(

NAME = N'GlobalSales3' ,

FILENAME = N'C:GlobalSales_FilesGlobalSales_P2.ndf'

)

TO FILEGROUP PartitionFG2

After adding filegroups and files the database will look like this:


Partitioning 1

Now we have to choose a partitioning key.  If our table is Sales we should choose the DateOfSale column as a key, because new data will always go to new partitions and when we select from the table we usually use date range and data that is older than a year goes to the archive. So we’ve chosen DateOfSale column as the partitioning key and the partition range will be a month and we will create 6 partitions for now (and add more by using split).

Now we can create the partition function:

PgSQL

CREATE PARTITION FUNCTION pfMonthlyPartition (datetime) AS RANGE RIGHT FOR VALUES ( '1 jan 2009', '1 feb 2009', '1 mar 2009', '1 apr 2009', '1 may 2009', '1 jun 2009');

1

2

3

4

5

6

7

CREATE PARTITION FUNCTION pfMonthlyPartition (datetime)

AS RANGE RIGHT FOR VALUES ( '1 jan 2009',

'1 feb 2009',

'1 mar 2009',

'1 apr 2009',

'1 may 2009',

'1 jun 2009');

We’ve chosen the right range for values so that all sales that are earlier than 1.1.09 would be saved in first partition. Second partition will include January, third – February… seventh – June. So indeed, we have seven partitions for six month, this happens because first partition saves the data that is older than the range and the last partition saves data of the last range and after it.

Now when we know that we have seven partitions, we should remember this while we creating the partition scheme.

PgSQL

CREATE PARTITION SCHEME SalesPartitionScheme AS PARTITION pfMonthlyPartition TO ( PartitionFG, PartitionFG, PartitionFG, PartitionFG2, PartitionFG2, PartitionFG2, PartitionFG2);

1

2

3

4

5

6

7

8

9

CREATE PARTITION SCHEME SalesPartitionScheme

AS PARTITION pfMonthlyPartition

TO ( PartitionFG,

PartitionFG,

PartitionFG,

PartitionFG2,

PartitionFG2,

PartitionFG2,

PartitionFG2);

Now when we’ve created the partition function and scheme we can create the Sales table on the scheme using DateOfSale as the key.