• Madeira Team

CASE Study

Sometimes we want to populate a table with random data, mainly for testing purposes. For example, let’s consider the following table:

Transact-SQL

CREATE TABLE Sales.Orders ( Id INT NOT NULL IDENTITY(1,1) , CustomerId INT NOT NULL , DateAndTime DATETIME2(0) NOT NULL , OrderStatus NVARCHAR(50) NULL ); GO

1

2

3

4

5

6

7

8

9

CREATE TABLE

Sales.Orders

(

Id INT NOT NULL IDENTITY(1,1) ,

CustomerId INT NOT NULL ,

DateAndTime DATETIME2(0) NOT NULL ,

OrderStatus NVARCHAR(50) NULL

);

GO

Let’s say we want to insert 100,000 rows with random data. We want to generate random values between 1 and 1,000,000 for the “CustomerId” column, and we want to generate random date & time values in the past 5 years for the “DateAndTime” column. For now, let’s ignore the “OrderStatus” column. Here is how I would do it:

Transact-SQL

INSERT INTO Sales.Orders WITH (TABLOCK) ( CustomerId , DateAndTime ) SELECT TOP (100000) CustomerId = ABS (CHECKSUM (NEWID ())) % 1000000 + 1 , DateAndTime = DATEADD (SECOND , - ABS (CHECKSUM (NEWID ())) % (60 * 60 * 24 * 365 * 5) , SYSDATETIME ()) FROM sys.all_columns AS T1 CROSS JOIN sys.all_columns AS T2; GO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

INSERT INTO

Sales.Orders WITH (TABLOCK)

(

CustomerId ,

DateAndTime

)

SELECT TOP (100000)

CustomerId = ABS (CHECKSUM (NEWID ())) % 1000000 + 1 ,

DateAndTime = DATEADD (SECOND , - ABS (CHECKSUM (NEWID ()))

% (60 * 60 * 24 * 365 * 5) , SYSDATETIME ())

FROM

sys.all_columns AS T1

CROSS JOIN

sys.all_columns AS T2;

GO

Notice a few things:

I’m using the TABLOCK hint in order to use minimal logging. For more information about minimally logged inserts, I recommend this article by Itzik Ben-Gan.

I’m retrieving rows from sys.all_columns cross joined to itself. It really doesn’t matter what table/s I’m using in this case. All I need is to make sure that the data set contains at least 100,000 rows. The TOP clause limits the number of rows returned to 100,000.

I’m using the NEWID function in order to generate random values. If you try to use the RAND function, it will return the same value for the whole recordset, because the RAND function is evaluated only once for the query. If you want a different value for each row (of course you do), then you need to use the NEWID function.

Here is how the table looks like (10 first rows) after the insert:IdCustomerIdDateAndTimeOrderStatus11276932009-09-30 02:18:25NULL21357642008-04-26 07:25:29NULL35163462010-04-17 13:19:24NULL45236682009-05-20 21:12:51NULL52498782010-04-14 10:43:30NULL67201862008-11-20 07:00:06NULL7807922011-03-08 02:39:24NULL8461222011-08-24 10:47:45NULL97370562009-12-15 02:14:43NULL103043842008-03-17 04:51:10NULL

Now, let’s add the “OrderStatus” column to the game. This column should contain one of the following values: “New”, “Open”, “Closed”, “Cancelled” and “Archived”. Generating random values for this column is a bit trickier. We can achieve this by using the CASE statement. But first, a few words about this statement…

The CASE statement has two flavors. One is called “Simple CASE”, and it compares a single expression to a set of simple expressions. The second is called “Searched CASE”, and it evaluates a set of general Boolean expressions.

Here is an example of a simple CASE:

Transact-SQL

CASE CustomerTypeId WHEN 1 THEN N'Regular' WHEN 2 THEN N'VIP' WHEN 3 THEN N'New' WHEN 4 THEN N'Blocked' ELSE N'Unknown' END

1

2

3

4

5

6

7

CASE CustomerTypeId

WHEN 1 THEN N'Regular'

WHEN 2 THEN N'VIP'

WHEN 3 THEN N'New'

WHEN 4 THEN N'Blocked'

ELSE N'Unknown'

END

And here is an example of a searched CASE:

Transact-SQL

CASE WHEN Price <= $100.00 THEN N'Cheap' WHEN Price > $100.00 AND ProductType = 1 THEN N'Medium' WHEN Price > $100.00 AND ProductType = 2 THEN N'Expensive' WHEN Price IS NULL THEN N'Unknown Price' END

1

2

3

4

5

6

CASE

WHEN Price <= $100.00 THEN N'Cheap'

WHEN Price > $100.00 AND ProductType = 1 THEN N'Medium'

WHEN Price > $100.00 AND ProductType = 2 THEN N'Expensive'

WHEN Price IS NULL THEN N'Unknown Price'

END

In both cases, the first condition that evaluates to TRUE determines the result. The ELSE clause is optional. If all conditions evaluate to FALSE and there is no ELSE clause, then the result is NULL.

Now, let’s go back to our example. In order to generate a random value for the “OrderStatus” column, we can generate a random value between 1 and 5, and use the simple CASE statement in order to translate these numbers into text values, like this:

Transact-SQL

TRUNCATE TABLE Sales.Orders; GO INSERT INTO Sales.Orders WITH (TABLOCK) ( CustomerId , DateAndTime , OrderStatus ) SELECT TOP (100000) CustomerId = ABS (CHECKSUM (NEWID ())) % 1000000 + 1 , DateAndTime = DATEADD (SECOND , - ABS (CHECKSUM (NEWID ())) % (60 * 60 * 24 * 365 * 5) , SYSDATETIME ()) , OrderStatus = CASE ABS (CHECKSUM (NEWID ())) % 5 + 1 WHEN 1 THEN N'New' WHEN 2 THEN N'Open' WHEN 3 THEN N'Closed' WHEN 4 THEN N'Cancelled' WHEN 5 THEN N'Archived' END FROM sys.all_columns AS T1 CROSS JOIN sys.all_columns AS T2; GO

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

TRUNCATE TABLE

Sales.Orders;

GO

INSERT INTO

Sales.Orders WITH (TABLOCK)

(