Pages

Create Date and Time SQL Table

Previously I have designed date and time SQL table for one of my database project. This morning, one of my colleague approached me to discuss the options of populating a date table. I tested a few different methods and make some comparison of their performance. Here are a few options,

In order to track the time consumed for the script, the script of each option listed below is wrapped within SYSDATETIME to record the time difference. Each option is executed 10 times and its average is calculated.

DECLARE @time_begin datetime2(7) = SYSDATETIME();

-- insert the script here --

SELECT DATEDIFF(MILLISECOND, @time_begin, SYSDATETIME());

In order to be assure the results are not affected by cache and log and data file growth, the statements below along with table creation statements are executed before each script is run. In addition, database is placed in simple recovery mode for auto log truncation or with manual CHECKPOINT.

USE TEST;
GO

IF (OBJECT_ID('dbo.tbl1') IS NOT NULL)
BEGIN
 DROP TABLE dbo.tbl1;
END

CREATE TABLE dbo.tbl1 (date_entry date NOT NULL);
GO

DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
CHECKPOINT;

A lot of people usually think of using a while loop to populate a table with list of dates. Starting with the begin date and loop until the last date is populated. Here is an example of using while loop,

SET XACT_ABORT ON;
SET NOCOUNT ON;
DECLARE @start_date date = '1930-12-01',
 @last_date date = '2010-12-31',
 @num  int = 0,
 @rows int;

SET @rows = DATEDIFF(dd, @start_date, @last_date) + 1

BEGIN TRAN;

WHILE (@num < @rows)
BEGIN
 INSERT INTO dbo.tbl1 (date_entry)
 VALUES (DATEADD(dd, @num, @start_date));

 SET @num += 1;

 IF (@num % 10000 = 0)
 BEGIN
  COMMIT;
  BEGIN TRAN;
 END
END

COMMIT;

There are a few important things to point out here. First, it is very important to wrap the insert in transaction (in this example, it uses explicit transaction with BEGIN TRAN and then COMMIT). This allow all inserted rows to be committed in one single transaction instead of committing each inserted row on its implicit transaction which could severely degrade the insert performance. Secondly, notice that the example above uses temporary table. You could use a regular table instead. However, you want to avoid using table variable in this case. When a table variable is used, it doesn't recognize the explicit (e.g. implicit) transaction, and will commit each insert row on its own transaction. There is one optional batch commit here for the @num % 10000. It allows us to decide how many rows we want to commit as one transaction instead of commit the entire rows. In this example, it commits every 10,000 rows. This example without this optional batch commit (line 26-30) works just fine. This while loop option is relatively fast when it is used correctly.

Second option is to use Common Table Expressions (CTE). A recursive CTE could be used to repeatedly execute the insert statement. Here is an example,

WITH sample AS
(
  SELECT CAST('1930-01-01' AS date) AS date_entry
  UNION ALL
  SELECT DATEADD(dd, 1, date_entry)
  FROM sample s
  WHERE DATEADD(dd, 1, date_entry) <= CAST('2010-12-31' AS date)
)
INSERT INTO dbo.tbl1 (date_entry)
SELECT date_entry
FROM sample
OPTION (MAXRECURSION 32767);

The recursive CTE example is pretty straight. Anchor member defined with the start date and repeat until it reach the last date. If you notice, the example uses a query hint for max recursion. This query hint is not necessary if the recursion is below the default limit at 100. However, if the recursion (number of dates) a more than 100, the query hint is required. It is important to understand the maximum recursion is 32767. If your date range is not over 32767 / 365 = 89 years, then this is one option for you. Compare with the while loop option, this option appears to be negligibly slower.

Another way of populating a list of dates without using loop is populate some tables with rows through multiplexing or cross join. Here is the example,

SET NOCOUNT ON;
DECLARE @start_date date = '1930-01-01',
 @last_date date = '2010-12-31',
 @low_date date = '1753-01-01',
 @start_no int,
 @last_no int,
 @rows  int,
 @rows_root int;

SET @start_no = DATEDIFF(dd, @low_date, @start_date);
SET @last_no = DATEDIFF(dd, @low_date, @last_date);
SET @rows = @last_no - @start_no + 1;
SET @rows_root = CONVERT(int, CEILING(SQRT(@rows)));

DECLARE @tbl1 TABLE (number int NOT NULL PRIMARY KEY CLUSTERED);
DECLARE @tbl2 TABLE (number int NOT NULL PRIMARY KEY CLUSTERED);
DECLARE @tbl3 TABLE (number int NOT NULL PRIMARY KEY CLUSTERED);

INSERT INTO @tbl1 (number)
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL 
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL 
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL 
SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 14 UNION ALL 
SELECT 15
ORDER BY 1;

INSERT INTO @tbl2 (number)
SELECT a.number + (16*b.number) + (256*c.number)
FROM 
 @tbl1 a CROSS JOIN 
 @tbl1 b CROSS JOIN 
 @tbl1 c
WHERE a.number + (16*b.number) + (256*c.number) < @rows_root
ORDER BY 1;

INSERT INTO @tbl3 (number)
SELECT a.number + (@rows_root*b.number)
FROM 
 @tbl2 a CROSS JOIN 
 @tbl2 b
WHERE a.number + (@rows_root*b.number) < @rows
ORDER BY 1;

INSERT INTO dbo.tbl1 (date_entry)
SELECT DATEADD(dd, @start_no + t.number, @low_date)
FROM @tbl3 t;

The example above shows a base table is initially populated with some rows and then cross join itself multiple times to create a larger table that consists enough rows to populate the dates. This is my modified part of someone script I found online a while ago. Unfortunately I couldn't find the original designer for the original script (if you know, please comment below so I could give proper credit and reference). If you notice, the designer skillfully breaks down the table population into several parts to optimize the performance. This option shows better performance compare with the other two options.

You could use some of the methods shown above for time table. For example using while loop to create a time table,

USE TEST;
GO
IF (OBJECT_ID('dbo.tbl1') IS NOT NULL)
BEGIN
 DROP TABLE dbo.tbl1;
END

CREATE TABLE dbo.tbl1 (time_entry time(0) NOT NULL);
GO

SET XACT_ABORT ON;
SET NOCOUNT ON;
DECLARE @start_time time(0) = '00:00:00',
 @last_time time(0) = '23:59:59',
 @num  int = 0,
 @rows int;

SET @rows = DATEDIFF(ss, @start_time, @last_time) + 1

BEGIN TRAN;

WHILE (@num < @rows)
BEGIN
 INSERT INTO dbo.tbl1 (time_entry)
 VALUES (DATEADD(ss, @num, @start_time));

 SET @num += 1;

 IF (@num % 10000 = 0)
 BEGIN
  COMMIT;
  BEGIN TRAN;
 END
END

COMMIT;

Since there are 24 * 60 * 60 = 86400 records to populate the time table, we can't use the recursive CTE as it has a limitation of 32767 recursion.

For cross join method, replace the date with time(0) for accuracy to second, and replace with second for any DATEADD and DATEDIFF like DATEADD(ss, 1, start_time).

All options completed less than a second in my testing environment. If you come across different methods, please feel free to comment below.

No comments:

Post a Comment