Written by: Eric Rouach, Madeira Data Solutions - May 2023
Create External Table As Select or "CETAS" has finally become available on SQL Server with the release of the 2022 version.
After a short setup, we can create various formats files containing any query's result set. The created file/s must be kept on an Azure storage solution i.e. Azure Blob Storage.
The process also creates an external table reflecting the updated file's content.
Pre-requisites:
-An instance of SQL Server 2022
-An Azure storage solution i.e. Azure Blob Storage
-Basic understanding of external tables, external data sources, cross-database queries in Azure SQL Database
-In case you're new to external tables, I recommend reading this article:
The following demo includes 2 parts:
1) Setup steps
2) An example stored procedure that creates a .parquet file with the content of a query from the AdventureWorks2019
database. The file is created in a folder located at an Azure Blob Storage container.
An external table reflecting this file is also created and, as we might not need to keep it,
the stored procedure also deletes any external table older than 15 minutes.
Useful documentation links:
1) One-time set up:
In case you haven't selected Polybase while installing SQL Server 2022, go to SQL Server Installation Center and add the missing feature.
Make sure Polybase is installed:
In case the above query returned 0, run the script below:
You will also need to enable the "allow polybase export" configuration option:
The next step is the creation of 4 external objects:
master key
database scoped credential
external data source
external file format
Create MASTER KEY
Create DATABASE SCOPED CREDENTIAL:
*when using a blob storage as a target location, use SAS as the identity
*before generating the SAS, make sure to grant the READ, WRITE and CREATE permissions via Azure Portal:
Create EXTERNAL DATA SOURCE
Create EXTERNAL FILE FORMAT
Check objects creation:
2) Example Stored Procedure:
USE AdventureWorks2019;
GO
CREATE OR ALTER PROC dbo.CreateDailyParquetFile
AS
BEGIN
BEGIN TRY
BEGIN TRAN
--Create the parquet file and external table
DECLARE @TableName NVARCHAR(255),
@Location NVARCHAR(255),
@SqlCmd NVARCHAR(MAX)
SET @TableName = 'ext_sales_'+FORMAT(GETDATE(),'yyyyMMddHHmmss')
--LOCATION must point to a folder and have a trailing /
SET @Location = '/sales_'+FORMAT(GETDATE(),'yyyyMMddHHmmss')
SET @SqlCmd =
N'
CREATE EXTERNAL TABLE '+@TableName+'
WITH (
LOCATION = '+''''+@Location+''''+','+'
DATA_SOURCE = Test1Container1,
FILE_FORMAT = ParquetFileFormat
)
AS
SELECT
TOP 1000 *
FROM
AdventureWorks2019.[Sales].[SalesOrderDetail]
GO
'
EXEC sp_executesql @SqlCmd
--Delete "old" external tables
DECLARE @TableToDelete VARCHAR(50)
DECLARE @DeleteCmd NVARCHAR(MAX)
DECLARE DeletionCursor CURSOR LOCAL FAST_FORWARD
FOR
SELECT
name
FROM
sys.external_tables
WHERE
create_date < DATEADD(MINUTE,-15,GETDATE())
OPEN DeletionCursor
FETCH NEXT FROM DeletionCursor INTO @TableToDelete
WHILE @@FETCH_STATUS = 0
BEGIN
SET @DeleteCmd = N'DROP EXTERNAL TABLE '+@TableToDelete
EXEC sp_executesql @DeleteCmd
FETCH NEXT FROM DeletionCursor INTO @TableToDelete
END
CLOSE DeletionCursor
DEALLOCATE DeletionCursor
COMMIT
END TRY
BEGIN CATCH
INSERT INTO dbo.ErrorLog
(
[ErrorTime],
[UserName],
[ErrorNumber],
[ErrorSeverity],
[ErrorState],
[ErrorProcedure],
[ErrorLine],
[ErrorMessage]
)
SELECT
GETDATE(),
SUSER_SNAME(),
ERROR_NUMBER(),
ERROR_SEVERITY(),
ERROR_STATE(),
ERROR_PROCEDURE(),
ERROR_LINE(),
ERROR_MESSAGE()
PRINT 'The error was: '+ERROR_MESSAGE()
IF @@TRANCOUNT > 0
ROLLBACK
END CATCH
END;
GO
After executing the procedure, 3 things will happen:
A .parquet file will be created in your Azure storage
An external table reflecting the .parquet file's content will be created in the database
Any external table older than 15 minutes will be deleted from the database's External Tables folder.
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
Wouldn't the ROLLBACK after logging the error roll it back?