In Azure Synapse Analytics you can choose between two SQL compute services - Dedicated SQL Pool and Serverless SQL Pool. You have the flexibility to work with only one of them or with both. In this article I will explain what a serverless SQL pool is all about, and why you should consider using it.
Unlike regular SQL databases, where the service includes both the compute and storage resources, a serverless SQL pool is a compute service only. There is no storage in the service itself. The data itself resides in external storage services, usually in files in the data lake. The serverless SQL pool provides you an endpoint to which you can connect and execute T-SQL queries against the external data.
If it's the first time you hear about this model, then it might sound weird. What's the point of using a SQL service, if I can't store my data in tables, like any regular SQL database? Well, there are quite a few advantages to this approach.
Why Serverless SQL Pool?
First of all, it's serverless. This means that you don't need to deal with setting up the infrastructure or scaling it. In fact, you don't need to do anything. When you set up a new Synapse workspace, a built-in serverless SQL pool endpoint is already created for you, and you can start using it right away with familiar tools, such as SQL Server Management Studio and Power BI.
In addition, you only pay for the queries that you execute. If you don't use the service, you don't pay for anything. This is a true pay-per-use billing model. Even if you use the service intensively, you will probably still pay a lot less compared to a regular SQL database or a dedicated SQL pool.
Another advantage of serverless SQL pool is the agility of querying and exploring the data without the need to develop and maintain complex ETL processes. As soon as your data lands in files in the data lake, you can query it right away without having to transform it and load it into tables in the data warehouse.
So when should you consider using a serverless SQL pool?
Serverless SQL Pool Use Cases
First, it is a great solution for data exploration. If you are a data analyst or a data engineer, and you need to understand how the data looks like and search for patterns in the data, then instead of setting up a data warehouse and an ETL process, you can start exploring your data right away with familiar tools and save a lot of time and money.
You do that using the OPENROWSET function, which has been extended to include support for multiple files and folders, multiple file types (such as Parquet and JSON), schema inference, and a lot more. It's quite a powerful function.
Another use case for the serverless SQL pool is a concept called a logical data warehouse. The idea here is to create external metadata objects (external data sources, external file formats, and external tables). These objects are logical representations of the data in the data lake. You can then organize the data in a star-schema, for example, and query the external tables as if they were regular tables. This gives you an interface that is easy to use, while the serverless SQL pool queries the files behind the scenes and takes care of all the complexities associated with that.
This approach can be a great cost-effective solution for a data warehouse. But if you need predictable performance and other features, such as resource management, then you should probably consider a more traditional data warehouse using a dedicated SQL pool.
You can also use a serverless SQL pool to run transformations as part of an ETL process. If you have strong T-SQL skills, then instead of developing transformations in a mapping data flow, for example, you can execute a stored procedure in a serverless SQL pool to run T-SQL queries against the data in the data lake and transform it into the desired schema. The great thing about this option is that you can use the existing endpoint, and you don't need to set up anything.
Want to learn more?
Here is a tutorial to get you started...
Comments