Create Availability group with Automatic / Manual seeding with different paths for data and log file
Read-scale Availability Groups were introduced in SQL Server 2017 as one of AlwaysOn features that provides a read-only copy of your database for reporting and analysis purposes.
With Standard edition, since it is under the limitation of basic availability group, it can be used as a solution for disaster recovery only. More on this misleading functionality here.
My client asked us to plan a migration for their SQL server instances from Windows to Linux servers.
Our plan is to use a cross-platform availability groups to perform the migration. It’s a cluster-less mechanism.
The insights regarding the different seeding modes I will discuss here are not only relevant for Linux or for cross-platform availability group (AG) but also true for any AG created between any two or more replica servers.
When creating an availability group (AG) in SQL server, one of the arguments is called Seeding Mode which can be manual or automatic.
Seeding mode is the mechanism in which the secondary replica(s) will be initialized.
Prior to SQL server 2016, manual seeding was the only way to do this. Meaning you had to use backup, copy, and restore of the database on every secondary replica.
SQL Server 2016 introduces a new feature to initialize a secondary replica - automatic seeding (or direct seeding), which seeds the secondary database files on secondary replica over the network.
Automatic seeding mechanism uses the log stream transport to stream the backup to each secondary replica for each database of the availability group using the configured endpoints.
Note! the seeding of the database is done through the network.
So, if your database is very large (above 5 TB size) or your network bandwidth is short, better reconsidering using the old fashion manual seeding mechanism.
In SQL server 2016, automatic seeding requires the same data and log file path for the database on all servers participating in the AG.
With SQL server 2017, you can use automatic seeding with different file paths as well. BUT! Your database file paths must be aligned with the instance configured default data and log paths.
It is explained and emphasized very clearly and detailly here.
What the above document actually says, is, that whenever your database files are located on the same paths as the SQL instance default data and log file paths (including sub directories), automatic seeding would work.
This feature enables using automatic seeding for cross-platform (Windows-Linux) availability groups, but it can also be leveraged when your replicas are hosted on the same platform.
My Windows primary replica data and log default file paths, as they are configured in the SQL server instance properties are:
My Linux secondary replica data and log default file paths are:
So, as long as your database physical files on the primary replica, in which the AG is firstly created, are located here:
Or even here: U:\SQL_DATA\SubFolderA\MyDatabase.mdf
you're good to go with automatic seeding.
For other databases file locations, you can either use manual seeding mode, with the backup, copy and restore method or, if a SQL service restart is bearable, change the instance default data and log file paths configurations to fit the your database file paths.
Availability group step by step configuration is out of the scope of this blog post, but there is one thing regarding manual seeding mode I would like to point out.
Most Always On availability groups configuration “How To” guides, such as the one I was using for cross-platform AG, or this more general one, provides examples with automatic seeding mode. The manual seeding option is mentioned as well as the requirement for using backup, copy, and restore strategy.
if you go through all steps in the above guides you'll end up with a database in a stable NOT SYNCHRONIZING state on secondary replica.
With MANUAL seeding mode, On the secondary replica, after the secondary database has manually seeded, a final step that joins the database to the AG must take place.
ALTER DATABASE [MyDatabase] SET HADR AVAILABILITY GROUP= [MyDatabase_AG];
This step actually creates the linkage between the AG on the secondary replica and the database it owns. without this step, the synchronization cannot be initialized.
It should be executed after the last log backup has restored on the secondary replica and the database has joined to the availability group on the primary replica.
Therefore, either execute it right after restoring the last log backup was taken or temporarily stop your LOG backup job during this time.
Otherwise, you might find yourself chasing the (log) tail.
BTW, Microsoft does document this step, but in a separate guide from the general AG configuration guide. Which I’ve encountered only after…