Query Timeout of the Distribution Agent
Last week I got a call from one of our clients. They issued an ALTER TABLE command in order to change the data type of one of the columns from TINYINT to INT. This was a 17GB table, and it was part of a publication in a SQL Server transactional replication. The command completed successfully on the publisher, but it failed on the subscriber. The distribution agent tried to execute the command on the subscriber, but failed after about 30 minutes, then tried again, failed again, and so on.
The reason for failing was a query timeout, as can be seen in the distribution agent history log:
Since the table was quite large, the ALTER TABLE command took a long time to complete, and before it managed to complete, a query timeout was expired. But where does this query timeout definition come from? And can we modify it in order to accommodate to situations such as this one?
Replication Agent Profiles
Each replication agent in SQL Server, such as the log reader agent and the distribution agent, is associated with a profile. An agent profile contains a set of parameters with their associated values. These parameters affect the behavior of the corresponding agent. For example, one of the parameters associated with the log reader agent is “PollingInterval”, which defines how often (in seconds) the log reader agent scans the transaction log for new commands to replicate. The default value of this parameter is 5.
Each type of replication agent can have multiple agent profiles, each with a different set of parameter values. At each point in time there is only one active profile. So instead of modifying specific parameter values in a single agent profile, it is more convenient (and recommended) to create multiple profiles for different scenarios, and then simply switch between them. In fact, each replication agent already has multiple predefined agent profiles. You can use any of the predefined profiles or create your own user-defined profiles. Here is an example of the distribution agent profiles on my server. You can see that there are 5 predefined (system) profiles and one user-defined profile (“My User Defined Profile”):
And here is how my user-defined profile looks like:
In order to access the agent profiles in SQL Server Management Studio, launch Replication Monitor, then right-click the relevant agent and choose “Agent Profile”.
You can also manage replication agent profiles by using the following system stored procedures:
The QueryTimeout Parameter
As you can see in the screenshot above, one of the parameters in the distribution agent profile is the “QueryTimeout”. This parameter affects the amount of time (in seconds) before a command, which is executed by the distribution agent on the subscriber, times out. The default is 1,800 seconds, which is 30 minutes. If a command runs for more than 30 minutes on the subscriber (such as our ALTER TABLE statement), then after 30 minutes it receives a timeout and rolls back. Then the distribution agent picks it up again and tries again, and so on.
So we created a new agent profile with a query timeout value of 14,400 (4 hours). The ALTER TABLE command then completed successfully after around 90 minutes.
The QueryTimeout parameter also exists in the profiles of the other replication agents, so it’s worth checking the current value of each one and adjusting it where necessary.
So if you’re working with SQL Server Replication and you’re not familiar with the agent profiles, go and check them out. Get yourself familiar with the various parameters and their default values. You will probably want to create your own user-defined profiles and adjust some values to match your needs.