Keep Your MSDB Clean - Part 2

Eitan Blumin
Mar 23, 2021
6 min read

Updated: Jan 18, 2023

As part of its regular, ongoing, day-to-day activities, your SQL Server instance would naturally collect historical data about its automated operations. If left unchecked, this historical data could pile up, leading to wasted storage space, performance hits, and even worse issues.

MSDB would obviously be collecting data about the SQL Agent job executions. But there are also a few other types of historical data that needs to be cleaned up once in a while.

In part 1, we covered the following:

Output .txt Files
CommandLog
Backup History
Job History
Maintenance Plan History
SQL Server and SQL Server Agent Error Logs
DB Mail History
Full-Text Crawl Logs

It turns out that SQL Server has a bunch more stuff that it retains beyond what we covered in the previous blog post. So, now it's time for part 2!

On this page:

Log Shipping History
MultiServer Administration (MSX) Download List
Database Mirroring Monitor History
Replication Monitor History
Change Data Capture History
Policy-Based Management Execution History
The suspect_pages Table
The dbo.sysssislog Table

Log Shipping History

When monitoring Transaction Log Shipping, SQL Server would store its history in the following tables:

This data is cleaned up using the stored procedure msdb.dbo.sp_cleanup_log_shipping_history. This stored procedure cleans up local and remote copies of the aforementioned tables based on the configured history retention period.

Its default is 7 days, but the retention period can be changed using the following stored procedures per each primary or secondary database:

Every time the Transaction Log Shipping engine does something (specifically, run the LSBackup or LSRestore jobs), it automatically cleans up the history as well. But it's also possible to clean it up manually or via a scheduled job.

However, doing that is a bit complicated because you gotta find the Agent IDs and Agent Types to send to the procedure as parameters.

No worries, though; we got you covered! You can use this T-SQL script available at our own Madeira Toolbox to take care of this!

MultiServer Administration (MSX) Download List

MultiServer Administration, also known as MSX/TSX, can be used for automatic "distribution" or "synchronization" of SQL Agent jobs from one "master" server to one or more "target" servers.

It's one of those SQL Server features that most DBAs heard about once when it was first announced and almost immediately proceeded to forget about it entirely.

And yet, some other DBAs decided to use it anyway, for whatever obscure reason (*cough* they were forced into it by some talking heads in their organization *cough*).

Anyways, this isn't the place to go too deep into this feature, but feel free to go ahead and read about it if you feel like it.

MSX/TSX works with a table called msdb.dbo.sysdownloadlist, which is used for synchronizing the SQL Agent jobs to target servers.

It also retains data about already-synchronized jobs, which doesn't serve any purpose other than for historical auditing and therefore can be deleted.

I tried to look into when/how this table is automatically cleaned up but couldn't find any documentation about it. Obviously, its contents are automatically deleted when defecting TSX servers, deleting jobs, or turning off MSX/TSX entirely. But beyond that, it doesn't look like historical data is deleted automatically.

Therefore, when you use MSX/TSX and make frequent SQL Agent job changes, you should use something like the following command to delete old data from this table on your master server (MSX):

DELETE FROM msdb.dbo.sysdownloadlist
WHERE [status] = 0 -- Successfully downloaded
AND date_downloaded < DATEADD(dd, -90, GETDATE())

Database Mirroring Monitor History

After setting up Database Mirroring, it's also possible to monitor its status using the Database Mirroring Monitor, or the stored procedure sp_dbmmonitorresults.

This historical data is stored in the undocumented table msdb.dbo.dbm_monitor_data. Unfortunately, I couldn't find any documentation or details about what cleans up old data from this table, if at all.

If this table is too big for your taste, you can, of course, clean it up on your own using something like the following command:

DELETE FROM msdb.dbo.dbm_monitor_data
WHERE [time] < DATEADD(dd, -90, GETDATE())

Replication Monitor History

Understandably, when using Replication in SQL Server, there's a whole lot of historical data that needs to be retained. Luckily, several SQL Agent jobs are created for you when you set up Replication. One of which is a job called "Agent history clean up: distribution."

This job executes the dbo.sp_MShistory_cleanup stored procedure on the distributor database. By default, the job retains 48 hours of data (which, honestly, doesn't sound that much at all).

In most cases, this should be good enough. But if you find that you're having issues, you can, of course, modify this job and change the parameter value for the retention period.

EXEC dbo.sp_MShistory_cleanup @history_retention = 48

Change Data Capture History

Change Data Capture (CDC) is built on top of the same mechanism as Replication in SQL Server. So unsurprisingly, it too creates a job responsible for cleaning up old data.

The cleanup mechanism consists of a SQL Server Agent Transact-SQL job that is created when the first database table is enabled. A single cleanup job handles cleanup for all database change tables and applies the same retention value to all defined capture instances.

The cleanup job is initiated by running the parameterless stored procedure sp_MScdc_cleanup_job. This stored procedure starts by extracting the configured retention and threshold values for the cleanup job from msdb.dbo.cdc_jobs.

(source)

You can change the data retention setting by executing the stored procedure sys.sp_cdc_change_job with the parameter @retention, specifying the number of minutes that change rows are to be retained in the change tables.

For example:

USE AdventureWorks2012;  
GO
EXECUTE sys.sp_cdc_change_job   
    @job_type = N'cleanup',  
    @retention = 2880;

Policy-Based Management Execution History

Policy-Based Management is a system for managing one or more instances of SQL Server to create policies that apply various conditions to database target objects. These can be used to force specific server-level or database-level configurations and all other kinds of stuff.

If you connected to a SQL Server instance that was freshly installed any time within the past decade or so, you can probably notice a nightly job called "syspolicy_purge_history," which is just... Always there.

It executes the stored procedure sp_syspolicy_purge_history, which deletes the policy evaluation history according to the history retention interval setting.

So, it looks like you're already set, right?

Probably. But if you want to be sure, you can also use the stored procedure sp_syspolicy_set_config_history_retention to specify the number of days to keep policy evaluation history for Policy-Based Management.

The suspect_pages Table

In the MSDB database, there's a table called "suspect_pages." Here's what Microsoft Docs has to say about this table:

[The suspect_pages table] contains one row per page that failed with a minor 823 error or an 824 error. Pages are listed in this table because they are suspected of being bad, but they might actually be fine. When a suspect page is repaired, its status is updated in the event_type column [in this table].

The following table, which has a limit of 1,000 rows, is stored in the msdb database.

(source)

That bit about the table having "a limit of 1,000 rows" is, well, it's kinda important... Like, a lot important. As in, so important that if you don't manually delete old rows from this table, then no more data would be collected in it for new corruption events.

Yeah. I know. Crazy, right?

But it's all right here in the official Microsoft documentation:

Database administrators are responsible for managing the table, primarily by deleting old rows. The suspect_pages table is limited in size, and if it fills, new errors are not logged. [...] Therefore, we recommend that you periodically delete or archive rows that have an event_type of restored or repaired, or rows that have an old last_update value.

Microsoft was kind enough to let us know of this important information by hiding, I mean, writing it in the documentation, but they didn't provide any automatable stored procedure or maintenance task that we can use for doing this.

Even worse, it has been reported in several use cases that records in the suspect_pages table are not always updated with the restored/repaired event type, even though DBCC CHECKDB returns no errors.

For these reasons, we provide this script available in our Madeira Toolbox, which you can use in a scheduled job. It will move old data to a separate "archive" table and keep the suspect_pages table as lean as possible.

Going too much into depth about corruption is way beyond the scope of this blog post. But you can check out this helpful post by Guy Glantser, or this post by Brent Ozar to learn more about handling corruption, and this cool post by John Martin on how to monitor the suspect_pages table.

The dbo.sysssislog Table

When using the SQL Database Logging mechanism in SSIS, you would select a connection manager to a database, where a table will automatically be created, called dbo.sysssislog. That table will serve as your SSIS execution log.

But unfortunately, there's no automated mechanism that would purge old data from this table.

As a side note, you should be using the built-in logging mechanisms in the newer SSISDB Catalog, which also has built-in automated cleanup mechanics. But that's a discussion for another topic.

In addition to everything else discussed above, you should also have a scheduled job that would clean up from this sysssislog table. Something like the script below should work well to delete data from this table in batches:

SET NOCOUNT ON;
WHILE 1=1
BEGIN
	DELETE TOP (10000) T
	FROM [dbo].[sysssislog] AS T WITH(READPAST)
	WHERE starttime < DATEADD(month, -3, GETDATE())

	IF @@ROWCOUNT = 0 BREAK;

	WAITFOR DELAY '00:00:00.5'
END

In our Madeira Toolbox, you can find a script that iterates through all databases that contain sysssislog and cleans it up.

Conclusion

I'll be honest with you. At this point, I wouldn't be surprised if I missed even more stuff. So, please feel free to write down in the comments about anything else that you think I missed, and who knows? Maybe I'll publish a part 3! We all want to keep our SQL Server instances clean and tidy and avoid inflated MSDB databases, right?