How to Fix a Suspect or Recovery Pending Database in SQL Server?
Updated: Aug 12
One of the scariest moments for a DBA is to see a database in the "Suspect" mode or in the "Recovery Pending" mode...
First, let's understand the difference between these two database modes...
The "Recovery Pending" mode was added in SQL Server 2008, so before, there was only the "Suspect" mode. Generally speaking, when a database was "Suspect" in SQL Server 2005 or earlier version, it meant that the database couldn't be brought online, and thus, it wasn't available for user connections.
Starting with SQL Server 2008, we have two modes - "Suspect" and "Recovery Pending". The "Recovery Pending" mode means that the database can't start the crash recovery process. This is either because SQL Server cannot access the boot page in the primary data file in order to determine whether a crash recovery is needed, or it determined that crash recovery is needed, but it couldn't start the recovery process (for example, because the transaction log file is missing or corrupted).
The "Suspect" mode means that the database started the recovery process, but it failed to complete. For example, this can happen if the transaction log file becomes inaccessible during a transaction rollback operation.
Although it's important to understand the difference between the two modes, usually, the steps to fix the problem are the same. In order to keep it simple, I will list here the actions you should take regardless of the database mode ("Recovery Pending" or "Suspect").
The "Recovery Pending" mode means that the database can't start the crash recovery process
So what do you do when (sh)it happens?
1. Don't panic! I know it's easier said than done, but believe me - this is one of the most important tips I can give you. Because you might do something that will only make things worse, for example, when a database is in the "Suspect" mode, you still might be able to access the database and extract data from it. But if you try to restart SQL Server (because you're under pressure), then you might not be able to access the database anymore. So before you do anything - breathe, drink a glass of water, and then start planning your actions. OK? Are you breathing? Good. Then you can now continue to step #2...
2. Take a backup! Yes, I know you can't really perform a SQL Server backup at this point. But you should try to take some sort of backup of the database files as soon as you can. Sometimes, depending on the state of the database and the database files, SQL Server might not be holding any locks on the files. In this case, copy all the database files to a different location, preferably a separate storage. If this is not possible, then take a storage snapshot or use whatever 3rd-party tool you like, which can take a copy of the database files. Very important - make sure that the tool you are using only tries to read the files and doesn't try to change them in any way
3. Restore from backups. Hopefully, you should have backups available, which correspond to the business RPO and RTO. What's that? You don't know what RPO and RTO mean? Oops... So RPO stands for Recovery Point Objective, and it corresponds to the amount of data the business is willing to lose in case of a failure (like you are facing now). RTO stands for the amount of time it would take to bring the system back to life. In other words, the amount of downtime. If you have a backup and recovery strategy that supports the business RPO and RTO, then you're all good. There's no point in wasting time trying to recover data from the corrupted files. Just execute your recovery policy, and restore from backups.
4. Or you can try to recover data without backups. This is risky, and it can also be time-consuming. Why would you want to do that? Several reasons... First, you might not have backups at all (oh boy). Or maybe you have backups, but you haven't tested them for a while, and they turn out to be corrupted or otherwise unusable (oh boy). Maybe you have valid backups, but it would take 5 hours to restore everything from scratch. Even if it's acceptable according to the business RTO, you might want to try to recover the database using other methods, hoping you can do it in less than 5 hours and with an acceptable amount of data loss. You should be careful because this can be time-consuming, and if you give up after 2 hours and only then start the 5-hours restore process, then, well, it's 7 hours of downtime. What you should do is start the restore from backups ASAP to a new database with a new name on the same instance. Then, while the restore is running, you can start playing with the corrupted database. If you manage to bring it online in less than 5 hours, then you can drop the restored database. If not, then at least you have a fresh new database after 5 hours. Don't forget to rename the new database as well as its database files to match the original database.
Picture credit: Lunde Studio
In the next sections, I will describe the steps to (try to) recover data from the corrupted database without backups. I hope you're ready for that...
5. Put the database in emergency mode. This is the first thing you should try. Because otherwise, you can't access or use the database.
If you managed to put the database in emergency mode, then maybe things are not as bad as you thought (but just maybe). You can continue to the next step. If you get an error, then good luck (you'll need it). Go to step #9.
6. Run DBCC CHECKDB. It might take a while, and it will return all the corrupted pages in the database. I prefer to run it with NO_INFOMSGS, so that it only outputs errors, like this: