Target_Recovery_Time – Indirect Checkpoint SQL Server 2012
Posted by Sourabh Agarwal on March 19, 2012
SQL Server 2012, introduces a new Database level option “Target_Recovery_Time” to better control and predict the time it takes during Recovery for the Database. The reason for this option was simple, the “Recovery Interval” option in sp_configure was really not reliable enough.
Before we talk about the new option, lets talk a bit about the older option.
With the Recovery interval option (set in minutes, default is 0), SQL Server would schedule the Database checkpoints in such a way that the time taken to recover the database is similar to what is specified in the Recovery Interval setting. What happens in the background is that SQL Server tries to schedule the checkpoint based on the number of DML operations that are taking place, but fails to take into consideration the number of DB Pages being modified by these operations.
As shown in the Video, SQL Server first calculates the Checkpoint interval for the Database number of DML operations and the value of “Recovery Interval” option.
When the first transaction completes (which modifies about 160 pages) and the subsequent fires, we flush all these 160 pages to the disk an move the Min_Recovery_LSN forward to the checkpoint LSN. This Min Recovery LSN is the what is used during the Database recovery, to Predict what operations are to be repeated in the Redo Phase of Recovery. During the Second Transaction (which is modifying close to 12K pages), if there were no problems, during checkpoint we would have flushed these 12K pages to the disk, which would result in a spike in the Disk Write Activity. Assuming that there was a crash, just before the Checkpoint happened (the transaction did commit), because of the large amount of pages which needs to be loaded and modified, the time to recover the database might vary and would be unpredictable.
The current Checkpoint algorithm induces the following problems, which are indeed the reason why the new Database was introduced.
- Massive spikes in the Disk Write activity, during checkpoint.
- Unpredictable recovery times for the Database.
The idea was to smooth out the disk write activity, so that we no longer see the massive spikes and to also have better predictability of the recovery time of a database.
The new database option can be set using the following Command.
or it can be set using the Database Properties window in SSMS.
When the new database option is set, SQL Server performs two things,
First it introduces a background task to continuously flush the Dirty buffers from the disk. What SQL actually does is it calculates a min number of Dirty buffers it would keep. Every time this threshold is exceeded, the background process (called the background Recovery writer) kicks in and flushes the dirty buffers to the disk.
Continously move the Min_Recovery_LSN forwards to the LSN, which was most recently flushed. Since the Min_Recovery LSN is continuously moving forward, in case of a crash happens, the amount of work done during the Recovery would be small.
The video, below illustrates how this is happens.
With the new setting, the writes are no longer in spikes (smoothens out because of the background recovery writer) and the amount of work which needs to be done during the Recovery Phase is also reduced, providing more accurate estimations of the Recovery Time for the Database.
Word of Caution
If you set the Target_Recovery_Time to a very small number (like 1 sec) on an OLTP environment, SQL Server might just throttle the disk with the massive amount of Write operations which needs to be done as part of the Background Recovery Writes. Set this option, only after carefully examining the workload pattern and your SLA commitments. Most often the older “Recovery Interval” option from sp_configure works well.
I would like to thank my friend Parikshit Savjani, for his help with the Videos.