SQLUninterrupted

I am just a medium, SQL Server the Goal

Target_Recovery_Time – Indirect Checkpoint SQL Server 2012

Posted by Sourabh Agarwal on March 19, 2012

SQL Server 2012, introduces a new Database level option “Target_Recovery_Time” to better control and predict the time it takes during Recovery for the Database. The reason for this option was simple, the “Recovery Interval” option in sp_configure was really not reliable enough.

Before we talk about the new option, lets talk a bit about the older option.

With the Recovery interval option (set in minutes, default is 0), SQL Server would schedule the Database checkpoints in such a way that the time taken to recover the database is similar to what is specified in the Recovery Interval setting. What happens in the background is that SQL Server tries to schedule the checkpoint based on the number of DML operations that are taking place, but fails to take into consideration the number of DB Pages being modified by these operations.

As shown in the Video, SQL Server first calculates the Checkpoint interval for the Database number of DML operations and the value of “Recovery Interval” option.

When the first transaction completes (which modifies about 160 pages) and the subsequent fires, we flush all these 160 pages to the disk an move the Min_Recovery_LSN forward to the checkpoint LSN. This Min Recovery LSN is the what is used during the Database recovery, to Predict what operations are to be repeated in the Redo Phase of Recovery. During the Second Transaction (which is modifying close to 12K pages), if there were no problems, during checkpoint we would have flushed these 12K pages to the disk, which would result in a spike in the Disk Write Activity. Assuming that there was a crash, just before the Checkpoint happened (the transaction did commit), because of the large amount of pages which needs to be loaded and modified, the time to recover the database might vary and would be unpredictable.

The current Checkpoint algorithm induces the following problems, which are indeed the reason why the new Database was introduced.

  • Massive spikes in the Disk Write activity, during checkpoint.
  • Unpredictable recovery times for the Database.

The idea was to smooth out the disk write activity, so that we no longer see the massive spikes and to also have better predictability of the recovery time of a database.

The new database option can be set using the following Command.

   1: ALTER DATABASE [CheckpointDB] SET TARGET_RECOVERY_TIME = 10 SECONDS 

   2: GO

or it can be set using the Database Properties window in SSMS.

image

When the new database option is set, SQL Server performs two things,

  • First it  introduces a background task to continuously flush the Dirty buffers from the disk. What SQL actually does is it calculates a min number of Dirty buffers it would keep. Every time this threshold is exceeded, the background process (called the background Recovery writer) kicks in and flushes the dirty buffers to the disk.
  • Continously move the Min_Recovery_LSN forwards to the LSN, which was most recently flushed. Since the Min_Recovery LSN is continuously moving forward, in case of a crash happens, the amount of work done during the Recovery would be small.

The video, below illustrates how this is happens.

With the new setting, the writes are no longer in spikes (smoothens out because of the background recovery writer) and the amount of work which needs to be done during the Recovery Phase is also reduced, providing more accurate estimations of the Recovery Time for the Database.

Word of Caution

If you set the Target_Recovery_Time to a very small number (like 1 sec) on an OLTP environment, SQL Server might just throttle the disk with the massive amount of Write operations which needs to be done as part of the Background Recovery Writes. Set this option, only after carefully examining the workload pattern and your SLA commitments. Most often the older “Recovery Interval” option from sp_configure works well.

I would like to thank my friend Parikshit Savjani, for his help with the Videos.

About these ads

2 Responses to “Target_Recovery_Time – Indirect Checkpoint SQL Server 2012”

  1. […] concepts really clear was from Sourabh Agarwal,yet another PFE (B) and his that post can be read here.The major drawbacks of the old algorithm is massive Disk Write activity during checkpoint and […]

  2. […] http://sqluninterrupted.com/2012/03/19/target_recovery_time-indirect-checkpoint-sql-server-2012/ […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 82 other followers

%d bloggers like this: