SQLUNINTERRUPTED

I am just a medium, SQL Server the Goal

Introduction to Stretch Databases – Part Three

Continuing with my earlier posts on Stretch Databases – Part1, Part2, I will discuss some key aspects to keep in mind when we stretch a database.

Note- SQL Server 2016 is still in preview and some of this information can change in the future.

Insert, Update and Delete

As I had mentioned in my earlier posts, once the records have been moved to the remote Azure Databases, it cannot be Deleted/Updated from the local server. All updates/deletes would need to be run on the Remote Server explicitly. The following error would be returned if an attempt is made to update/insert the record, which is already on the remote server.

Update and delete of rows eligible for migration in table ‘Test_RemoteArchive’ is not allowed because of the use of REMOTE_DATA_ARCHIVE.
Msg 3609, Level 16, State 1, Line 3
The transaction ended in the trigger. The batch has been aborted.

New data which is inserted into the table, get inserted to the local table first and then get archived to the remote server at a later point in time.

Select Operations

As previously mentioned, Select operations executes the query on both the local and remote server and then concatenates the data before sending to the client. A typical query plan for queries on a Stretched tabled would look something like.

If SQL Server is not able to connect to the remote linked server (either because of network related issues or because of authentication issues) the following error would be returned.

OLE DB provider “SQLNCLI11″ for linked server “ServerName” returned message “Login timeout expired”.
OLE DB provider “SQLNCLI11″ for linked server “ServerName” returned message “A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.”.

Msg 53, Level 16, State 1, Line 3
Named Pipes Provider: Could not open a connection to SQL Server [53].

This error would also be logged in the SQL Error Log. If the connection to the Remote Server gets terminated in the middle of query execution, you would get the following error..

OLE DB provider “SQLNCLI11″ for linked server “ServerName” returned message “Communication link failure”.
Msg 10054, Level 16, State 1, Line 3
TCP Provider: An existing connection was forcibly closed by the remote host.

Backup and Restore

Databases enabled for stretch can be backed up or restored just the ordinary databases. The only difference being that immediately after restore you wont be able to query the remote table data. In order to query the remote data, we first have to reconnect the database to the remote database. This can be done using the stored procedure sys.sp_reauthorize_remote_data_archive.

Exec sp_reauthorize_remote_data_archive N’sql_azure_sysadmin_username’ , N’password’

Zipping and Unzipping Files – The Fast and Furious Way

Recently while working on a customer project, we were required to zip and unzip hundreds of files of varying size. The objective was to import IIS Logs from over 100+ servers to Microsoft Analytics platform system for further analytics. The SLA’s we were working on was 1 hour, i.e. APS should have data available with a maximum latency of 1 hour. The processing required us to copy Logs from these IIS servers to a central NAS and from there to the APS landing zone. The hop to the central NAS was required because of some regulatory requirements at the client.

The IIS logs which were getting created every 1 hour, were anywhere between 1 MB – 50 MB, depending on the activity on the servers. Coping the raw files was out of question for obvious reasons. So we had to zip the files (thankfully IIS logs give tremendous compression) and copy them to the central NAS. From there copy and unzip the files on the Landing Zone

We tired multiple options like, calling exe’s (rar, unrar, 7-zip) from PowerShell, but that did not scale well. Eventually we settled on the .NET Compression routine (.NET 4 and above) to this.  We saw overwhelming results when using .NET routines (~13-15 times better execution speed) during compression and decompression.

For the purpose of this blog, I created a test folder, and filled it with 204 different types of files (images from our last trip, some PowerPoint presentations etc.) of varying size (max size 7 MB, min size 5 KB), for a total size of 320 MB.

Compression Results using rar/7-zip

$files = Get-ChildItem -Recurse "C:\Intel\BlogTests\TestFolder"
$starttime = Get-Date
foreach ($file in $files)
{
    $ZipUtility = 'C:\Program Files\WinRAR\rar.exe'
    $fileToCompress = $file.FullName
    $outputFile =  "C:\Intel\BlogTests\ZipFiles\" +$file.Name.Replace(".","_") + ".rar" 
    $zipCommandparam = " a " + $outputFile + " " + $fileToCompress
    $ps = Start-Process -FilePath $ZipUtility -ArgumentList $zipCommandparam -NoNewWindow -PassThru -Wait
    $ps.WaitForExit() # block till exe finish
}
$endTime = Get-Date
Write-Host "Time to Complete Using RAR -: "$endTime.Subtract($starttime).ToString()

Results With Rar.exe  —> Total Duration – 00:03:28 (3 minutes, 28 Seconds)image
Results With 7-zip.exe —> Total Duration – 00:03:48 (3 minutes 48 Seconds)

Compression Results using .Net Compression Routine


[System.Reflection.Assembly]::LoadWithPartialName('System.IO.Compression.FileSystem')
$files = Get-ChildItem -Recurse "C:\Intel\BlogTests\TestFolder"
$starttime = Get-Date
foreach ($file in $files)
{
$fileToCompress = $file.FullName
$directorypath = "C:\Intel\BlogTests\ZipFiles\" + $file.Name.Substring(0,$file.Name.LastIndexOf("."))
$outputzipfile =  "C:\Intel\BlogTests\ZipFiles\" + $file.Name.Replace(".","_") + ".zip"

#Write-Host $directory
$directory = [IO.Directory]::CreateDirectory($directorypath)
$filetocopy = $directorypath + "\" + $file.Name
[IO.File]::Copy($file.FullName,$filetocopy)
[System.IO.Compression.ZipFile]::CreateFromDirectory($directorypath,$outputzipfile)
[IO.Directory]::Delete($directorypath,$true)
}
$endTime = Get-Date
Write-Host "Time to Complete Using .Net Routine -: "$endTime.Subtract($starttime).ToString()

Results With .Net Routine —> Total Duration – 00:00:17 (17 Seconds)
image_thumb.png

We achieved similar results while decompressing the files.

Business Continuity and Disaster Recovery with Azure SQL Databases

During one of the recent Azure Conference, I had talked about the Business Continuity and Disaster Recovery options available with Windows Azure SQL Databases. The presentation I used is included below.

Azure SQL Databases provides the ability to restore a database to any point in time, within the retention interval for that service tier or as configured. Additionally, in case of site failures backups can be restored from Geo redundant copies. Both the Point-In-Time and the Geo restore features are available in all the service tiers for Azure SQL Databases. The retention interval is different for different service tiers. The total storage for backups is dependent on the size of the DB provisioned (twice the size of the DB provisioned). Additional storage can be purchased if required or the retention can be reduced by contacting the Azure Support teams.

Azure SQL Databases also provides a pre-configured, transparent Local (within the same datacenter) and optional Geo (Standard Geo and Active Geo Replications) high availability option for business continuity. The Standard (available in Standard and Premium tiers) and Active (Only Premium Tiers) Geo Replication needs to be configured and will incur additionally costs for the high availability copy. In case of standard Geo replication the secondary is a non-readable secondary. While Active geo allows creation of up to 4 readable secondary’s.

Demo Script

</pre>

# Pre-Work -- Create 3 databases - 1 Basic, 1 StandardS1 and 1 PrimiumP1 using portal
# **********************************************************************************#

#Add the Azure Account
Add-AzureAccount

# List All the DB's already under the Account
$DbServer = Get-AzureSqlDatabaseServer -ServerName "Azure SQL Database Server"
$DBName = Get-AzureSqlDatabase -ServerName $DbServer.ServerName
$DBName | Format-Table

# Demo 1 -- Changing the Servier Tier and Performance Levels for a DB.
# **********************************************************************************#

# 1. Create a Sample DB in the basic tier  (Database Name - TestBasic)
New-AzureSqlDatabase -ServerName $DbServer.ServerName -DatabaseName "TestBasic" -Edition Basic -Verbose -Confirm
#Check if the DB Got created or not..
$test = Get-AzureSqlDatabase -ServerName $DbServer.ServerName -DatabaseName "TestBasic"
If($test -ne $null)
{
Write-Host "Database Creation Successful"
}

# 2. Use the below PowerShell to change the service Tier to Premium P1
$P1= Get-AzureSqlDatabaseServiceObjective -ServerName $DbServer.ServerName -ServiceObjectiveName "P1"
Set-AzureSqlDatabase -ServerName $DbServer.ServerName -DatabaseName "TestBasic" -Edition Premium -ServiceObjective $P1 -Verbose

# 3. From the Output below, we can see that the scale operation is being done in the backend.
$test = Get-AzureSqlDatabase -ServerName $DbServer.ServerName -DatabaseName "TestBasic"
$test | Format-Table

# 4. From the Output below, we can see that the scale operation is being done in the backend.
Remove-AzureSqlDatabase -ServerName $DbServer.ServerName -DatabaseName "TestBasic"

$DBName = Get-AzureSqlDatabase -ServerName $DbServer.ServerName
$DBName | Format-Table

# Demo 2 -- Point-in-Time Restore
# **********************************************************************************#

#1. Get the last restorable time for all the Databases in the account.
$DBName = Get-AzureSqlDatabase -ServerName $DbServer.ServerName
Foreach ($db in $DBName)
{
if($db.name -ne "master")
{ Write-Host $db.Name  "-->" $db.RecoveryPeriodStartDate.ToUniversalTime() }
}

## PointInTime Paramter takes values in GMT and not local server time.
$RestoreRequest = Start-AzureSqlDatabaseRestore -SourceDatabase $DBName[0] –TargetDatabaseName “BasicDB_New” –PointInTime “2015-05-07 04:00:00”
Get-AzureSqlDatabaseOperation –ServerName $DbServer.ServerName –OperationGuid $RestoreRequest.RequestID

## Restore a Deleted Database
Get-AzureSqlDatabase -RestorableDropped -ServerName $DbServer.ServerName
$RestoreRequest = Start-AzureSqlDatabaseRestore -SourceServerName $DbServer.ServerName –SourceDatabaseName “BasicDB” -SourceDatabaseDeletionDate "5/4/2015 4:54:02 PM" –TargetDatabaseName “BasicDB_Restored” -RestorableDropped
Get-AzureSqlDatabaseOperation –ServerName $DbServer.ServerName –OperationGuid $RestoreRequest.RequestID

# Demo 3 -- Geo Restore - Using PowerShell
# **********************************************************************************#

# Get a list of all recoverable Databases, which can be Geo-Restored
Get-AzureSqlRecoverableDatabase -ServerName $DbServer.ServerName

# Perform Geo-Restore
$RecoveryDatabase = Get-AzureSqlRecoverableDatabase -ServerName $DbServer.ServerName -DatabaseName "Basic_ADWorksDW"
$RestoreRequest = Start-AzureSqlDatabaseRecovery -SourceDatabase $RecoveryDatabase -TargetServerName "Azure SQL Database Server" -TargetDatabaseName "Basic_ADWorksDW_GeoRestored"
Get-AzureSqlDatabaseOperation –ServerName "Azure SQL Database Server" –OperationGuid $RestoreRequest.ID

# Demo 4 -- Setting up Geo Replication for Database
# **********************************************************************************#

#1. Standard Geo-Replication (Offline) using Powershell

Start-AzureSqlDatabaseCopy -ServerName $DbServer.ServerName -DatabaseName "StandardS2" -PartnerServer "Azure SQL Database Server" -OfflineSecondary -ContinuousCopy
Get-AzureSqlDatabaseCopy -ServerName $DbServer.ServerName -DatabaseName "StandardS2" -PartnerServer "Azure SQL Database Server"

#2. Active Geo-Replication Using PowerShell
Start-AzureSqlDatabaseCopy -ServerName $DbServer.ServerName -DatabaseName "PremiumP2" -PartnerServer "Azure SQL Database Server" -ContinuousCopy
Get-AzureSqlDatabaseCopy -ServerName $DbServer.ServerName -DatabaseName "PremiumP2" -PartnerServer "Azure SQL Database Server"

# Demo 5 -- Perform a disaster Drill
# **********************************************************************************#

#Initiate a forced termination of the Continuous Copy on the Secondary

$SecondaryDb = Get-AzureSqlDatabaseCopy -ServerName "Azure SQL Database Server" -DatabaseName "PremiumP2"
$SecondaryDb | Stop-AzureSqlDatabaseCopy -ServerName "Azure SQL Database Server" -ForcedTermination
<pre>

 

 

 

Introduction to Stretch Databases – Part Two

In my earlier post I had talked about implementing or rather enabling Stretch Databases in SQL Server 2016 using both the UI and T-SQL and how the Azure SQL Databases are utilized as the remote data storage. In this post, we would talk about implementation details of enabling stretch options on a table and the background activities which goes on in SQL Server and the Azure SQL Database as an effect of enabling the stretch option.

Enabling stretch option for a table is very simple, and can be done by right clicking on the table and selecting the stretch option.

image

The T-SQL Syntax is

   1: Create Table Test_RemoteArchive 

   2: (

   3: a int primary key,

   4: b varchar(10),

   5: c datetime,

   6: d float,

   7: e char(2),

   8: )

   9:  

  10: Alter table Test_RemoteArchive  ENABLE REMOTE_DATA_ARCHIVE WITH ( MIGRATION_STATE = ON )

  11: Alter table Test_RemoteArchive  ENABLE REMOTE_DATA_ARCHIVE WITH ( MIGRATION_STATE = OFF )

The Migration_State option indicates whether data in the table can be migrated to the remote storage or not. If the option is set to OFF, then no fresh data would be sent to the remote table (already migrated data would continue to exist on the remote table).

A few things to keep in mind when stretching a table.

  • If the base table has primary key, the constraint would not be enforced on the remote table.
  • A new column (batchID_<objectid>) is added to the table, where ObjectID is the object id of the current table which was enabled for stretch. I would talk about the batchID column later in details.
  • A non-unique, non-clustered index is created on this new column.
  • A new SP (sp_StretchMigration_<objectid>) where ObjectID is the object id of the current table which was enabled for stretch is created for each table enabled for stretch.
  • A trigger (trigger_RemoteDataArchive__<objectid>) is created on the table. The text for this trigger is encrypted.
  • If a Row has been identified for migration (which is all the rows as off now), the row cannot be Updated/Deleted from the current table. Though it can be done from the remote table on the Azure SQL Database.
  • All DDL operations, except for Alter Table.. Enable Remote_Data_Archive  on the tables are prohibited.
  • There is no option to disable stretch, the only possible way is to create a new table and then perform data migration to the new table.

3 new columns have been added to the sys.tables catalog view, which indicates whether the table is stretched or not, and if its stretched what is the migration state for the table. Further details can be found here.

  • is_remote_data_archive_enabled
  • remote_data_archive_migration_state
  • remote_data_archive_migration_state_desc

When a row is inserted into the table, the row gets inserted into the local table, which after some time, is archived to the Remote table. When a select operation is run on the table, SQL Server queries both the remote and local storage to get the rows which meet the predicate condition.

image

As can be seen from the Plan, SQL is using a concatenation of the results from the local storage and remote storage. The actual query which get’s executed on the remote storage looks something like

   1: EXECUTE sp_prepexec, Int <noname>='0', NText <noname>='@P1 bigint', 

   2: NText <noname>='SELECT 

   3:                     "Tbl1004"."a" "Col1014",

   4:                     "Tbl1004"."b" "Col1015",

   5:                     "Tbl1004"."c" "Col1016",

   6:                     "Tbl1004"."d" "Col1017",

   7:                     "Tbl1004"."e" "Col1018" 

   8:                     FROM "RDAStretchDB_Test9A52425E-7FC9-45A7-9731-FDE7AFAC8512"."dbo"."dbo_Test_RemoteArchive_1509580416_BF917E35-F421-490F-AC92-8F096286696E" "Tbl1004"

   9:                      WHERE "Tbl1004"."a"%(2)=(0) AND "Tbl1004"."batchID--1509580416"<=@P1', 

  10: BigInt <noname>='4'

Notice the additional predicate for batchId, which is there to ensure that in-transit records are not returned. The value of 4, is the max batchid value in my azure table. The plan for the query on the Remote server is included below.

image

Note: Given the fact that the Query is getting data from the remote storage, which is on Azure, the performance of the query would be greatly impacted by the network and the amount of data being retrieved from the remote server. But since the objective of Stretch Database is to only remote archive less frequently used data, this performance impact may be very small when compared to the overall cost benefits.

SQL Server uses a max batch size of 10000 Rows, while archiving data on the remote storage. For every batch which was sent (via bulk insert) to the remote storage, a batch Id is associated with the records. For example, if a table being stretched has 1 million records, SQL Server would use 100 batches (each consisting of 10000 rows). On the remote table, the first 10000 rows would have a batchid column value of 1, the next 10000 rows a value of 2 and so on. I enabled stretch on the FactInternetSales table (AdventureWorks2014 Sample DB) and the partial results for the stretch are.

   1: batchID--1237579447    CountPerBatch

   2: 1                        10000

   3: 2                        10000

   4: 3                        10000

   5: 4                        10000

   6: 5                        10000

   7: 6                        10000

The migration status for any stretch enabled table can be monitored using the DMV “sys.dm_db_rda_migration_status”. The start_time and the end_time in the table are recorded in UTC. Depending on your network traffic the each batch can take some time for migration.

   1: Select 

   2:     table_id, 

   3:     SwitchOffset(cast(start_time_utc as datetimeoffset), '+05:30') as LocalStartTime,  

   4:     SwitchOffset(cast(end_time_utc as datetimeoffset), '+05:30') as LocalEndTime,

   5:     DateDiff(ss, start_time_utc, end_time_utc) As Duration,

   6:     migrated_rows, 

   7:     Error_number, 

   8:     Error_state, 

   9:     error_severity

  10:     from sys.dm_db_rda_migration_status

  11: where 

  12:     table_id = object_id('FactInternetSales')

  13: Order by 

  14:     SwitchOffset(cast(start_time_utc as datetimeoffset), '+05:30') desc

On my test environment, it takes anywhere between 20-25 seconds for one batch to be migrated to the remote table.

   1: table_id      LocalStartTime                        LocalEndTime                        Duration    migrated_rows

   2: 1237579447    2015-06-24 10:15:46.1530000 +05:30    2015-06-24 10:16:09.2770000 +05:30    23        10000

   3: 1237579447    2015-06-24 10:15:21.6870000 +05:30    2015-06-24 10:15:46.1530000 +05:30    25        10000

   4: 1237579447    2015-06-24 10:14:58.2370000 +05:30    2015-06-24 10:15:21.6870000 +05:30    23        10000

   5: 1237579447    2015-06-24 10:14:34.4700000 +05:30    2015-06-24 10:14:58.2370000 +05:30    24        10000

   6: 1237579447    2015-06-24 10:14:10.7830000 +05:30    2015-06-24 10:14:34.4700000 +05:30    24        10000

   7: 1237579447    2015-06-24 10:13:47.6070000 +05:30    2015-06-24 10:14:10.7830000 +05:30    23        10000

   8: 1237579447    2015-06-24 10:13:23.8870000 +05:30    2015-06-24 10:13:47.6070000 +05:30    24        10000

   9: 1237579447    2015-06-24 10:13:00.2000000 +05:30    2015-06-24 10:13:23.8870000 +05:30    23        10000

  10: 1237579447    2015-06-24 10:12:36.8100000 +05:30    2015-06-24 10:13:00.2000000 +05:30    24        10000

The familiar “sp_spaceused” SP, has a new parameter @mode, which indicates if the information is to be returned for the local table, the remote table or both. With the default being ‘ALL’, which returns information about both local and remote storage.

   1: Sp_spaceused 'FactInternetSales', 'True', 'ALL'

   2: GO

   3: Sp_spaceused 'FactInternetSales', 'True', 'Local_Only'

   4: GO

   5: Sp_spaceused 'FactInternetSales', 'True', 'Remote_Only'

   6: GO

image 

Given that it’s just the first release of the feature, there are a few limitations and restrictions with it. A comprehensive list can be found on this MSDN page.

In my next post, I will talk about a few considerations and things to keep in mind when using stretch databases.

Introduction to Stretch Databases with SQL Server 2016

Its been a long time since I have written a blog and what better way to start again than by writing about one of the most interesting features being introduced in SQL Server. In a series of blogs over the next few weeks, I will try and get into the details of how Stretch Databases are implemented and some of the inner workings of the feature to watch out for.

Before, we delve into the feature its important to understand that SQL 2016 is still in the CTP phase and some of the things can change once the product RTM’s. I will blog about the changes if any in another post when it happens.

Ok, so without wasting any more time, Stretch Databases, as the name suggests, stretches your databases/tables to a remote storage, while masking the implementation details from the end user. You continue to access the table in the same way as before , while SQL Server internally traverses the local and the remote storage to get the requested data set. In the current release, the Stretch option moves the entire table to the remote storage, which is a V12 Azure SQL Database (Standard S3 Tier).

Use cases for Stretch Databases

Stretch Databases are useful in scenarios where there is a ton of transactional data, which needs to be stored in your environment for historical querying or maybe government regulations. You only query/work with a small subset of data in these tables (mostly latest transaction data) on a frequent basis. Even with the modern day archiving, manageability and compression techniques (like ColumnStore Indexes, partitioning, using combination of tables/views) the storage cost or the man hours required to update/change your application are prohibitively high, when compared to the Azure Storage utilized by the Azure SQL Database.

Implementing Stretch Databases

Implementing Stretch Databases is really simple. In order to enable a database for Stretch, the ‘Remote Data Archive’ configuration option needs to be set. This can be done using the sp_configure command

   1: sp_configure 'remote data archive', 1

   2: Reconfigure

Next, right click on the DB, on which stretch needs to be enabled, go to task->‘Enable Database for Stretch’. This would launch the wizard for stretch configuration. Sign in with your Azure credentials

image image

The window let’s you choose the subscription in case there are more than one. The next screen allows you to select the Stretch Settings (essentially the location of the Azure SQL Database, the admin login for the WASD server and setting the IP exception rule for your current SQL Server IP).

image

Once you have made sure the settings are current and the click on finish, the following steps are executed on the SQL Server and the Azure SQL Database.

  1. Provision a New SQL Azure Database Server (fixed naming convention)
  2. Configure the firewall exceptions for the WASD server
  3. Create a Credential on the current SQL Environment for the WASD
  4. Create a linked Server to the Azure SQL Database Server
  5. Create the SQL Azure Database.

image

Stretch DB creation creates a log in the “\Users\<current_user>\AppData\Local\SQL Server\Stretch Database to SQL Azure” folder, which can be used for troubleshooting purposes.

Once the option is enabled for the database, the following entries are logged in the SQL Error Log.

   1: 2015-06-20 23:15:44.89 spid51      Setting database option remote_data_archive to ON for database 'StretchDB_Test'.

   2: 2015-06-20 23:15:49.88 spid51      Setting database option remote_data_archive to ON for database 'StretchDB_Test'.

The Wizard does not allow you to choose an existing WASD server, but if creating through T-SQL, you can choose an existing WASD server.

   1: /* Enabling Stretch Database Using T-SQL */

   2:  

   3: /* Step 1 -> Enable Firewall Exceptions on the Azure SQL DB */ 

   4: -- Can be performed through the Azure Portal/Commands on the sp_set_firewall_rule Extended SP on the master DB of the WASD Server

   5:  

   6: /* Step 2 -> Create the Credentials for the WASD Server (linked Server would use these credentials */ 

   7: CREATE CREDENTIAL [tnkll47icl.database.windows.net] 

   8: WITH IDENTITY = '<ServerAdmin>', SECRET = '<Password>'

   9:  

  10: /* Step 3 -> Enable the Stretch DB Option */ 

  11: ALTER DATABASE [StretchDB_Test] SET REMOTE_DATA_ARCHIVE = ON 

  12: (SERVER = N'<SQL Azure Database Server')

 

In the next post, I will talk about the implementation details of stretch the tables within a database and other system SP’s/functions/DMV’s as part of the Stretch DB implementation.

Follow

Get every new post delivered to your Inbox.

Join 162 other followers