In my earlier post I had talked about implementing or rather enabling Stretch Databases in SQL Server 2016 using both the UI and T-SQL and how the Azure SQL Databases are utilized as the remote data storage. In this post, we would talk about implementation details of enabling stretch options on a table and the background activities which goes on in SQL Server and the Azure SQL Database as an effect of enabling the stretch option.
Enabling stretch option for a table is very simple, and can be done by right clicking on the table and selecting the stretch option.
The T-SQL Syntax is
1: Create Table Test_RemoteArchive
2: (
3: a int primary key,
4: b varchar(10),
5: c datetime,
6: d float,
7: e char(2),
8: )
9:
10: Alter table Test_RemoteArchive ENABLE REMOTE_DATA_ARCHIVE WITH ( MIGRATION_STATE = ON )
11: Alter table Test_RemoteArchive ENABLE REMOTE_DATA_ARCHIVE WITH ( MIGRATION_STATE = OFF )
The Migration_State option indicates whether data in the table can be migrated to the remote storage or not. If the option is set to OFF, then no fresh data would be sent to the remote table (already migrated data would continue to exist on the remote table).
A few things to keep in mind when stretching a table.
- If the base table has primary key, the constraint would not be enforced on the remote table.
- A new column (batchID_<objectid>) is added to the table, where ObjectID is the object id of the current table which was enabled for stretch. I would talk about the batchID column later in details.
- A non-unique, non-clustered index is created on this new column.
- A new SP (sp_StretchMigration_<objectid>) where ObjectID is the object id of the current table which was enabled for stretch is created for each table enabled for stretch.
- A trigger (trigger_RemoteDataArchive__<objectid>) is created on the table. The text for this trigger is encrypted.
- If a Row has been identified for migration (which is all the rows as off now), the row cannot be Updated/Deleted from the current table. Though it can be done from the remote table on the Azure SQL Database.
- All DDL operations, except for Alter Table.. Enable Remote_Data_Archive on the tables are prohibited.
- There is no option to disable stretch, the only possible way is to create a new table and then perform data migration to the new table.
3 new columns have been added to the sys.tables catalog view, which indicates whether the table is stretched or not, and if its stretched what is the migration state for the table. Further details can be found here.
- is_remote_data_archive_enabled
- remote_data_archive_migration_state
- remote_data_archive_migration_state_desc
When a row is inserted into the table, the row gets inserted into the local table, which after some time, is archived to the Remote table. When a select operation is run on the table, SQL Server queries both the remote and local storage to get the rows which meet the predicate condition.
As can be seen from the Plan, SQL is using a concatenation of the results from the local storage and remote storage. The actual query which get’s executed on the remote storage looks something like
1: EXECUTE sp_prepexec, Int <noname>='0', NText <noname>='@P1 bigint',
2: NText <noname>='SELECT
3: "Tbl1004"."a" "Col1014",
4: "Tbl1004"."b" "Col1015",
5: "Tbl1004"."c" "Col1016",
6: "Tbl1004"."d" "Col1017",
7: "Tbl1004"."e" "Col1018"
8: FROM "RDAStretchDB_Test9A52425E-7FC9-45A7-9731-FDE7AFAC8512"."dbo"."dbo_Test_RemoteArchive_1509580416_BF917E35-F421-490F-AC92-8F096286696E" "Tbl1004"
9: WHERE "Tbl1004"."a"%(2)=(0) AND "Tbl1004"."batchID--1509580416"<=@P1',
10: BigInt <noname>='4'
Notice the additional predicate for batchId, which is there to ensure that in-transit records are not returned. The value of 4, is the max batchid value in my azure table. The plan for the query on the Remote server is included below.
Note: Given the fact that the Query is getting data from the remote storage, which is on Azure, the performance of the query would be greatly impacted by the network and the amount of data being retrieved from the remote server. But since the objective of Stretch Database is to only remote archive less frequently used data, this performance impact may be very small when compared to the overall cost benefits.
SQL Server uses a max batch size of 10000 Rows, while archiving data on the remote storage. For every batch which was sent (via bulk insert) to the remote storage, a batch Id is associated with the records. For example, if a table being stretched has 1 million records, SQL Server would use 100 batches (each consisting of 10000 rows). On the remote table, the first 10000 rows would have a batchid column value of 1, the next 10000 rows a value of 2 and so on. I enabled stretch on the FactInternetSales table (AdventureWorks2014 Sample DB) and the partial results for the stretch are.
1: batchID--1237579447 CountPerBatch
2: 1 10000
3: 2 10000
4: 3 10000
5: 4 10000
6: 5 10000
7: 6 10000
The migration status for any stretch enabled table can be monitored using the DMV “sys.dm_db_rda_migration_status”. The start_time and the end_time in the table are recorded in UTC. Depending on your network traffic the each batch can take some time for migration.
1: Select
2: table_id,
3: SwitchOffset(cast(start_time_utc as datetimeoffset), '+05:30') as LocalStartTime,
4: SwitchOffset(cast(end_time_utc as datetimeoffset), '+05:30') as LocalEndTime,
5: DateDiff(ss, start_time_utc, end_time_utc) As Duration,
6: migrated_rows,
7: Error_number,
8: Error_state,
9: error_severity
10: from sys.dm_db_rda_migration_status
11: where
12: table_id = object_id('FactInternetSales')
13: Order by
14: SwitchOffset(cast(start_time_utc as datetimeoffset), '+05:30') desc
On my test environment, it takes anywhere between 20-25 seconds for one batch to be migrated to the remote table.
1: table_id LocalStartTime LocalEndTime Duration migrated_rows
2: 1237579447 2015-06-24 10:15:46.1530000 +05:30 2015-06-24 10:16:09.2770000 +05:30 23 10000
3: 1237579447 2015-06-24 10:15:21.6870000 +05:30 2015-06-24 10:15:46.1530000 +05:30 25 10000
4: 1237579447 2015-06-24 10:14:58.2370000 +05:30 2015-06-24 10:15:21.6870000 +05:30 23 10000
5: 1237579447 2015-06-24 10:14:34.4700000 +05:30 2015-06-24 10:14:58.2370000 +05:30 24 10000
6: 1237579447 2015-06-24 10:14:10.7830000 +05:30 2015-06-24 10:14:34.4700000 +05:30 24 10000
7: 1237579447 2015-06-24 10:13:47.6070000 +05:30 2015-06-24 10:14:10.7830000 +05:30 23 10000
8: 1237579447 2015-06-24 10:13:23.8870000 +05:30 2015-06-24 10:13:47.6070000 +05:30 24 10000
9: 1237579447 2015-06-24 10:13:00.2000000 +05:30 2015-06-24 10:13:23.8870000 +05:30 23 10000
10: 1237579447 2015-06-24 10:12:36.8100000 +05:30 2015-06-24 10:13:00.2000000 +05:30 24 10000
The familiar “sp_spaceused” SP, has a new parameter @mode, which indicates if the information is to be returned for the local table, the remote table or both. With the default being ‘ALL’, which returns information about both local and remote storage.
1: Sp_spaceused 'FactInternetSales', 'True', 'ALL'
2: GO
3: Sp_spaceused 'FactInternetSales', 'True', 'Local_Only'
4: GO
5: Sp_spaceused 'FactInternetSales', 'True', 'Remote_Only'
6: GO
Given that it’s just the first release of the feature, there are a few limitations and restrictions with it. A comprehensive list can be found on this MSDN page.
In my next post, I will talk about a few considerations and things to keep in mind when using stretch databases.