SQLUNINTERRUPTED

I am just a medium, SQL Server the Goal

Category Archives: FILESTREAM

Replicating a Table with FileStream Data using Merge Replication

With SQL Server 2008, FILESTREAM support was added to SQL Server. FileStream allows creation of large objects as File System objects instead of database objects. With file stream, the data resides on the File System, and can be accessed under the context of a SQL Transaction using T-SQL or Win32 API’s.

FileStream is compatible with other existing SQL Server features like Replication and Mirroring. Recently I was working with a customer where they were trying to implement Merge Replication on a database using File Stream and were running into the below error during synchronization.

The schema script ‘DocumentStore_2.sch’ could not be propagated to the subscriber. (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147201001)
Default FILESTREAM filegroup is not available in database ‘ReplicatedFileStorage’. (Source: MSSQLServer, Error number: 1969)

We tried to create a test scenario for the above to see why that was happening.

On the production database we created a Test Database with FileStream enabled. In the test Database, we created a table with a FileStream column and tried to replicate that database using Merge Replication. Since Merge Replication requires a rowguid column to be present on the table, we also added a new unique identifier column to the table, with a default constraint of NEWSEQUENTIALID(). We did not use NEWID() as NEWSEQUENTIALID() performs better.

Included below is the script for the Database and the tables.

   1: USE [master]

   2: GO

   3:  

   4: /****** Object:  Database [TestFileStream]    Script Date: 03/27/2012 07:00:44 ******/

   5: CREATE DATABASE [TestFileStream] ON  PRIMARY 

   6: ( NAME = N'TestFileStream', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestFileStream.mdf' , SIZE = 4352KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ), 

   7:  FILEGROUP [FILESTREAMGROUP1] CONTAINS FILESTREAM  DEFAULT 

   8: ( NAME = N'FSGroup1File', FILENAME = N'C:\Program Files\Microsoft SQL Server\SQLFILESTREAM\FSData' )

   9:  LOG ON 

  10: ( NAME = N'TestFileStream_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestFileStream_log.LDF' , SIZE = 3840KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)

  11: GO

  12:  

  13: USE [TestFileStream]

  14: GO

  15:  

  16: CREATE TABLE [dbo].[DocumentStore](

  17:     [DocumentID] [int] IDENTITY(1,1) PRIMARY KEY,

  18:     [Document] [varbinary](max) FILESTREAM  NULL,

  19:     [DocGUID] [uniqueidentifier]  NOT NULL ROWGUIDCOL UNIQUE DEFAULT NEWSEQUENTIALID() ,

  20:     [ModifiedDate] [datetime] NULL

  21: ) ON [PRIMARY] FILESTREAM_ON [FILESTREAMGROUP1]

Next we created a Merge Publication for the database. In this case, we did not use the UI to create the publication, as the UI does not provide any specific option for FileStream Replication. We created a script for the Publication.

   1: use [TestFileStream]

   2: exec sp_replicationdboption @dbname = N'TestFileStream', @optname = N'merge publish', @value = N'true'

   3: GO

   4: -- Adding the merge publication

   5: use [TestFileStream]

   6: exec sp_addmergepublication @publication = N'FileStreamPublication', @description = N'Merge publication of database ''TestFileStream'' from Publisher ''SQLSCOM''.', @sync_mode = N'native', @retention = 14, @allow_push = N'true', @allow_pull = N'true', @allow_anonymous = N'true', @enabled_for_internet = N'false', @snapshot_in_defaultfolder = N'true', @compress_snapshot = N'false', @ftp_port = 21, @allow_subscription_copy = N'false', @add_to_active_directory = N'false', @dynamic_filters = N'false', @conflict_retention = 14, @keep_partition_changes = N'false', @allow_synctoalternate = N'false', @max_concurrent_merge = 0, @max_concurrent_dynamic_snapshots = 0, @use_partition_groups = null, @publication_compatibility_level = N'100RTM', @replicate_ddl = 1, @allow_subscriber_initiated_snapshot = N'false', @allow_web_synchronization = N'false', @allow_partition_realignment = N'true', @retention_period_unit = N'days', @conflict_logging = N'both', @automatic_reinitialization_policy = 0

   7: GO

   8:  

   9: exec sp_addpublication_snapshot @publication = N'FileStreamPublication', @frequency_type = 4, @frequency_interval = 14, @frequency_relative_interval = 1, @frequency_recurrence_factor = 0, @frequency_subday = 1, @frequency_subday_interval = 5, @active_start_time_of_day = 500, @active_end_time_of_day = 235959, @active_start_date = 0, @active_end_date = 0, @job_login = null, @job_password = null, @publisher_security_mode = 1

  10:  

  11:  

  12: use [TestFileStream]

  13: exec sp_addmergearticle 

  14: @publication = N'FileStreamPublication', 

  15: @article = N'DocumentStore', 

  16: @source_owner = N'dbo', 

  17: @source_object = 

  18: N'DocumentStore', 

  19: @type = N'table', 

  20: @description = N'', 

  21: @creation_script = null, 

  22: @pre_creation_cmd = N'drop', 

  23: @schema_option = 0x000000010C034FD1, 

  24: @identityrangemanagementoption = N'auto', 

  25: @pub_identity_range = 10000, 

  26: @identity_range = 1000, 

  27: @threshold = 80, 

  28: @destination_owner = N'dbo', 

  29: @force_reinit_subscription = 1, 

  30: @column_tracking = N'false', 

  31: @subset_filterclause = null, 

  32: @vertical_partition = N'false', 

  33: @verify_resolver_signature = 1, 

  34: @allow_interactive_resolver = N'false', 

  35: @fast_multicol_updateproc = N'true', 

  36: @check_permissions = 0, 

  37: @subscriber_upload_options = 0, 

  38: @delete_tracking = N'true', 

  39: @compensate_for_errors = N'false', 

  40: @stream_blob_columns = N'true', 

  41: @partition_options = 0

  42: GO

In the sp_addmergearticle, there are two options which needs to be configured for Replicating a FileStream object to the subscriber.

@schema_option = 0x000000010C034FD1

@stream_blob_columns = N’true’

For a FileStream object to be replicated without issues to the client, the @schema_option should be set accordingly. There are two specific hex codes, which needs to be present in the @schema_option.

0x100000000  –> Use this option to replicate the FILESTREAM attribute if it is specified on varbinary(max) columns. Do not specify this option if you are replicating tables to SQL Server 2005 Subscribers. Replicating tables that have FILESTREAM columns to SQL Server 2000 Subscribers is not supported, regardless of how this schema option is set.

0x800000000  –> Set this option to store FILESTREAM data on its own filegroup at the Subscriber. If this option is not set, FILESTREAM data is stored on the default filegroup. Replication does not create filegroups; therefore, if you set this option, you must create the filegroup before you apply the snapshot at the Subscriber.

The second hex value, which needs to be added to the @schema_option indicates where the FileStream objects would be created. If the second option is not specified, then SQL would try to add the FileStream objects to the the default FileStream filegroup (this is different from the default data filegroup i.e. primary filegroup). If there is no default file stream filegroup added to the subscriber database, replication would fail with the above messages. This also means that if we are trying to create the database from the “Create Subscription wizard” we need to make sure we also specify a default FileStream Filegroup during DB creation.

The second parameter @stream_blob_columns is used in cases where the file stream objects are expected to be more than 2 GB in size. If this parameter is not set to true, then replication would fail whenever the FileStream object is grater than 2 GB.

In the customer scenario, since he was creating a new DB through the “Create Subscription Wizard” and not specifying the FileStream filegroups, replication was failing with the above message.

The above used @schema_option can be broken down as follows.

0x100000000 
0x8000000  
0x4000000   
0x20000

0x10000 
0x4000

0x800

0x400

0x200

0x100

0x80

0x40

0x10

0x01

Details of the various hex codes can be found here.

Advertisement

How to validate if SQL FILESTREAM document was modified outside SQL

Recently, I came across a question about validating if the SQL FileStream document has been modified outside of SQL or not.

My immediate reaction was one of shock. If a FileStream document has been modified outside of SQL, then we have bigger problems at hand. We have serious issues with SQL Server security and data consistency is at risk.

FileStream storage, is not supposed to be modified outside of SQL. Though we can have Win32 streaming of these documents(documents created on the File System), but these has to be done in the context of a SQL Transaction and only within the context of a SQL Transaction. FileStream was not created for external modifications.

If the documents are modified outside of SQL, then the database is risk. For example, if we by mistake delete the files from the file system, SQL Would treat it as a database corruption and would throw errors during the next Check DB run.

Msg 7904, Level 16, State 2, Line 1
Table error: Cannot find the FILESTREAM file "00000020-00000146-000a" for column ID 2 (column directory ID d35bf83a-99c0-4a7d-ac24-e9f7cf15a54b) in object ID 101575400, index ID 1, partition ID 72057594038910976, page ID (1:170), slot ID 0.
There are 4 rows in 1 pages for object "DocumentStore".
CHECKDB found 0 allocation errors and 3 consistency errors in table ‘DocumentStore’ (object ID 101575400).

But the question is, how do we figure out if a File Stream document was modified outside of SQL or not.

Short answer is there is NO way to do it. There is no inbuilt function or mechanism to figure that. The simple reason why this cannot be done is because the path to store the File Stream data is not exposed for Direct Access. Also the file names which are visible in the File System are not the same which SQL Server maintains. They are different and there is no way to correlate the SQL Names with the File system Name.

As a SQL Server DBA, you need to make sure that the File Stream files are not get modified outside of SQL.