I am just a medium, SQL Server the Goal

Tag Archives: SQL Server

SQL Server Record Structures–Part 1

Recently while talking about the SQL Server Storage engine at one of the customer workshops, I was asked about how data is actually stored in the tables and indexes. With the exception of Column Store Indexes (*introduced in SQL 2012), data in SQL Server is stored in rows.

A data row in SQL Server can be stored in one of 3 formats, depending on what SQL Server feature is enabled in the database.

  • Regular Data Record: The regular Data row structure which have been in use since the early days of SQL Server.
  • Compression Row Structure : When Row/Page compression is enabled on the Table.
  • Sparse Column Row Structure: When Sparse Columns are defined in the table.

Index Records are in two formats

  • Leaf record structure
  • Non-Leaf record structure

Additionally SQL Server has a separate type of record structure to store LOB data. LOB data can either be stored as

  • Off Row Data
  • Row-overflow data

In addition to the above mentioned data structures, SQL also has a Versioning Records (for all data, Index, text types) when row versioning is enabled or as used by some internal SQL features.

In the first part of the blog series I would be talking about the regular data row structure as has been used since the early days of SQL Server, the other record structures for compression and or sparse columns would be discussed in later posts. The Data records is part of a Heap Table (tables with no clustered index) or the leaf level of a clustered index. A Data Record is composed from the values of all the columns in the table. Another important point to keep in mind is that in SQL Server, the terms Row/Record/Slots are all synonymous and can be used interchangeably.


The record can be divided into 8 sections as depicted in the picture above.

  1. 2 Bytes Record Status (Record header) : Currently only 9 bits of this is being used. The header contains information about what kind of record it is. Whether it is versioned or ghosted record and so on. For more information on these 2 bytes refer, Microsoft® SQL Server® 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp , Conor Cunningham , Adam Machanic
  2. The second 2 bytes (Length of Fixed Length Columns) indicate the total length of fixed length columns (int, date, float, money char etc.)
  3. The 3rd section of N bytes is the actual storage for the Fixed length columns. Yes, you got it right, SQL Server stores all the fixed length columns together (irrespective of where they are defined in the table).
  4. 2 Bytes for Number of Columns – 2 bytes are used to store the number of columns in the record. In the default scenario (no sparse columns) a table in SQL 2008 can have up to 1024 column, which means at least 10 bits would be required to store this information.
  5. Null Bitmap (1 bit per column in the table): 1 bit would be used per column, to indicate whether the column can have a null value or not. This is done for all the columns in the table.
  6. 2 bytes – Number of variable length columns in the table: A count of the variable length columns in the table.
  7. Variable length Column offset (2 bytes per variable length column): This offset stores the ending offset for each variable length column in the table. This allows SQL Server to efficiently calculate the start,end and length of each variable length columns. 2 bytes is required because the column offset can be anywhere on the 8KB page.
  8. N bytes – For the variable length columns storage: Storage for the column values.

To understand this further, lets take few examples. In all the examples below, the following DBCC PAGE/DBCC TRACEON commands would be used to display the page/record structure.

Example 1: Table With All fixed length columns

Consider a SQL Table with the following structure

create table FixedLenghtColumns


col1 int,

col2 float, 

col3 datetime, 

col4 char(25)



-- Insert a record into the table

Insert into FixedLenghtColumns values (1, 1.0001, getdate(), 'Fixed Length Columns')


-- DBCC IND Command to display the allocations for this table

DBCC IND('Database Name','FixedLenghtColumns',-1)


-- DBCC PAGE Command to display the Page structure

DBCC TRACEON (3604,-1)

DBCC PAGE (10, 1, 276, 3)

Lets look at the page and record structure from the DBCC Page Output.


The fields marked in Red boxes are important.

pminlen – Indicates the minimum length of a record in the table. This table has 4 columns with a total length of 45 bytes. Add to it the Row header and 2 bytes for the length of the fixed length columns.

m_slotCnt: This field indicate the number of slots or records on the page.

Record Size/Record Attributes: This indicate the actual size of the this record and what are the attributes that this record has. For example this record contains a NULL_BITMAP field.

The “Slot 0 Offset 0x60 Length 52” section indicate that this is the first record on the page, and the record starts at offset 0x60 (decimal 96). Which is valid since the Page Header occupies the first 96 bytes of a SQL Server Page. Please note all offsets start at 0.

Now lets see why the record occupies 52 bytes. The first 4 bytes of the record are fixed for the record header and the length of fixed length columns. Next the total length of the fixed length columns is 45 bytes. Then we need 2 bytes for the number of columns and 1 byte for the Null Bitmap (4 bits for the four columns, but since it all byte allocation, we need 1 byte). This is visible in the record details we see in the DBCC PAGE output.


As can be seen, the first column value starts at offset 0x4 (decimal-4). Which is after the first 4 bytes for the Row header and the length of the fixed length columns. The other column values follow the similar pattern.

Example 2: Tables with mix of variable and fixed length columns

Lets consider another example, where the table contains a mix of fixed and variable length data types.

CREATE TABLE MixDataTypeColumns


Col1 int NOT NULL,

Col2 char(25) NOT NULL,

Col3 varchar(60) NULL,

Col4 money NOT NULL,

Col5 varchar(20) NOT NULL



Insert into MixDataTypeColumns values (10, 'Sourabh Kumar Agarwal', 'This is a test', 3764.846, 'last column')


DBCC IND('DatabaseName','MixDataTypeColumns',-1)


DBCC PAGE(10,1,274,3)

The output of the DBCC PAGE indicates the following…


The Record length here is 75 which can be summed up as follows

2 bytes Record Header+ 2 bytes for the length of fixed length data types columns + 37 bytes for the fixed length data types columns + 2 Bytes of No of Columns + 1 bytes (*5 bits) for NULL_BITMAP + 2 bytes for number of Variable length data type columns + 2*2 bytes for the Variable length column offset + 25 bytes used for the variable length column values.

Also as visible from the column offset information from the snapshot above, we can see that the fixed length columns are stored first (notice the column offsets) and then the variable length columns are stored, Which is

  • Col1 (first fixed length Column) – 0x4 – decimal 8
  • Col2 (second fixed length Column) – 0x8 – decimal 8
  • Col4 (last fixed length Column) – 0x21 – decimal 33
  • Col3 (first variable length Column) – 0x32 – decimal 50
  • Col5 (first fixed length Column) – 0x40 – decimal 64

In the next post, I will talk about 3 special cases involving versioning, ghost records and row forwarding.


SQL Server Memory–Rise of the Clerks – Part 1

Pervious Posts in the Series

SQL Server Memory – Underground

SQL Server Memory – The Evolution

SQL Server 2005 introduced many changes to the Memory Architecture. Majority of these architectural changes continued till SQL Server 2008 R2. In this post, I will try to cover a high level overview of architectural changes introduced in SQL 2005.

The New Architecture

The new memory architecture consists of 4 major components

  1. Memory Nodes
  2. Memory Clerks
  3. Memory objects
  4. Memory Broker

Slava Oak’s has a blog post explaining these components in details. Most of what follows is an attempt at simplifying what Slava mentions his blog.

Memory Node: Memory nodes are internal SQLOS objects. The number of memory nodes depends on the NUMA configuration of the server. Memory nodes can be thought of as a large pool of Memory, which different components (Clerks) can allocate memory from. Memory Nodes expose a set of Allocators which are used by all memory Clerks to allocate the desired amount of memory from the Node.

  1. Page Allocators: There are 4 primary types of page allocators exposed by the memory node. These allocators are used to allocate a set of 8KB pages. Remember that SQL Server page granularity is 8KB and any memory allocated by these page allocators would always be in multiple of 8KB.
    • Single Page Allocator: As the name suggests, this allocator is used to allocate one 8KB page at a time.
    • Multi Page Allocator: Allocates Multiple contigous 8KB Pages
    • Large Page Allocators: These are used to allocate Large Pages. This feature is only present in IA64 or X64 editions of SQL, running on servers with more than 8GB of memory. Trace Flag 834 has to be enabled in order for SQL to use Large Pages. On a X64 server the Large Page granularity is 2MB, while on IA64 its 16MB.
    • Reserved Pages: Is special purpose allocator, which allocates a set of pages for emergencies. Typically the “Fail Safe” memory for SQL.
  2. Virtual Memory Allocator: Uses Windows VirtualAlloc API’s, when SQL needs to allocate contigous memory which does not necessarily fall in the 8K page boundary.
  3. Shared Memory Allocator: Uses windows file mapping API’s and provides LPC (shared memory) capabilities.

DBCC memory status gives some information about the Memory Nodes, but they are NOT exactly memory node but CPU nodes.


Locked Pages allocated would only show up when SQL Server has been configured to use Lock Pages in memory.

Memory Clerks: Clerks are consumers of memory in SQL. In general there are 4 kinds of clerks, generic, cache store, object store, user store. The clerks make use of the memory node allocators to allocate memory. SQL has a vast number of Clerks predefined, and each of these clerks corresponds to the type of allocation which needs to be done. SQL exposes the clerk information using the “sys.dm_os_memory_clerks” DMV. This information is also exposed through the DBCC MEMORYSTATUS. In the memory status output one can see the clerks being listed multiple times. This is because of the fact that the clerks are created per NUMA node. So if we have 4 NUMA nodes on the machine, we could see the clerks being reported 4 times.

Memory Objects: Memory objects are the memory allocations made through the clerks. There are 3 kinds of memory objects which are supported by SQL. Memory objects can be viewed using the “sys.dm_os_memory_objects” DMV.

  1. Variable Memory Object: Supports variable size memory allocations.
  2. Mark/Shrink Objects: This allocations are typically done in two stages. During the first stage the allocations grows in size (more and more memory are added as part of this allocation) and during the second phase the memory is de-allocated. As Slava points out, these type of allocations are very useful in scenarios like compilation or execution.
  3. Fixed Size Memory objects: Supports memory allocations which are fixed in size.

Memory Broker: Memory broker can be though of as the Ring Master in a Circus. The purpose of Memory Broker is to provide a centralized mechanism to distribute memory or control the allocations made by each component in SQL Server. Memory Broker monitors the demand consumption of memory by each component and then based on the information collected, it calculates and optimal value of memory for each of these components. This information is then broadcast to each of the components, which grow or shrink their usage as required. Memory broker is exposed through the DMV “sys.dm_os_ring_buffers” where the Ring_buffer_type is “RING_BUFFER_MEMORY_BROKER”.

To tie these pieces together. 

In order for the Clerks to allocated Memory, they have to create a memory object using one of the Allocators exposed by the Memory Node. These allocations (memory objects) could be of any type as mentioned above. This entire allocation and de-allocation is monitored by the memory broker, which redirects the clerks to grow/shrink their memory usage as required.


In the next post, I will talk about the memory clerks in more details.

How to validate if SQL FILESTREAM document was modified outside SQL

Recently, I came across a question about validating if the SQL FileStream document has been modified outside of SQL or not.

My immediate reaction was one of shock. If a FileStream document has been modified outside of SQL, then we have bigger problems at hand. We have serious issues with SQL Server security and data consistency is at risk.

FileStream storage, is not supposed to be modified outside of SQL. Though we can have Win32 streaming of these documents(documents created on the File System), but these has to be done in the context of a SQL Transaction and only within the context of a SQL Transaction. FileStream was not created for external modifications.

If the documents are modified outside of SQL, then the database is risk. For example, if we by mistake delete the files from the file system, SQL Would treat it as a database corruption and would throw errors during the next Check DB run.

Msg 7904, Level 16, State 2, Line 1
Table error: Cannot find the FILESTREAM file "00000020-00000146-000a" for column ID 2 (column directory ID d35bf83a-99c0-4a7d-ac24-e9f7cf15a54b) in object ID 101575400, index ID 1, partition ID 72057594038910976, page ID (1:170), slot ID 0.
There are 4 rows in 1 pages for object "DocumentStore".
CHECKDB found 0 allocation errors and 3 consistency errors in table ‘DocumentStore’ (object ID 101575400).

But the question is, how do we figure out if a File Stream document was modified outside of SQL or not.

Short answer is there is NO way to do it. There is no inbuilt function or mechanism to figure that. The simple reason why this cannot be done is because the path to store the File Stream data is not exposed for Direct Access. Also the file names which are visible in the File System are not the same which SQL Server maintains. They are different and there is no way to correlate the SQL Names with the File system Name.

As a SQL Server DBA, you need to make sure that the File Stream files are not get modified outside of SQL.

Did you Know- You can perform multiple Insert, update or delete operations as a Singleton operation using the SQL 2008 MERGE operator

The Merge statement in SQL Server 2008 can be used to perform multiple inserts, update and delete operations. Assume for example, in your organization you have a Source Table and a destination table. Everyday at the end of the business, data from the Source table is appended to the Target table. With SQL 2000/SQL 2005, the only way of doing this optimally (assuming that the tables are very large) was to find the Delta of all the changes in the source table and then update the same in the target table. The problem was that you would have to write your own logic or use other SQL Features like SSIS or CDC for finding the delta and then updating the target table.

With the Merge statement you can do the same without much hassles. We can also use the Merge statement in an SSIS package or a job to automate the entire sequence.

Using the MERGE Statement

Syntax for the MERGE statement is simple to understand. First we have the Target table specified by the INTO clause, followed by the Source Table specified with the USING clause. Then we have the Join parameters. The join parameters are important, because this is what will govern which records are to be inserted or deleted or updated in the target table. Then we have the WHEN clause to define the criteria for INSERT, UPDATE and DELETE.

[ TOP ( expression ) [ PERCENT ] ]
[ INTO ] target_table [ WITH ( <merge_hint> ) ] [ [ AS ] table_alias]
USING <table_source>
ON <merge_search_condition>
[ WHEN MATCHED [ AND <clause_search_condition> ]
THEN <merge_matched> ]
[ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ]
THEN <merge_not_matched> ]
[ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ]
THEN <merge_matched> ]
[ <output_clause> ]
[ OPTION ( <query_hint> [ ,…n ] ) ]

For example, assume that your organization maintains 2 tables, one for daily purchases and another for Reporting which has the last purchase date and the total amount of purchases made by each customer.

Use tempdb

Create table PurchaseReporting
CustomerID int,
LastPurchaseDate Datetime2,
Amount Smallmoney,

Create Table DailyPurchases
CustomerID int,
productID int,
Amount Money

Insert into PurchaseReporting values (1, GETDATE(), $1000),
(2,getdate(), $2000),
(3,getdate(), $500),
(4, GETDATE(), $700)

— In the Reporting table we have 4 records for 4 different customer to indicate
— the last purchase date and the total amount of purchased goods.

Insert into DailyPurchases values(1, 2, $100),

— In the Daily purchases table we have the 7 records to indicate the products purchased
— by different customers.

select * from PurchaseReporting
Select * from DailyPurchases


— Now lets write our Merge statement to update all the changes to the Reporting Table

MERGE PurchaseReporting as SR
(Select CustomerID, getdate(), Sum(Amount) from
    GROUP BY CustomerID) as DS(CustomerID, PurchaseDate, TotalAmount)
On SR.CustomerId = DS.CustomerID
UPDATE SET SR.LastPurchaseDate = DS.PurchaseDate, SR.Amount = (SR.Amount + DS.TotalAmount)
INSERT values (DS.CustomerID, DS.PurchaseDate, DS.TotalAmount);

–After execution lets check the values of the Reporting Table Again

select * from PurchaseReporting


Notice the changes in the LastPurchaseDate and the Amount columns for the customer (1,2,3,4) and a new record with CustomerID 5 being inserted into the table.

Did you know: Trace Flag 1118 does not just affect your TempDB

A couple of days back, somebody asked if trace flag 1118 affect only the TempDB. My initial reaction was yes its only for TempDB. But then after further research, I figured out that it is not only for TempDB, but affects all user databases. The reason why we do not observe this change or why we don’t talk much about this is the fact that we do not create and drop tables on the user database at the same frequency as we do it for TempDB.

Since the schema of a user database does not change much often ( if it does in your environment, then you need to re-evaluate your design) the changes introduced by the TF does not pose any problems with DB growth or concurrency on the GAM and SGAM pages.

Changes Introduced by the TF 1118

Before we get into the details of the changes introduced, lets first talk about the default behavior in SQL. As an example we will try to create tables in both TempDB and a user DB (AdventureWorks).

When you create a table in SQL Server using the CREATE TABLE command, SQL will just create the meta data for the table, there is no allocation as of now. As seen below

use tempdb

create table TempTable (a int, b char(8000))

sp_spaceused ‘TempTable’


As the output for sp_spaceused indicates, there is no allocation done for the table as off now. The page allocation happens when we insert the FIRST record in the table. SQL Server will allocate 1 IAM page and 1 Database page (assuming the table has no indexes and also that we just need 1 page for the current set of records). SQL Server will allocate this one IAM page and 1 Data page from a mixed extent.

Insert into TempTable values (10, ‘AAA’)

sp_spaceused ‘TempTable’


As we can see SQL Server has allocation one Data Page and 1 Index Page (IAM) page. Under the default behavior SQL will try to Allocate the first 8 Data pages or index pages using Mixed Allocation. For any further pages, SQL will allocate a uniform extent.

In the example above we have made sure that 1 record occupies 1 DB page. If we add 8 records, as per the behavior SQL will only allocate 8 pages (all coming from Mixed Extents) and when we insert the 9th record, we would see a uniform extent being allocated.

Insert into TempTable values (10, ‘AAA’)
go 7;

Notice the go syntax used above, this is a special case where the Insert command is executed 7 times. Now lets check the output for sp_spaceused.


Notice that we have 8 DB pages and 1 IAM page. Now lets try to insert another record in the table.


Notice the increase in the unused space and the reserved space in the Table. We have 64KB allocated at the same time. Indicating the usage of a Uniform Extent this time.

That was the default behavior, what does TF 1118 change?

When TF 1118 is enabled, SQL will not perform any allocations from a Mixed Extent. So when you create a table, SQL will directly allocate a new uniform extent for the table. As shown below..

DBCC TRACEON (1118,-1)

create table TempTable2 (a int, b char(8000))

Insert into TempTable2 values (10, ‘AAA’)
sp_spaceused ‘TempTable2’


The same will happen if we create a Table in any of the user Database.

use AdventureWorks
create table TempTable2 (a int, b char(8000))
Insert into TempTable2 values (10, ‘AAA’)
sp_spaceused ‘TempTable2’


The above output indicates that all allocation are not Uniform Extent Allocations.

Note: Once again since the rate of creation of Tables in a USER DB is almost zero, this TF will not see any impact of the TF on user DB’s.

P.S. Thank you Parikshit for helping out with the scripts.