Previous posts in this series:
- SQL Server Record Structures-Part 1
- SQL Server Record Structures-Part 2
- SQL Server Record Structures-Part 3
In the previous posts I discussed the structure of the data record in SQL Server (without compression or sparse columns). We also discussed a few special cases that affect how the record is stored. In this post I will talk about record structures when data compression is enabled on the table/index.
Data compression was first introduced in SQL Server 2005 (Service Pack 2) in the form of VarDecimal storage, wherein any column with a Decimal/Numeric data type was converted to VarDecimal storage so that it used only the required number of bytes. This was subject to enabling VarDecimal storage on both the database and the table.
With SQL Server 2008, two new compression techniques were introduced: Row compression and Page compression. VarDecimal storage was deprecated in this version of SQL Server, as row compression achieves almost the same results.
When Row compression is enabled on a table/index, all columns in the table are stored in a variable-length format, using only as many bytes as the data requires. For example, if a column is declared as Char(50) and contains the string “SQLUninterrupted”, SQL Server uses only 16 bytes when compression is enabled, instead of the 50 bytes it would use without compression.
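The Char(50) example can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming a single-byte character set; the function name is made up for illustration, and it ignores the per-record overhead (header, column descriptors) that the compressed format adds.

```python
from typing import Tuple


def char_column_bytes(value: str, declared_length: int) -> Tuple[int, int]:
    """Return (uncompressed_bytes, row_compressed_bytes) for a char(n) value.

    Hypothetical helper: models only the column data, not record overhead.
    """
    data = value.encode("ascii")
    if len(data) > declared_length:
        raise ValueError("value does not fit in the declared column")
    # Without compression, char(n) is always padded out to n bytes.
    uncompressed = declared_length
    # With row compression, only the bytes the value needs are stored.
    compressed = len(data.rstrip(b" "))
    return uncompressed, compressed


print(char_column_bytes("SQLUninterrupted", 50))  # (50, 16)
```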
Some points to keep in mind when enabling Row Compression on a table/index:
- Compression does not change the maximum size limits on the record or index keys.
- Compression cannot be enabled if the maximum size of the record plus the compression information overhead exceeds 8060 bytes.
- Compression can be enabled on individual partitions of a partitioned table/index.
- Changing the compression setting of a heap table requires a rebuild of all the non-clustered indexes on the table.
When Page Compression is enabled on a table/index, it takes a three-level approach. First, row compression is applied to the page, followed by Prefix compression and then Dictionary compression. Details of how Prefix and Dictionary compression work are available in the Books Online topic “Page Compression Implementation”. When a page is page compressed, a special record called the Compression Information (CI) record is added to the page. This CI record stores the Prefix/Dictionary compression information.
Some points to keep in mind when enabling Page Compression on a table/index:
- Page Compression only kicks in when the page is full and a new row is being inserted. If enough space can be saved to accommodate the new record and the compression information on the page, the page is compressed. Otherwise it is left uncompressed.
- Enabling Page Compression on a heap table does not enforce compression on the pages until the heap is rebuilt.
- New pages being added to the heap are also not compressed until the heap is rebuilt.
- Non-leaf pages of an index cannot be page compressed. This avoids the overhead of uncompressing records during index operations.
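The first point in the list above (compress only when the page is full and the savings make room for the new row) can be written as a small decision function. This is a simplified sketch of the rule as described here, not the storage engine's actual logic; the function and all four parameters are hypothetical names, with sizes in bytes.

```python
def page_would_be_compressed(free_bytes: int, estimated_savings: int,
                             new_row_bytes: int, ci_record_bytes: int) -> bool:
    """Model the rule from the text; every input is an assumed size in bytes."""
    if min(free_bytes, estimated_savings, new_row_bytes, ci_record_bytes) < 0:
        raise ValueError("sizes must be non-negative")
    # Compression is only considered when the new row does not fit as-is.
    page_is_full = free_bytes < new_row_bytes
    if not page_is_full:
        return False
    # The space gained must cover both the new row and the CI record.
    return free_bytes + estimated_savings >= new_row_bytes + ci_record_bytes


print(page_would_be_compressed(10, 500, 100, 60))  # True
print(page_would_be_compressed(10, 100, 100, 60))  # False
```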
Compression Record Structure
When compression is enabled on a table, the data records are stored in a new format commonly called the CD (Column Descriptor) format. This format is entirely different from the regular record structure used when the table has neither compression nor sparse columns. The structure of the record takes the following format.
Record Header Region
This is a 1-byte section with the following information:
- Bit 0 – Type of Record. This is set to 1 for CD format Records
- Bit 1 – Indicates if the record has versioning information.
- Bit 2-4 – Indicates the type of information stored in the row. Details of these bits can be found in Microsoft® SQL Server® 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp, Conor Cunningham and Adam Machanic.
- Bit 5 – Indicates if the record contains a long data region or not.
- Bit 6-7 – Not used.
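As an illustration, the first three fields of the header byte can be pulled apart with masks and shifts. This is a minimal sketch that assumes bit 0 is the least significant bit of the byte; the function and key names are invented for the example, and the meaning of the 3-bit value is left undecoded (see the Internals book for that).

```python
def decode_cd_header(header_byte: int) -> dict:
    """Split the low five bits of a CD record header byte into its fields.

    Assumption: bit 0 is the least significant bit.
    """
    if not 0 <= header_byte <= 0xFF:
        raise ValueError("header must be a single byte")
    return {
        "is_cd_format": bool(header_byte & 0b00000001),    # bit 0
        "has_versioning": bool(header_byte & 0b00000010),  # bit 1
        "row_info_type": (header_byte >> 2) & 0b111,       # bits 2-4
    }


print(decode_cd_header(0b00001011))
```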
Column Descriptor Region
This region is composed of two parts. The first part is either 1 byte or 2 bytes and indicates the number of columns in the table. The second part is a 4-bit descriptor per column, which indicates the length of the column. The 4 bits can represent 16 different values, of which only 13 are used in SQL Server 2008. For details of the descriptors, refer to the SQL Server 2008 Internals book mentioned above.
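The size of this region follows directly from the column count. The sketch below assumes the 4-bit descriptors are packed two per byte, with an odd count rounded up to a whole byte; the width of the column-count field is passed in as a parameter, because the rule for choosing 1 or 2 bytes is not covered here. The function name is hypothetical.

```python
def cd_region_size(num_columns: int, count_field_bytes: int = 1) -> int:
    """Return the size in bytes of the column descriptor region.

    Assumption: two 4-bit descriptors share one byte, rounded up.
    """
    if num_columns < 0:
        raise ValueError("num_columns must be non-negative")
    if count_field_bytes not in (1, 2):
        raise ValueError("the column count is stored in 1 or 2 bytes")
    descriptor_bytes = (num_columns + 1) // 2  # ceil(num_columns / 2)
    return count_field_bytes + descriptor_bytes


print(cd_region_size(10))  # 1 count byte + 5 descriptor bytes = 6
```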
Short Data Region
Contains columns that are less than or equal to 8 bytes in length. Since the lengths of the columns are not stored directly in this region, in tables with a lot of columns it can be expensive to access the last column, as this would require processing the length of every preceding column and then computing the start and end of the desired column. To mitigate this, the columns are divided into clusters of 30 columns each. A Short Data Cluster Array (a byte array) is then created in this region to store the length of each cluster. Since there are only 30 columns in each cluster (with each column being at most 8 bytes), a single cluster can have a maximum length of 240 bytes, which fits in one byte.
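The benefit of the cluster array is easier to see in code. The following is a rough sketch of the lookup, not the engine's implementation: `short_lengths` stands in for the per-column lengths derived from the CD array (0 for a column that stores nothing in this region), offsets are assumed to be relative to the start of the short column data, and all names are hypothetical.

```python
from typing import List, Tuple

CLUSTER_SIZE = 30  # columns per cluster, as described above


def build_cluster_lengths(short_lengths: List[int]) -> List[int]:
    """Total short-data bytes per cluster of 30 columns (max 30 * 8 = 240)."""
    return [sum(short_lengths[i:i + CLUSTER_SIZE])
            for i in range(0, len(short_lengths), CLUSTER_SIZE)]


def short_column_span(short_lengths: List[int], cluster_lengths: List[int],
                      column_index: int) -> Tuple[int, int]:
    """Return (start, end) byte offsets of a column in the short data."""
    cluster = column_index // CLUSTER_SIZE
    # Skip whole clusters with one addition each, using the cluster array.
    start = sum(cluster_lengths[:cluster])
    # Then walk only the columns inside the target cluster (at most 29).
    first_in_cluster = cluster * CLUSTER_SIZE
    start += sum(short_lengths[first_in_cluster:column_index])
    return start, start + short_lengths[column_index]


lengths = [4] * 65                      # 65 columns of 4 bytes each
clusters = build_cluster_lengths(lengths)
print(clusters)                         # [120, 120, 20]
print(short_column_span(lengths, clusters, 64))  # (256, 260)
```

For column 64 the lookup adds two cluster lengths and four column lengths, instead of the 64 column lengths a plain scan would need.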
Long Data Region
All columns with a data length greater than 8 bytes are stored in the long data region. Since the length of these columns is not stored in the CD region (each column descriptor is only 4 bits, whereas the column length can be anything), the long data region requires a column offset array to correctly locate the column data.
Again, in order to reduce the cost of locating a column value in this region, there is a Long Data Cluster Array, which stores the number of columns in each cluster (again, each cluster represents up to 30 columns).
Note: This illustration has been taken from Microsoft® SQL Server® 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp, Conor Cunningham and Adam Machanic.
Special Information Section
Contains optional information about forwarding pointers/back pointers or versioning.
Row Compression Example
I created a sample table with 10 columns of varying size and then enabled Row Compression on the table. I then inserted two records into the table. Now let's look at the record structure in the output of the DBCC PAGE command.
As indicated by the DBCC PAGE output, the total record length is only 128 bytes as compared to 362 bytes that might have been required to store the same record in the uncompressed format.
Next, the CD array is also displayed in the output. Since there are 10 columns in the table, there are 10 entries in the CD array.
As can be seen, the columns Col6, Col8 and Col9 have values longer than 8 bytes and hence are marked as Long in the CD array. Also notice that since there are only 10 columns, all the columns are part of the same cluster (cluster 0).
Page Compression Example
I used a table structure similar to the one above to demonstrate Page Compression. As I mentioned earlier, Page Compression does not kick in until the page is full and a new record has to be inserted into the table.
I created the following table and enabled Page Compression on it.
Next I inserted 64 records into the table using the following script
At this point, there are two data pages in the table, with row compression only. So why did Page Compression not take effect?
The reason is that on a heap, Page Compression does not take effect until the heap table has been rebuilt. So let's rebuild the table.
Now there is only one page in the table, with Page Compression enabled.
When a page is page compressed, SQL Server adds a new record at the top of the page (right after the page header) called the Compression Information record. Additionally, SQL Server adds information to the page header to indicate that the page is page compressed. If the value of m_typeFlagBits = 0x80, then the page is page compressed.
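If you save the DBCC PAGE output to text, this header check is easy to script. The snippet below is a small sketch; the sample header line is illustrative rather than copied from real output, and the function name is made up for the example.

```python
import re


def is_page_compressed(header_text: str) -> bool:
    """Return True if the page header text reports m_typeFlagBits = 0x80."""
    match = re.search(r"m_typeFlagBits\s*=\s*(0x[0-9a-fA-F]+)", header_text)
    if match is None:
        raise ValueError("m_typeFlagBits not found in header text")
    # Compare against 0x80, the value described above.
    return int(match.group(1), 16) == 0x80


sample_header = "m_type = 1    m_typeFlagBits = 0x80"  # illustrative line
print(is_page_compressed(sample_header))  # True
```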
In the next part of this blog series I will talk about the Compression Information record structure.