SQLUNINTERRUPTED

I am just a medium, SQL Server the Goal

Tag Archives: ColumnStore Index

SQL Server 2012 — ColumnStore Index

In the past, if you have been plagued with Slow performance of your data warehouse queries, there is some good news for you. With SQL Server 2012, we can now create ColumnStore Indexes, which would improve your query performance significantly. ColumnStore indexes are targeted for queries which use a Aggregations over columns or a set of columns.

Traditional SQL Server Approach

SQL Server has traditionally been using a row store based storage. Under this approach the complete data row is stored on a Data Page, and is read or written to as a complete unit. As shown below, the Row is Stored on the page

image

In the traditional approach, lets say if we had a table, with the given definition

create table t1
(
    a int,
    b int,
    c datetime,
    d varchar(300)
)

Assume we are running the following query against the table and that the table has about 1 billion records on approximately 40 million pages.

select SUM(a),AVG(b), MAX(c) from t1

In this case, we just need to do a sum of the values of column a. Given the row store approach, we would need to read the entire row (all the columns) to execute this query, which means we would need to read all the 40 million pages for this query and then perform the aggregation operation.

How ColumnStore Index Helps

When we create a column store index on the table, it changes the way how data is organized. Data for each column which is included in the ColumnStore index is arranged separately. For example if we created a ColumnStore index on the above table on the columns (a, b and c). The columns would be organized independent of each other on different set of pages. as shown below.

image

Additionally, ColumnStore uses page compression by default. Most of the column values are similar if not same, and hence can greatly benefit from compression.

Now, with this arrangement, if we run the above mentioned query, we would only need to read pages for Column a, which in the above case would be only about ~300K(assuming 40% compression) of pages to read and sum the values of a, then another ~300K(assuming 40% compression) pages for the average of b and finally ~500(assuming 50% compression) pages to calculate the Max of c.

As mentioned, this type of organization helps queries where we are returning only some of the columns from a table. For queries where we need to return all the columns in a row, Row-Store approach works better. Typically, OLTP workloads benefit from row storage organization of data, while warehouse queries would benefit from ColumnStore index.

Restrictions on ColumnStore Indexes

The MSDN Article describes the restrictions or limitations to creating ColumnStore indexes. One thing to keep in mind, that when u create a ColumnStore index on a Table, you cannot Update the table. In Order to update the table, the quickest way would be to drop the column store index and then perform your insert, update, deletes. Once you are done with the DML’s, the ColumnStore Index can be recreated.

Example

I create two tables with the same schema but one with only a ColumnStore Index and another with a Clustered Index and a non clustered index. Both the table have about 7.7 million records.

This tables have been created on the AdventureWorks2012 sample database, which can be downloaded from here.

TableWithoutColumnStoreIndex

   1: use AdventureWorks2012

   2: go

   3:  

   4:  

   5: select * into dbo.TableWithoutColumnStoreIndex from Sales.SalesOrderDetail

   6: go

   7:  

   8: declare @count int =0

   9: while @count < 6

  10: begin

  11:     insert into dbo.TableWithoutColumnStoreIndex 

  12:     ([SalesOrderID],[CarrierTrackingNumber],[OrderQty],[ProductID],[SpecialOfferID],[UnitPrice],

  13:     [UnitPriceDiscount],[LineTotal],rowguid, [ModifiedDate])

  14:         select [SalesOrderID],[CarrierTrackingNumber],[OrderQty],[ProductID],[SpecialOfferID],[UnitPrice],

  15:     [UnitPriceDiscount],[LineTotal],NEWID(), [ModifiedDate] from dbo.TableWithoutColumnStoreIndex

  16:  

  17:     set @count=@count+1

  18: end

  19:  

  20:  

  21: /****** Object:  Index [IDX_Clustered]    Script Date: 2/1/2012 3:01:42 AM ******/

  22: CREATE CLUSTERED INDEX [IDX_Clustered] ON [dbo].[TableWithoutColumnStoreIndex]

  23: (

  24:     [SalesOrderDetailID] ASC

  25: )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

  26: GO

  27:  

  28: /****** Object:  Index [IDX_NonClustered]    Script Date: 2/1/2012 3:01:58 AM ******/

  29: CREATE NONCLUSTERED INDEX [IDX_NonClustered] ON [dbo].[TableWithoutColumnStoreIndex]

  30: (

  31:     [SalesOrderID] ASC,

  32:     [ProductID] ASC

  33: )

  34: INCLUDE (     [ModifiedDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

  35: GO

  36:  

  37:  

TableWithColumnStoreIndex

   1: use AdventureWorks2012

   2: go

   3:  

   4: select * into dbo.TableWithColumnStoreIndex from Sales.SalesOrderDetail where 1=2

   5: go

   6:  

   7: Alter Table dbo.TableWithColumnStoreIndex Alter Column LineTotal numeric(18,6)

   8: go

   9:  

  10: insert into dbo.TableWithColumnStoreIndex 

  11: ([SalesOrderID],[CarrierTrackingNumber],[OrderQty],[ProductID],[SpecialOfferID],[UnitPrice],

  12: [UnitPriceDiscount],[LineTotal],rowguid, [ModifiedDate])

  13:     select [SalesOrderID],[CarrierTrackingNumber],[OrderQty],[ProductID],[SpecialOfferID],[UnitPrice],

  14: [UnitPriceDiscount],[LineTotal],rowguid, [ModifiedDate] from Sales.SalesOrderDetail

  15: go

  16:  

  17:  

  18: declare @count int =0

  19: while @count < 6

  20: begin

  21:     insert into dbo.TableWithColumnStoreIndex 

  22:     ([SalesOrderID],[CarrierTrackingNumber],[OrderQty],[ProductID],[SpecialOfferID],[UnitPrice],

  23:     [UnitPriceDiscount],[LineTotal],rowguid, [ModifiedDate])

  24:         select [SalesOrderID],[CarrierTrackingNumber],[OrderQty],[ProductID],[SpecialOfferID],[UnitPrice],

  25:     [UnitPriceDiscount],[LineTotal],NEWID(), [ModifiedDate] from dbo.TableWithColumnStoreIndex

  26:     set @count=@count+1

  27: end

  28:  

  29:  

  30: -- Create a Column Store Index on the Table

  31: Create NonClustered ColumnStore Index IDX_ColumnStore on dbo.TableWithColumnStoreIndex

  32: (    [SalesOrderID],

  33:     [SalesOrderDetailID],

  34:     [CarrierTrackingNumber],

  35:     [OrderQty],

  36:     [ProductID],

  37:     [SpecialOfferID],

  38:     [UnitPrice],

  39:     [UnitPriceDiscount],

  40:     [LineTotal],

  41:     [ModifiedDate]

  42: )

I executed the below mentioned query on the two table to compare the performance. In the Query below, I am just substituting the Name of the tables([TableWithColumnStoreIndex], or  [TableWithoutColumnStoreIndex]) While executing.

   1: select

   2: ( ISNull(P.FirstName,'') + ' ' + ISNull(P.MiddleName,'') + ' ' + IsNull(P.LastName,'')) As CustomerName,

   3: SOH.OrderDate, SOH.DueDate, SOH.ShipDate,SOH.TotalDue,

   4: sum(TWC.OrderQty) As TotalOrderQuantity ,avg(TWC.UnitPrice) As AvgUnitPrice,

   5: Avg(TWC.UnitPriceDiscount) as AvgDiscountOnUnitPrice

   6: from 

   7: [dbo].[TableWithoutColumnStoreIndex] TWC 

   8: join Sales.SalesOrderHeader SOH on TWC.SalesOrderID = SOH.SalesOrderID

   9: join Sales.Customer C on SOH.CustomerID = C.CustomerID

  10: join Person.Person P on P.BusinessEntityID = C.PersonID

  11: where TWC.UnitPriceDiscount <> 0 and TWC.OrderQty > 500

  12: group by 

  13: ( ISNull(P.FirstName,'') + ' ' + ISNull(P.MiddleName,'') + ' ' + IsNull(P.LastName,'')),

  14: SOH.OrderDate, SOH.DueDate, SOH.ShipDate,SOH.TotalDue

Query Performance Against the Table without ColumnStore Index

When executed against the Table without ColumnStore index, the query table takes about 1248 millisecond.

image

Query Performance Against the Table with ColumnStore Index

When executed against the Table with ColumnStore Index, the query executes in about 218 milliseconds. A 5x performance improvement.

image