Session: Clustered Columnstore Index Deep Dive
Track: Data and Analytics
The Columnstore Indexes technology was released in 2012, but even though it’s been out there for some years, there are so many interesting capabilities added with SQL Server 2016. The presenters’ focus on this session was on the new features in SQL Server 2016 and to provide insights to achieve high query performance.
Throughout the session, Sunil and Sudarshan presented various demonstrations of the capabilities and advantages of the new features with 5 live demos; which were very smooth, clear and well presented. Providing enough examples for the information that was communicated.
Let’s start with the basics, why would you use columnstore index? Instead of having data only stored in rows, it will be packaged in columns as well – these are called Row Groups (read-only, compressed row groups that contain 100.000 rows).
The major advantage is the ability to read only the data that is needed. This is achieved by picking only the columns required in the query (column elimination) and if there is any conditions to limit the data, the ability to get the row groups that meet the conditions (row group elimination).
The row groups compress differently depending on the type of data, hence the different sizes in the boxes that represent the row groups above.
The Clustered Columnstore Index technology was not created by Microsoft, it already exists in other databases like Oracle and SAP HANA. However, in SQL Server 2016 Microsoft introduced a key element that makes it different from others:
Batch Mode Processing: Provides the ability to fetch batches of rows (from 64 to 900 rows – limited by the size of the buffer – 64k) instead of reading the data in a sequential manner. This feature will decrease the number of CPU cycles significantly.
On average, not having the Batch Mode activated will make the query 6x time slower than when this mode is activated.
In addition, 3 functions were added to be used in Batch Mode:
Aggregate Pushdown: The values of the qualifying rows requested are aggregate in the SCAN node instead of pushing them to the AGGREGATE node to improve the performance of the aggregate computation.
Example: See the outcome of this query: SELECT SUM(Amount) FROM <columnstore>
String Predicate Pushdown: Similar to the Aggregate Pushdown, this improvement occurs by pushing down the predicates with string columns to the SCAN node.
What happens is: behind the scenes, with columnstore creation, a dictionary is created where only the distinct values are stored once. The query execution compares the values against the dictionary and only the qualified rows are returned to flow out the SCAN node, making the number of string comparisons significantly less. Then, the rows that are not eligible in the previous optimization (e.g. rows in the Deltastore1) are selected in the FILTER node.
Example: See the outcome of this query: SELECT * FROM table WHERE Name LIKE ‘%test%’
Analytics with Window Aggregates: Computes values of functions across the windows defined by the OVER clause. Provides a noticeable performance improvement on the query processing time. This is mostly used on Data Warehouses.
A few other key points I took throughout the session:
- Improved Compression: Further compression, even if your database has been previously compressed.
- Data Load: When data is loaded it is recommended to do it in a group of more than 100.000 rows (creates a compressed Row Group). What doesn’t fit in a row group goes to the Delta Row Group1.
- Data Load: Parallel insert is available, multiple Row Groups created at the same time.
- Reduced I/O due to compression.
- Index REORGANIZE: allows to merge one or more compressed Row Groups with deleted rows.
Clustered Columnstore Index is new to me and I thought the whole concept was very interesting and exciting. I hope you find it as fascinating as I did. For more information on this topic, see the links below:
- Columnstore Index – List of blogs on columnstore index published by SQL Server Product Team
- SQL Server 2016 ColumnStore Technology
Thanks for reading and talk to you again soon!
1 Deltastore or Delta Row Group: Used with clustered column store indexes only, a deltastore is a clustered index that improves columnstore compression and performance by storing rows until the number of rows reaches a threshold and are then moved into the columnstore.
2 Images are taken from video Clustered Columnstore Index Deep Dive: https://channel9.msdn.com/Events/Ignite/Australia-2017/DA325