This post applies to SQL Server 2008 R2 and above.
Large (100 million+ rows) temporal tables are increasingly common. Examples include tables holding order information in e-commerce applications, analytics data stores, audit history tables. Performance of temporal queries can be order-of-magnitude(s) improved via indexes:
- Clustered, if say datetimes used to represent timestamp are comparable in order of magnitude to number of rows. For example, 100 million unique datetimes in a 500 million row table.
- Non-clustered if unique datetimes << number of rows.
A scenario, which does not lend favor to indexes, involves:
- Datetimes, comparable in order of magnitude to number of rows, as the only query predicates AND
- The single clustered index containing datetime has other, more significant (sort order-wise) columns.
Likelihood of this scenario is non-trivial. For example, an order table with a clustered index on orderID + orderDateTime (in that order) with queries against orderID only, orderDateTimeOnly and/or both (I contend that the ‘both’ possibility is relatively rare.)
So how do we facilitate high-performance temporal queries in this scenario?
The answer, partitioning
Crux of partitioning is to physically divide the table on a partitioning column in to smaller sets having independent filesystem storage. Classic divide and conquer, which better exploits SQL Server threading and multiple cores (common for production servers) to improve query performance. Step-by-step instructions for partitioning tables and indexes are available on MSDN. The concept is succinctly explained on MSDN here. A detailed treatment in context of SQL Server 2008 (and largely applicable for later versions) is provided by a Microsoft white paper (nice read).
Note that number of partitions are greatly increased in SQL Server 2012 and onwards; an important feature for data warehouses.
In the next and final part of this article, I will provide experimental support to substantiate performance gain provided by partitioning. Spoiler alert – performance is improved for read and insert queries.
Thanks for reading.