Star join query optimization - Change partitioning, use columnstore?
Problem
I am trying to find the best possible way to improve the performance of a query given to me by my client. It contains a few tables which are joined, and one of the tables is dwh.fac_sale_detail, which contains 1.5 billion rows.

This table is partitioned based on one of its columns, called TradingDateKey1. The column stores dates in yyyymmdd format, but it is of the INT data type. The table has TradingDateKeys from 2005 to 2015, but the partitions are created only up to the year 2014.

One of the guys on another team advised the following. I am trying to follow his advice, but I am new to creating or altering partitions and don't know if this will actually make any difference to query performance.

What he said in his own words is: "The FactSalesDetail table is currently about 1.5 billion rows and is currently partitioned on TradingDate into 10 partitions by year, with about 150 million rows per partition. It would be a good idea to further partition the latest year into monthly partitions and apply a columnstore index on all the partitions. Applying the index on each partition will be a once-off, and you should only need to maintain the index for the current partition going forward."

Here is the query plan for the query I am trying to optimize. Please also see the attached screenshots for a better understanding:
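For context, the change he is suggesting would look roughly like the following T-SQL. This is a sketch only: the partition function and scheme names (pf_TradingDate, ps_TradingDate), the RANGE RIGHT layout, and the PRIMARY filegroup are all assumptions, since none of these are given in the question.

```sql
-- Sketch only: split the last yearly partition into monthly partitions.
-- pf_TradingDate / ps_TradingDate are hypothetical names; boundary values
-- use the INT yyyymmdd format described above.
ALTER PARTITION SCHEME ps_TradingDate NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_TradingDate() SPLIT RANGE (20150201);

ALTER PARTITION SCHEME ps_TradingDate NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_TradingDate() SPLIT RANGE (20150301);

-- ...repeat NEXT USED + SPLIT RANGE for each remaining 2015 month boundary.
```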
Solution
Thanks for adding the query plan; it is very informative. I have a number of recommendations based on the query plan, but first a caveat: don't just take what I say and assume it's correct, try it out (ideally in your testing environment) first and make sure you understand why the changes do or don't improve your query!
The query plan: an overview
From this query plan (as well as the corresponding XML), we can immediately see a few useful pieces of information:
- You are on SQL 2012
- This is a classic star join query and you are getting the benefit of the in-row bitmap filter optimization that was added in SQL 2008 for such plans
- The fact table contains about 1.5 billion rows, and just over 500 million of those rows match the dimension filters
- The query requests 72GB of memory, but is only granted 12GB (presumably, 12GB is the max that will be granted to any given query, meaning your machine likely has ~64GB of memory; one way to check grant numbers is shown after this list)
- SQL Server is performing a sort-stream aggregate that takes 500 million rows down to just 600,000 rows. The sort is exceeding its memory grant and spilling to tempdb
- We have warnings for plan-affecting converts due to explicit and implicit conversions in your query
- The query uses 32 threads, but the initial seek into your fact table has an enormous thread skew; just 2 of the 32 threads do all of the work. (At subsequent steps in the query plan, however, the work is more balanced.)
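If you want to confirm those grant numbers while the query is running, sys.dm_exec_query_memory_grants (available on SQL 2012) shows requested versus granted workspace memory per active request:

```sql
-- Check requested vs. granted workspace memory for running queries.
SELECT session_id,
       requested_memory_kb,
       granted_memory_kb,
       used_memory_kb,
       max_used_memory_kb
FROM sys.dm_exec_query_memory_grants;
```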
Optimization: columnstore or not
This is a tough question, but on balance I would not recommend columnstore in this case. The primary reason is that you are on SQL 2012; if you are able to upgrade to SQL 2014, I think it would be worth trying columnstore out.
In general, your query is the type that columnstore was designed for and could benefit greatly from the reduced I/O of columnstore and the greater CPU efficiency of batch mode.
However, the limitations of columnstore in SQL 2012 are just too great, and the tempdb spill behavior, where any spill will cause SQL Server to abandon batch mode entirely, can be a devastating penalty that might come into play with the large volumes of rows you are working with. If you do go with columnstore on SQL 2012, be prepared to baby-sit all of your queries very closely and ensure that batch mode can always be used.
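For reference, SQL 2012 only supports nonclustered columnstore indexes, and creating one makes the table read-only until the index is dropped or disabled. A minimal sketch follows; the column list beyond TradingDateKey1 is invented for illustration, since the question does not show the fact table's columns:

```sql
-- Sketch only: a SQL 2012-style nonclustered columnstore index.
-- Column names other than TradingDateKey1 are assumptions.
-- Note: on SQL 2012 this makes dwh.fac_sale_detail read-only.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_fac_sale_detail
ON dwh.fac_sale_detail
    (TradingDateKey1, ProductKey, StoreKey, Quantity, SaleAmount);
```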
Optimization: more partitions?
I don't think that more partitions will help this particular query. You are welcome to try it, of course, but keep in mind that partitioning is primarily a data management feature (the ability to swap in new data in your ETL processes via SWITCH PARTITION) and not a performance feature. It can obviously help performance in some cases, but it can similarly hurt performance in others (e.g., lots of singleton seeks that now have to be performed once per partition).

If you do go with columnstore, I think that loading your data for better segment elimination will be more important than partitioning; ideally you will probably want as many rows in each partition as possible in order to have full columnstore segments and great compression rates.
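To illustrate the data-management angle, a typical ETL pattern is to load a staging table and then switch it into the fact table as a metadata-only operation. The names and partition number below are hypothetical, and a real switch also requires the staging table to match the fact table's schema and carry an appropriate check constraint:

```sql
-- Sketch only: swap a fully loaded staging table into the newest
-- partition of the fact table (metadata-only, near-instant).
ALTER TABLE dwh.fac_sale_detail_staging
    SWITCH TO dwh.fac_sale_detail PARTITION 10;
```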
Optimization: improving cardinality estimates
Because you have a huge fact table and only a handful of very small (hundreds or thousands of rows) sets of rows from each dimension table, I would recommend an approach where you explicitly create a temporary table containing only the dimension rows that you plan to use. For example, rather than joining to Dim_Date with complicated logic like cast(right(ALHDWH.dwh.Dim_Date.Financial_Year,4) as int) IN (2015, 2014, 2013, 2012, 2011), you should write a pre-processing query to extract only the rows from Dim_Date that you care about and add the appropriate PK to those rows.

This will allow SQL Server to create statistics on just the rows you are actually using, which may yield better cardinality estimates throughout the plan. Because this pre-processing would be such a trivial amount of work compared to the overall query complexity, I would highly recommend this option.
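Here is a minimal sketch of that pre-processing step. It assumes Dim_Date's key column is named Date_Key; the actual key column is not shown in the question:

```sql
-- Sketch only: materialize just the dimension rows the query needs,
-- then give SQL Server a primary key (and fresh statistics) on them.
SELECT d.Date_Key
INTO #dim_date_filtered
FROM ALHDWH.dwh.Dim_Date AS d
WHERE CAST(RIGHT(d.Financial_Year, 4) AS int) IN (2015, 2014, 2013, 2012, 2011);

ALTER TABLE #dim_date_filtered
    ADD CONSTRAINT PK_dim_date_filtered PRIMARY KEY (Date_Key);

-- The main query then joins to #dim_date_filtered instead of applying
-- the CAST/RIGHT filter against Dim_Date directly.
```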
Optimization: reducing thread skew
It's likely that extracting the data from Dim_Date into its own table and adding a primary key to that table would also help to reduce thread skew (an imbalance of work across threads). Here's a picture that helps show why:

In this case, the Dim_Date table has 22,000 rows, SQL Server estimated that you would use 7,700 of those rows, and you actually used only 1,827 of them.

Because SQL Server uses statistics in order to allocate ranges of rows to threads, the poor cardinality estimates in this case are likely the root cause of the very poor distribution of rows.
Thread skew on 1,827 rows may not matter much, but the painful point is that it then cascades down to the seek into your 1.5 billion row fact table, where 30 threads sit idle while 600 million rows are processed by just 2 threads.
Optimization: getting rid of the sort spill
Another area I w
Context
StackExchange Database Administrators Q#115220, answer score: 13