patternsqlMinor

Comparing DISTINCT, GROUP BY and ROW_NUMBER() in SQL Server 2008 with data warehouse workloads

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

distinctcomparinggroupserversql2008withwarehouseworkloadsrow_number

Problem

I recently heard the advice: "If you're using DISTINCT, I challenge you to fix your code." However, I'm wondering what considerations are behind this advice, and whether it means that I should approach the following problem differently.

Problem context

I have a denormalised date dimension table (Kimball star schema). In this table, I have collapsed several 3rd normal form tables: date, absolute week, absolute month and year. For each of these collapsed tables, I have a column that represents a unique identity, a textual name column and a numeric manual sort ordering index. Here's an example row to show the structure, the numbers are illustrative only (in vertical form for ease of reading):

DateId: 20110507

DateName: 7th May 2011

DateOrder: 17813

WeekId: 201118

WeekName: Week 18, 2011

WeekOrder: 617

MonthId: 201105

MonthName: May 2011

MonthOrder: 74

YearId: 2011

YearName: 2011

YearOrder: 41

I've chosen to use a date dimension table for ease of understanding, but the problem can be transposed to any other dimension table involving hierarchies.

Problem synopsis

I want to retrieve the distinct set of week and month combinations with the related textual fields.

Possible solutions

DISTINCT

SELECT DISTINCT MonthId, WeekId, YearName, MonthName, WeekName
  FROM DimDate

Note that I cannot exclude MonthId or WeekId because I cannot assume that free text fields are unique.

GROUP BY

SELECT MIN(YearName), MIN(MonthName), MIN(WeekName)
  FROM DimDate
 GROUP BY MonthId, WeekId

ROWNUMBER

WITH grp AS (
  SELECT YearName, MonthName, WeekName
       , ROW_NUMBER() OVER (PARTITION BY MonthId, WeekId) AS r
    FROM DimDate
)
SELECT YearName, MonthName, WeekName
  FROM grp
 WHERE grp.r = 1

In this approach, indexed views of every combination of non-leaf attributes are precalculated and queried directly. Obviously this is fastest at query time, but could require many duplications of the dimensio

Solution

In my experience, an aggregate (DISTINCT or GROUP BY) can be quicker then a ROW_NUMBER() approach. Saying that, ROW_NUMBER is better with SQL Server 2008 than SQL Server 2005.

However, you'll have to try for your situation.

Compare query plans, and use Profiler and SET to capture IO, CPU, Duration etc

For a lot of background, see these SO questions:

Why are logical reads for windowed aggregate functions so high? (follow the links from Martin Smith)

can I get count() and rows from one sql query in sql server? (note the comments in the answers from me and Chris Bednarski)

Finally, do you need the ROW_NUMBER approach? It looks like you're fixing a problem caused by de-normalisation.

And some notes:

shouldn't YearID be in the GROUP BY or PARTITION BY?

Won't DISTINCT give different output?

Are these columns indexed?

Context

StackExchange Database Administrators Q#7118, answer score: 3

Revisions (0)

No revisions yet.