HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

SQL query with dynamic unpivot+pivot for cross product

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
sqlwithquerypivotunpivotcrossproductfordynamic

Problem

In this Data Explorer query I am trying to do the following:

-
For each tag:

  • Compute sum of answer scores in this tag ("S")



-
Compute count of answers in this tag ("A")

-
For each tag class (Bronze, Silver, Gold)

add two columns:

  • S divided by this class's score goal



  • A divided by this class's answer goal



I wanted to do this in the most general way possible, allowing more tag classes/goals to be added later. I came up with this:

-- Predefined tag badge goals
... TagBadges as (
  select * from
    (values
      (1, 'Bronze', 100, 20),
      (2, 'Silver', 400, 80),
      (3, 'Gold', 1000, 200))
    as Badge(Idx, Class, Score, Answers)
),

-- Progress per tag, per badge class
TypeProgress as (
  select
    RawData.TagName,
    format(iif(RawData.Score > TagBadges.Score, 1,
               cast(RawData.Score as float)/TagBadges.Score), '#0.#%') as Score,
    format(iif(RawData.Answers > TagBadges.Answers, 1,
               cast(RawData.Answers as float)/TagBadges.Answers), '#0.#%') as Answers,
    TagBadges.Class
  from RawData cross join TagBadges
),

-- Combine class & type columns
AllProgress as (
  select TagName, Progress, Class+' '+Type as Category
  from TypeProgress
  unpivot (Progress for Type in (Score, Answers)) p
) ...


But in the end I still had to list all the cases (2 × 3 = 6) explicitly:

select *
from AllProgress
pivot (
  max(Progress) for Category in
    ([Bronze Score], [Bronze Answers],
     [Silver Score], [Silver Answers],
     [Gold Score], [Gold Answers])
) q


Is there a better way of doing this?

Solution

There is a more general way to do it, but it involves a dynamic query..... which is messy, and may not be better than what you have. Any time you involve a pivot, you are going in to the 'ugly' territory.

First though, some other comments:

  • you do not include the Rawdata CTE with your question, and it has some small issues I can see in SEDE



  • You select your data out as formatted strings, and you should only do the formatting as the last step, if possible.



Other than that, I found your query relatively easy to read, and since I understand the SEDE dataset quite well, I could follow the logic nicely.

Basically, your query is good, and I would 'accept it' in a review.

Ugly things of note though are:

  • max(Progress) is doing character-based max on a single-value set



  • the order by does a convert-to-float of a character value



I would like those things to be commented on, but, since the data volumes for those operations are small, I think it is OK, given the circumstances.

Out of interest, I processed your query using a more general syntax, that avoids the unpivot, and makes the pivot dynamic. As you will see, it is just as ugly, if not worse.

  • it does not do the order-by



  • it does not format the values as string at all.



See it for yourself

Note the multiple temp tables, and the variables for the pivot, and the query. Ugly, but it may give you some ideas.

Context

StackExchange Code Review Q#59538, answer score: 4

Revisions (0)

No revisions yet.