HiveBrain v1.2.0
Get Started
← Back to all entries
principleMinor

Rank vs. Distinct

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
distinctrankstackoverflow

Problem

Just for curiosity, if I have to use rank to tag distinct values for some columns only, and ignore whatever different values occurred on other columns, why not use distinct and just remove the other columns? Any real life examples?

SELECT * FROM (
SELECT
 col1,
 col2,
 col3,
 col4,
 row_number () over (partition by col1, col2 ORDER BY col3) RN
FROM table) tmp_table
WHERE rn = 1


vs.

SELECT DISTINCT
 col1,
 col2
FROM table


When ordered by col3, there is a chance you can get different values of col3 compared to rank 2, right? Then if that col3 is not important because it is not consistent, why not use distinct instead for only col1 and col2?

Solution

A few things:

  • You are using ROW_NUMBER(), not RANK()



  • For the example you cite, it does not really make much difference



  • For other situations, ROW_NUMBER() provides additional flexibility. For example, suppose you are doing a data cleanse, and you want to read you input table and write all the duplicated rows into another table, so that you can process them further. Then you can do something like:



insert /*+ APPEND */ into table2
select * from (
  select col1
        , col2
        , col3
        , col4
   row_number() over (partition by  order by  ) rn
   from table )
where rn > 1

Code Snippets

insert /*+ APPEND */ into table2
select * from (
  select col1
        , col2
        , col3
        , col4
   row_number() over (partition by <key columns> order by <xxx> ) rn
   from table )
where rn > 1

Context

StackExchange Database Administrators Q#164364, answer score: 2

Revisions (0)

No revisions yet.