patternsqlMinor
A more intelligent ntile
Viewed 0 times
intelligentmorentile
Problem
When using the
For example with the following query:
I will get 10 groups of about the same size, with the strong likelihood that paintings with the same price will end up in different bins.
For Example:
`┌────┬────────────────────────────────────────────┬───────┬────────────┬──────┬───────┬────────────┬────────┐
│ id │ title │ price │ row_number │ rank │ count │ dense_rank │ decile │
╞════╪════════════════════════════════════════════╪═══════╪════════════╪══════╪═══════╪════════════╪════════╡
│ 11 │ Eyes in the Heat │ 10 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 19 │ Deux fillettes, fond jaune et rouge │ 11 │ 2 │ 2 │ 2 │ 2 │ 1 │
│ 17 │ Flowers in a Pitcher │ 12 │ 3 │ 3 │ 6 │ 3 │ 1 │
│ 5 │ Composition with Red, Yellow and Blue │ 12 │ 4 │ 3 │ 6 │ 3 │ 2 │
│ 18 │ La lecon de musique (The Music Lesson) │ 12 │ 5 │ 3 │ 6 │ 3 │ 2 │
│ 9 │ The Adoration of the Magi │ 12 │ 6 │ 3 │ 6 │ 3 │ 2 │
│ 29 │ Self-Portrait │ 14 │ 7 │ 7 │ 10 │ 4 │ 3 │
│ 25 │ Symphony in White, No. 1: The White Girl │ 14 │ 8 │ 7 │ 10 │ 4 │ 3 │
│ 30 │ The Anatomy Lecture of Dr. Nicolaes Tulp │ 14 │ 9 │ 7 │ 10 │ 4 │ 3 │
│ 20 │ Les repasseuses (Women Ironing)
ntile() window function, the major issue is that it arbitrarily groups into roughly equal parts regardless of the actual value.For example with the following query:
select
id,title,price,
row_number() over(order by price) as row_number,
rank() over(order by price) as rank,
count(*) over(order by price) as count,
dense_rank() over(order by price) as dense_rank,
ntile(10) over(order by price) as decile
from paintings
order by price;I will get 10 groups of about the same size, with the strong likelihood that paintings with the same price will end up in different bins.
For Example:
`┌────┬────────────────────────────────────────────┬───────┬────────────┬──────┬───────┬────────────┬────────┐
│ id │ title │ price │ row_number │ rank │ count │ dense_rank │ decile │
╞════╪════════════════════════════════════════════╪═══════╪════════════╪══════╪═══════╪════════════╪════════╡
│ 11 │ Eyes in the Heat │ 10 │ 1 │ 1 │ 1 │ 1 │ 1 │
│ 19 │ Deux fillettes, fond jaune et rouge │ 11 │ 2 │ 2 │ 2 │ 2 │ 1 │
│ 17 │ Flowers in a Pitcher │ 12 │ 3 │ 3 │ 6 │ 3 │ 1 │
│ 5 │ Composition with Red, Yellow and Blue │ 12 │ 4 │ 3 │ 6 │ 3 │ 2 │
│ 18 │ La lecon de musique (The Music Lesson) │ 12 │ 5 │ 3 │ 6 │ 3 │ 2 │
│ 9 │ The Adoration of the Magi │ 12 │ 6 │ 3 │ 6 │ 3 │ 2 │
│ 29 │ Self-Portrait │ 14 │ 7 │ 7 │ 10 │ 4 │ 3 │
│ 25 │ Symphony in White, No. 1: The White Girl │ 14 │ 8 │ 7 │ 10 │ 4 │ 3 │
│ 30 │ The Anatomy Lecture of Dr. Nicolaes Tulp │ 14 │ 9 │ 7 │ 10 │ 4 │ 3 │
│ 20 │ Les repasseuses (Women Ironing)
Solution
You could use
Result:
and I’m not fussed about which decile
Note that bin 2 and 8 with your sample data ended up empty.
rank() and do integer division with number of rows for each bin.declare @T table(id int, title varchar(100), price int);
insert into @T(id, title, price) values
(19, 'Deux fillettes, fond jaune et rouge ', 11),
(17, 'Flowers in a Pitcher ', 12),
(5 , 'Composition with Red, Yellow and Blue ', 12),
(18, 'La lecon de musique (The Music Lesson) ', 12),
(9 , 'The Adoration of the Magi ', 12),
(29, 'Self-Portrait ', 14),
(25, 'Symphony in White, No. 1: The White Girl ', 14),
(30, 'The Anatomy Lecture of Dr. Nicolaes Tulp ', 14),
(20, 'Les repasseuses (Women Ironing) ', 14),
(1 , 'The Birth of Venus ', 15),
(12, 'Femme se promenant dans une foret exotique ', 15),
(24, 'Portrait of the Painter’s Mother ', 15),
(28, 'Jeunes filles au piano ', 15),
(7 , 'Portrait de l artiste (Self-portrait) ', 16),
(3 , 'The Last Supper ', 16),
(13, 'Combat of a Tiger and a Buffalo ', 16),
(4 , 'The Creation of Man ', 17),
(22, 'Le Chemin de Fer ', 17),
(6 , 'Femmes de Tahiti [Sur la plage] ', 18),
(21, 'Le Bar aux Folies-Berg ', 18),
(26, 'Lady at the Piano ', 18),
(15, 'Remembrance of a Garden ', 18),
(16, '1914 ', 18),
(14, 'Ancient Sound, Abstract on Black ', 19),
(8 , 'The Large Turf ', 19),
(23, 'On the Beach ', 19),
(2 , 'Portrait of Mona Lisa ', 19),
(27, 'On the Terrace ', 20),
(10, 'The She-Wolf ', 20);
declare @BinCount int = 10;
declare @BinSize int;
select @BinSize = 1 + count(*) / @BinCount from @T;
select T.id,
T.title,
T.price,
1 + rank() over(order by T.price) / @BinSize as decile
from @T as T;Result:
id title price decile
--- ------------------------------------------- ------ --------------------
19 Deux fillettes, fond jaune et rouge 11 1
17 Flowers in a Pitcher 12 1
5 Composition with Red, Yellow and Blue 12 1
18 La lecon de musique (The Music Lesson) 12 1
9 The Adoration of the Magi 12 1
29 Self-Portrait 14 3
25 Symphony in White, No. 1: The White Girl 14 3
30 The Anatomy Lecture of Dr. Nicolaes Tulp 14 3
20 Les repasseuses (Women Ironing) 14 3
1 The Birth of Venus 15 4
12 Femme se promenant dans une foret exotique 15 4
24 Portrait of the Painter’s Mother 15 4
28 Jeunes filles au piano 15 4
7 Portrait de l artiste (Self-portrait) 16 5
3 The Last Supper 16 5
13 Combat of a Tiger and a Buffalo 16 5
4 The Creation of Man 17 6
22 Le Chemin de Fer 17 6
6 Femmes de Tahiti [Sur la plage] 18 7
21 Le Bar aux Folies-Berg 18 7
26 Lady at the Piano 18 7
15 Remembrance of a Garden 18 7
16 1914 18 7
14 Ancient Sound, Abstract on Black 19 9
8 The Large Turf 19 9
23 On the Beach 19 9
2 Portrait of Mona Lisa 19 9
27 On the Terrace 20 10
10 The She-Wolf 20 10and I’m not fussed about which decile
Note that bin 2 and 8 with your sample data ended up empty.
Code Snippets
declare @T table(id int, title varchar(100), price int);
insert into @T(id, title, price) values
(19, 'Deux fillettes, fond jaune et rouge ', 11),
(17, 'Flowers in a Pitcher ', 12),
(5 , 'Composition with Red, Yellow and Blue ', 12),
(18, 'La lecon de musique (The Music Lesson) ', 12),
(9 , 'The Adoration of the Magi ', 12),
(29, 'Self-Portrait ', 14),
(25, 'Symphony in White, No. 1: The White Girl ', 14),
(30, 'The Anatomy Lecture of Dr. Nicolaes Tulp ', 14),
(20, 'Les repasseuses (Women Ironing) ', 14),
(1 , 'The Birth of Venus ', 15),
(12, 'Femme se promenant dans une foret exotique ', 15),
(24, 'Portrait of the Painter’s Mother ', 15),
(28, 'Jeunes filles au piano ', 15),
(7 , 'Portrait de l artiste (Self-portrait) ', 16),
(3 , 'The Last Supper ', 16),
(13, 'Combat of a Tiger and a Buffalo ', 16),
(4 , 'The Creation of Man ', 17),
(22, 'Le Chemin de Fer ', 17),
(6 , 'Femmes de Tahiti [Sur la plage] ', 18),
(21, 'Le Bar aux Folies-Berg ', 18),
(26, 'Lady at the Piano ', 18),
(15, 'Remembrance of a Garden ', 18),
(16, '1914 ', 18),
(14, 'Ancient Sound, Abstract on Black ', 19),
(8 , 'The Large Turf ', 19),
(23, 'On the Beach ', 19),
(2 , 'Portrait of Mona Lisa ', 19),
(27, 'On the Terrace ', 20),
(10, 'The She-Wolf ', 20);
declare @BinCount int = 10;
declare @BinSize int;
select @BinSize = 1 + count(*) / @BinCount from @T;
select T.id,
T.title,
T.price,
1 + rank() over(order by T.price) / @BinSize as decile
from @T as T;id title price decile
--- ------------------------------------------- ------ --------------------
19 Deux fillettes, fond jaune et rouge 11 1
17 Flowers in a Pitcher 12 1
5 Composition with Red, Yellow and Blue 12 1
18 La lecon de musique (The Music Lesson) 12 1
9 The Adoration of the Magi 12 1
29 Self-Portrait 14 3
25 Symphony in White, No. 1: The White Girl 14 3
30 The Anatomy Lecture of Dr. Nicolaes Tulp 14 3
20 Les repasseuses (Women Ironing) 14 3
1 The Birth of Venus 15 4
12 Femme se promenant dans une foret exotique 15 4
24 Portrait of the Painter’s Mother 15 4
28 Jeunes filles au piano 15 4
7 Portrait de l artiste (Self-portrait) 16 5
3 The Last Supper 16 5
13 Combat of a Tiger and a Buffalo 16 5
4 The Creation of Man 17 6
22 Le Chemin de Fer 17 6
6 Femmes de Tahiti [Sur la plage] 18 7
21 Le Bar aux Folies-Berg 18 7
26 Lady at the Piano 18 7
15 Remembrance of a Garden 18 7
16 1914 18 7
14 Ancient Sound, Abstract on Black 19 9
8 The Large Turf 19 9
23 On the Beach 19 9
2 Portrait of Mona Lisa 19 9
27 On the Terrace 20 10
10 The She-Wolf 20 10Context
StackExchange Database Administrators Q#245962, answer score: 5
Revisions (0)
No revisions yet.