patternsqlMinor
PostgreSQL randomising combinations with LATERAL
Viewed 0 times
postgresqlcombinationswithlateralrandomising
Problem
In the following example I have a table
Let's say that I'd like to group by
What I'd like to do is get random picks from each bin several times. I had thought I'd be able to do this with
However, when I do this in PostgreSQL 9.5 I find I get the same
I'm confused, as I thought the subquery containing the
EDIT: I realised that I can achieve the same objective by generating more combinations and choosing from these with
So my question is simply why didn't the first way work?
EDIT: The problem appears to be that LATERAL only acts like a for-loop if the subqueries are correlated in some way (thanks to @ypercube's comment). Hence my original approach can be fixed by adding the following small change
foo from which I'd like to pick out at random a row per group.CREATE TABLE foo (
line INT
);
INSERT INTO foo (line)
SELECT generate_series(0, 999, 1);Let's say that I'd like to group by
line % 10. I could do this with:SELECT DISTINCT ON (bin) bin, line
FROM (
SELECT line, line % 10 AS bin, random() x
FROM foo
ORDER BY x
) XWhat I'd like to do is get random picks from each bin several times. I had thought I'd be able to do this with
generate_series() and LATERALSELECT i, line, bin
FROM
(
SELECT generate_series(1,3) i
) m,
LATERAL
(SELECT DISTINCT ON (bin) bin, line
FROM (
SELECT line, line % 10 bin, random() x
FROM foo
ORDER BY x
) X
ORDER BY bin) Q
ORDER BY bin, i;However, when I do this in PostgreSQL 9.5 I find I get the same
line for a given bin for each iteration i, e.g.,i;line;bin
1;530;0
2;530;0
3;530;0
1;611;1
2;611;1
3;611;1
...I'm confused, as I thought the subquery containing the
random() would be run differently for each line from the generate_series(). EDIT: I realised that I can achieve the same objective by generating more combinations and choosing from these with
SELECT DISTINCT ON (bin, round) round, bin, line
FROM (
SELECT line, line % 10 as bin, round
FROM foo, generate_series(1,3) round
ORDER BY bin, random()
) X;So my question is simply why didn't the first way work?
EDIT: The problem appears to be that LATERAL only acts like a for-loop if the subqueries are correlated in some way (thanks to @ypercube's comment). Hence my original approach can be fixed by adding the following small change
SELECT i, line, bin
FROM
(
SELECT generate_series(1,3) i
) m,
LATERAL
(
SELECT DISTINCT ON (bin) bin, line
FROM (
SELECT line, line % 10 bin, m.i, random() x -- <NOTE m.i HERE
FROM foo
ORDER BY x
) X
ORDER BY bin
LIMIT 3
) Q
ORDER BY bin, i;Solution
I'd write the query like this, using
The
Also, if you don't need the
LIMIT (3) instead of DISTINCT ON.The
generate_series(0, 9) is used to get all the distinct bins. You could use (SELECT DISTINCT line % 10 FROM foo) AS g (bin) instead, if the "bins" are not all the integers from 0 up to 9:SELECT
g.bin,
ROW_NUMBER() OVER (PARTITION BY g.bin ORDER BY d.x) AS i,
d.*
FROM
generate_series(0, 9) AS g (bin),
LATERAL
( SELECT f.*, random() x
FROM foo AS f
WHERE f.line % 10 = g.bin
ORDER BY x
LIMIT 3
) AS d
ORDER BY
bin, x ;Also, if you don't need the
random() number in the output, you could use ORDER BY random() in the subquery and remove x from the select and order by clauses - or replace ORDER BY d.x with ORDER BY d.line.Code Snippets
SELECT
g.bin,
ROW_NUMBER() OVER (PARTITION BY g.bin ORDER BY d.x) AS i,
d.*
FROM
generate_series(0, 9) AS g (bin),
LATERAL
( SELECT f.*, random() x
FROM foo AS f
WHERE f.line % 10 = g.bin
ORDER BY x
LIMIT 3
) AS d
ORDER BY
bin, x ;Context
StackExchange Database Administrators Q#173566, answer score: 5
Revisions (0)
No revisions yet.