snippetsqlMajor
How to preserve the original order of elements in an unnested array?
Viewed 0 times
ordertheelementspreserveoriginalarrayunnestedhow
Problem
Given the string:
'I think that PostgreSQL is nifty'
I would like to operate on the individual words found within that string. Essentially, I have a separate from which I can get word details and would like to join an unnested array of that string on this dictionary.
So far I have:
This accomplishes the fundamentals of what I was hoping to do, but it does not preserve the original word order.
Related question:
PostgreSQL unnest() with element number
'I think that PostgreSQL is nifty'
I would like to operate on the individual words found within that string. Essentially, I have a separate from which I can get word details and would like to join an unnested array of that string on this dictionary.
So far I have:
select word, meaning, partofspeech
from unnest(string_to_array('I think that PostgreSQL is nifty',' ')) as word
from table t
join dictionary d
on t.word = d.wordname;This accomplishes the fundamentals of what I was hoping to do, but it does not preserve the original word order.
Related question:
PostgreSQL unnest() with element number
Solution
WITH ORDINALITY in Postgres 9.4 or laterThe query can now simply be:
SELECT *
FROM regexp_split_to_table('I think Postgres is nifty', ' ') WITH ORDINALITY x(word, rn);Or, applied to a table:
SELECT *
FROM tbl t, regexp_split_to_table(t.col, ' ') WITH ORDINALITY x(word, rn);Details:
- PostgreSQL unnest() with element number
About the implicit
LATERAL join:- What is the difference between LATERAL and a subquery in PostgreSQL?
db<>fiddle here
Old sqlfiddle
Postgres 9.3 or older - and more general explanation
For a single string
You can apply the window function
row_number() to remember the order of elements. However, with the usual row_number() OVER (ORDER BY col) you get numbers according to the sort order, not the original position in the string.You could simply omit
ORDER BY to get the position "as is":SELECT *, row_number() OVER () AS rn
FROM regexp_split_to_table('I think Postgres is nifty', ' ') AS x(word);Performance of
regexp_split_to_table() degrades with long strings. unnest(string_to_array(...)) scales better:SELECT *, row_number() OVER () AS rn
FROM unnest(string_to_array('I think Postgres is nifty', ' ')) AS x(word);However, while this normally works and I have never seen it break in simple queries, Postgres asserts nothing as to the order of rows without an explicit
ORDER BY.To guarantee ordinal numbers of elements in the original string, use
generate_subscript() (improved with comment by @deszo):SELECT arr[rn] AS word, rn
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM string_to_array('I think Postgres is nifty', ' ') AS x(arr)
) y;For a table of strings
Add
PARTITION BY id to the OVER clause ...Demo table:
CREATE TEMP TABLE strings(string text);
INSERT INTO strings VALUES
('I think Postgres is nifty')
, ('And it keeps getting better');I use
ctid as ad-hoc substitute for a primary key. If you have one (or any unique column) use that instead.SELECT *, row_number() OVER (PARTITION BY ctid) AS rn
FROM (
SELECT ctid, unnest(string_to_array(string, ' ')) AS word
FROM strings
) x;This works without any distinct ID:
SELECT arr[rn] AS word, rn
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM (
SELECT string_to_array(string, ' ') AS arr
FROM strings
) x
) y;Answer to question
SELECT z.arr, z.rn, z.word, d.meaning -- , partofspeech -- ?
FROM (
SELECT *, arr[rn] AS word
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM (
SELECT string_to_array(string, ' ') AS arr
FROM strings
) x
) y
) z
JOIN dictionary d ON d.wordname = z.word
ORDER BY z.arr, z.rn;Code Snippets
SELECT *
FROM regexp_split_to_table('I think Postgres is nifty', ' ') WITH ORDINALITY x(word, rn);SELECT *
FROM tbl t, regexp_split_to_table(t.col, ' ') WITH ORDINALITY x(word, rn);SELECT *, row_number() OVER () AS rn
FROM regexp_split_to_table('I think Postgres is nifty', ' ') AS x(word);SELECT *, row_number() OVER () AS rn
FROM unnest(string_to_array('I think Postgres is nifty', ' ')) AS x(word);SELECT arr[rn] AS word, rn
FROM (
SELECT *, generate_subscripts(arr, 1) AS rn
FROM string_to_array('I think Postgres is nifty', ' ') AS x(arr)
) y;Context
StackExchange Database Administrators Q#27279, answer score: 31
Revisions (0)
No revisions yet.