patternsqlMinor
SQL order by query results in any arbitrary (but reproducible) manner
Viewed 0 times
ordersqlarbitraryqueryanybutreproduciblemannerresults
Problem
For testing purposes I need to get some data from a set of database tables in any arbitrary (but reproducible) order. The idea being that I can later compare two runs using a textual diff tool. Is there an idiom for that?
E.g. I can obviously do a:
I am just asking if there is an idiomatic way to achieve the same effect (for my purposes) without bothering to list every column in the ORDER BY clause. Any ordering will do so long as it is reproducible with subsequent runs of the query.
E.g. I can obviously do a:
SELECT * FROM table_with_N_columns ORDER BY column_1, ... , column_NI am just asking if there is an idiomatic way to achieve the same effect (for my purposes) without bothering to list every column in the ORDER BY clause. Any ordering will do so long as it is reproducible with subsequent runs of the query.
Solution
Use PK columns
Sorting by all columns of a table is potentially (very) expensive or sometimes simply impossible. Example: table with a
If it doesn't fail, it should work for your purpose, but it still would not guarantee a true stable sort order. Without unique or primary key, completely duplicate rows are allowed, which are still sorted arbitrarily. Two concurrent sessions could both change the "first" in a set of dupes without conflict, resulting in two different rows being changed separately.
Use the primary key column(s) of each table. Unique constraint would work, too, but you really should have a pk on every table to begin with. Besides being correct and guaranteed to work, it's also typically much faster.
Function
Query the system catalogs
I wrapped the query in a handy SQL function:
-
This is very fast.
-
The function takes the table name as parameter, type
-
Any
-
The constraint can consist of multiple columns, attribute numbers of which are stored in the array
-
Result columns are table-qualified to get syntax ready for queries on multiple tables. This optional feature may have to be modified if you are using table aliases in queries.
-
Also quote column names correctly to avoid SQL injection.
Default
If no constraint is found, the function returns
This is not going to sort rows like original values would, but you get a consistent sort order that works for every table. Since you are going to compare text representations anyway.
Better still, add
Still a potentially very inefficient measure of last resort. Use PKs if at all possible. I put the default into the function above. Everything put together, for a query:
This returns the complete string for
SQL Fiddle demonstrating all.
Sorting by all columns of a table is potentially (very) expensive or sometimes simply impossible. Example: table with a
json column or xml, or point or one of many other data types which cannot be used in ORDER BY, lacking the necessary operators. You get an exception.If it doesn't fail, it should work for your purpose, but it still would not guarantee a true stable sort order. Without unique or primary key, completely duplicate rows are allowed, which are still sorted arbitrarily. Two concurrent sessions could both change the "first" in a set of dupes without conflict, resulting in two different rows being changed separately.
Use the primary key column(s) of each table. Unique constraint would work, too, but you really should have a pk on every table to begin with. Besides being correct and guaranteed to work, it's also typically much faster.
Function
Query the system catalogs
pg_constraint and pg_attribute to get columns for a PRIMARY KEY or UNIQUE constraint.I wrapped the query in a handy SQL function:
CREATE OR REPLACE FUNCTION f_order_cols(_tbl regclass)
RETURNS text AS
$func$
SELECT string_agg(_tbl::text || '.' || quote_ident(attname), ',')
FROM (
SELECT conrelid AS attrelid, conkey
FROM pg_catalog.pg_constraint
WHERE conrelid = $1
AND contype = ANY ('{p,u}'::"char"[]) -- primary or unique constraint
ORDER BY contype -- pk has priority ('p' < 'u')
, array_length(conkey, 1) -- else, fewest columns as quick tie breaker
LIMIT 1 -- ... but any 1 of them is good
) c
JOIN unnest(c.conkey) attnum ON TRUE -- unnest array, possibly mult. cols
JOIN pg_catalog.pg_attribute a USING (attrelid, attnum)
UNION ALL -- Default if first query returns no row
SELECT _tbl::text || '::text COLLATE "C"' -- See chapter "Default" below
LIMIT 1
$func$ LANGUAGE sql;-
This is very fast.
-
The function takes the table name as parameter, type
regclass to be precise. Resolves the name according to the current search_path and raises an exception if invalid. At the same time provides a properly quoted identifier in the result automatically.-
Any
PRIMARY KEY or UNIQUE constraint is good. PK has priority, else the unique constraint with the fewest columns is picked.-
The constraint can consist of multiple columns, attribute numbers of which are stored in the array
pg_constraint.conkey. Using an implicit JOIN LATERAL to unnest() that.-
Result columns are table-qualified to get syntax ready for queries on multiple tables. This optional feature may have to be modified if you are using table aliases in queries.
-
Also quote column names correctly to avoid SQL injection.
quote_ident() does that for you. More about that:- SQL injection in Postgres functions vs prepared queries
Default
If no constraint is found, the function returns
NULL. Wrap the result into COALESCE and use the text representation of the whole row as default: tbl::text.This is not going to sort rows like original values would, but you get a consistent sort order that works for every table. Since you are going to compare text representations anyway.
Better still, add
COLLATE "C" to ignore collation rules. Faster, and just as good for your purpose.ORDER BY tbl::text COLLATE "C"Still a potentially very inefficient measure of last resort. Use PKs if at all possible. I put the default into the function above. Everything put together, for a query:
SELECT * FROM u JOIN p USING (a) JOIN n ON n.b = p.b ORDER BY ?This returns the complete string for
?:SELECT concat_ws(', ', f_order_cols('u')
, f_order_cols('p')
, f_order_cols('n')) AS order_by;SQL Fiddle demonstrating all.
Code Snippets
CREATE OR REPLACE FUNCTION f_order_cols(_tbl regclass)
RETURNS text AS
$func$
SELECT string_agg(_tbl::text || '.' || quote_ident(attname), ',')
FROM (
SELECT conrelid AS attrelid, conkey
FROM pg_catalog.pg_constraint
WHERE conrelid = $1
AND contype = ANY ('{p,u}'::"char"[]) -- primary or unique constraint
ORDER BY contype -- pk has priority ('p' < 'u')
, array_length(conkey, 1) -- else, fewest columns as quick tie breaker
LIMIT 1 -- ... but any 1 of them is good
) c
JOIN unnest(c.conkey) attnum ON TRUE -- unnest array, possibly mult. cols
JOIN pg_catalog.pg_attribute a USING (attrelid, attnum)
UNION ALL -- Default if first query returns no row
SELECT _tbl::text || '::text COLLATE "C"' -- See chapter "Default" below
LIMIT 1
$func$ LANGUAGE sql;ORDER BY tbl::text COLLATE "C"SELECT * FROM u JOIN p USING (a) JOIN n ON n.b = p.b ORDER BY ?SELECT concat_ws(', ', f_order_cols('u')
, f_order_cols('p')
, f_order_cols('n')) AS order_by;Context
StackExchange Database Administrators Q#75876, answer score: 7
Revisions (0)
No revisions yet.