HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

SQL order by query results in any arbitrary (but reproducible) manner

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
ordersqlarbitraryqueryanybutreproduciblemannerresults

Problem

For testing purposes I need to get some data from a set of database tables in any arbitrary (but reproducible) order. The idea being that I can later compare two runs using a textual diff tool. Is there an idiom for that?

E.g. I can obviously do a:

SELECT * FROM table_with_N_columns ORDER BY column_1, ... , column_N


I am just asking if there is an idiomatic way to achieve the same effect (for my purposes) without bothering to list every column in the ORDER BY clause. Any ordering will do so long as it is reproducible with subsequent runs of the query.

Solution

Use PK columns

Sorting by all columns of a table is potentially (very) expensive or sometimes simply impossible. Example: table with a json column or xml, or point or one of many other data types which cannot be used in ORDER BY, lacking the necessary operators. You get an exception.

If it doesn't fail, it should work for your purpose, but it still would not guarantee a true stable sort order. Without unique or primary key, completely duplicate rows are allowed, which are still sorted arbitrarily. Two concurrent sessions could both change the "first" in a set of dupes without conflict, resulting in two different rows being changed separately.

Use the primary key column(s) of each table. Unique constraint would work, too, but you really should have a pk on every table to begin with. Besides being correct and guaranteed to work, it's also typically much faster.

Function

Query the system catalogs pg_constraint and pg_attribute to get columns for a PRIMARY KEY or UNIQUE constraint.

I wrapped the query in a handy SQL function:

CREATE OR REPLACE FUNCTION f_order_cols(_tbl regclass)
  RETURNS text AS
$func$
SELECT string_agg(_tbl::text || '.' || quote_ident(attname), ',')
FROM (
   SELECT conrelid AS attrelid, conkey
   FROM   pg_catalog.pg_constraint
   WHERE  conrelid = $1
   AND    contype = ANY ('{p,u}'::"char"[]) -- primary or unique constraint
   ORDER  BY contype                        -- pk has priority ('p' < 'u')
           , array_length(conkey, 1)        -- else, fewest columns as quick tie breaker
   LIMIT  1                                 -- ... but any 1 of them is good
   )   c
JOIN   unnest(c.conkey) attnum ON TRUE      -- unnest array, possibly mult. cols
JOIN   pg_catalog.pg_attribute a USING (attrelid, attnum)

UNION  ALL                                  -- Default if first query returns no row
SELECT _tbl::text || '::text COLLATE "C"'   -- See chapter "Default" below
LIMIT  1
$func$ LANGUAGE sql;


-
This is very fast.

-
The function takes the table name as parameter, type regclass to be precise. Resolves the name according to the current search_path and raises an exception if invalid. At the same time provides a properly quoted identifier in the result automatically.

-
Any PRIMARY KEY or UNIQUE constraint is good. PK has priority, else the unique constraint with the fewest columns is picked.

-
The constraint can consist of multiple columns, attribute numbers of which are stored in the array pg_constraint.conkey. Using an implicit JOIN LATERAL to unnest() that.

-
Result columns are table-qualified to get syntax ready for queries on multiple tables. This optional feature may have to be modified if you are using table aliases in queries.

-
Also quote column names correctly to avoid SQL injection. quote_ident() does that for you. More about that:

  • SQL injection in Postgres functions vs prepared queries



Default

If no constraint is found, the function returns NULL. Wrap the result into COALESCE and use the text representation of the whole row as default: tbl::text.

This is not going to sort rows like original values would, but you get a consistent sort order that works for every table. Since you are going to compare text representations anyway.

Better still, add COLLATE "C" to ignore collation rules. Faster, and just as good for your purpose.

ORDER BY tbl::text COLLATE "C"


Still a potentially very inefficient measure of last resort. Use PKs if at all possible. I put the default into the function above. Everything put together, for a query:

SELECT * FROM u JOIN p USING (a) JOIN n ON n.b = p.b ORDER BY ?


This returns the complete string for ?:

SELECT concat_ws(', ', f_order_cols('u')
                     , f_order_cols('p')
                     , f_order_cols('n')) AS order_by;


SQL Fiddle demonstrating all.

Code Snippets

CREATE OR REPLACE FUNCTION f_order_cols(_tbl regclass)
  RETURNS text AS
$func$
SELECT string_agg(_tbl::text || '.' || quote_ident(attname), ',')
FROM (
   SELECT conrelid AS attrelid, conkey
   FROM   pg_catalog.pg_constraint
   WHERE  conrelid = $1
   AND    contype = ANY ('{p,u}'::"char"[]) -- primary or unique constraint
   ORDER  BY contype                        -- pk has priority ('p' < 'u')
           , array_length(conkey, 1)        -- else, fewest columns as quick tie breaker
   LIMIT  1                                 -- ... but any 1 of them is good
   )   c
JOIN   unnest(c.conkey) attnum ON TRUE      -- unnest array, possibly mult. cols
JOIN   pg_catalog.pg_attribute a USING (attrelid, attnum)

UNION  ALL                                  -- Default if first query returns no row
SELECT _tbl::text || '::text COLLATE "C"'   -- See chapter "Default" below
LIMIT  1
$func$ LANGUAGE sql;
ORDER BY tbl::text COLLATE "C"
SELECT * FROM u JOIN p USING (a) JOIN n ON n.b = p.b ORDER BY ?
SELECT concat_ws(', ', f_order_cols('u')
                     , f_order_cols('p')
                     , f_order_cols('n')) AS order_by;

Context

StackExchange Database Administrators Q#75876, answer score: 7

Revisions (0)

No revisions yet.