patternsqlMinor
Check whether empty strings are present in character-type columns
Viewed 0 times
presentcolumnsareemptycharacterstringstypecheckwhether
Problem
I have an application that (as part of its logic) trims strings and replaces empty strings with NULL prior to insertion in the database. I guess one way to ensure that this is enforced would be to write a CHECK on every table that has a
Assuming that one can't or does not want to do that is there a way to write a simple, generic, SQL query (obtaining table and column names from the database's metadata) that would check if any textual columns in the database contain empty strings?
VARCHAR, TEXT (or similar) column.Assuming that one can't or does not want to do that is there a way to write a simple, generic, SQL query (obtaining table and column names from the database's metadata) that would check if any textual columns in the database contain empty strings?
Solution
Function for a single table
Returns all character-type columns of the given table with a count of empty values (
Call:
Returns:
Works for Postgres 9.1 or later.
Output table names are automatically schema-qualified if needed, according to the current
Output table names and column names are automatically escaped if needed.
Input table name can optionally be schema-qualified, else defaults to the current
Your role needs privileges to actually read from the given table.
The function is highly optimized and only runs a single scan on the given table to check on all relevant columns.
Should be safe against SQL injection.
Using parallel
You will be interested in this related answer on SO to actually replace empty strings - with more explanation:
Match strings of only space characters, too
As you commented,
How?
To find strings of only space characters, too (not other white space!), add the cast to these lines above:
Wrapper function to report on a whole schema
Call:
Returns:
db<>fiddle here
Old sqlfiddle
Returns all character-type columns of the given table with a count of empty values (
'') and whether they are defined NOT NULL.CREATE OR REPLACE FUNCTION f_tbl_empty_status(_tbl regclass)
RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
LANGUAGE plpgsql AS
$func$
DECLARE
_typ CONSTANT regtype[] := '{text, bpchar, varchar}'; -- base char types
_sql text;
_col_arr text[];
_null_arr bool[];
BEGIN
-- Build command
SELECT INTO _col_arr, _null_arr, _sql
array_agg(s.col)
, array_agg(s.attnotnull)
, '
SELECT $1
, unnest($2)
, unnest(ARRAY [count('
|| string_agg(s.col, ' = '''' OR NULL), count(')
|| ' = '''' OR NULL)])
, unnest($3)
FROM ' || _tbl
FROM (
SELECT quote_ident(attname) AS col, attnotnull
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
-- AND NOT attnotnull -- include columns defined NOT NULL
AND atttypid = ANY(_typ) -- only character types
ORDER BY attnum
) AS s;
-- Debug
-- RAISE NOTICE '%', _sql;
-- Execute
IF _sql IS NULL THEN
-- do nothing, nothing to return
ELSE
RETURN QUERY EXECUTE _sql
USING _tbl::text, _col_arr, _null_arr;
END IF;
END
$func$;Call:
SELECT * FROM f_tbl_empty_status('tbl_name'); -- optionally schema-qualifiedReturns:
tbl | col | empty_ct | not_null
------+------------+----------+---------
tbl1 | txt | 0 | f
tbl1 | vc | 3 | f
tbl1 | "oDD name" | 7 | fWorks for Postgres 9.1 or later.
Output table names are automatically schema-qualified if needed, according to the current
search_path.Output table names and column names are automatically escaped if needed.
empty_ct is the count of rows where the value of the column is the empty stringnot_null reports whether the column is defined NOT NULL (so you cannot convert possible empty strings to NULL!)Input table name can optionally be schema-qualified, else defaults to the current
search_path.Your role needs privileges to actually read from the given table.
The function is highly optimized and only runs a single scan on the given table to check on all relevant columns.
Should be safe against SQL injection.
Using parallel
unnest() to simplify the complex code somewhat:- Unnest multiple arrays into rows
You will be interested in this related answer on SO to actually replace empty strings - with more explanation:
- Replace blank spaces with null values
Match strings of only space characters, too
As you commented,
trim(s.col, ' ') = '' does the job just fine. But here's a shortcut:s.col::char = ''How?
char is an alias for character(1), the rarely useful blank-padded type. Values are padded with space characters to the right up to the length specifier (which is 1 in this case, but irrelevant). Trailing spaces are effectively insignificant for this type. So ' ' is the same as '' or ' '. Voilá. And yes, it is faster, too, I tested.To find strings of only space characters, too (not other white space!), add the cast to these lines above:
|| string_agg(s.col, '::char = '''' OR NULL), count(')
|| '::char = '''' OR NULL)])Wrapper function to report on a whole schema
CREATE OR REPLACE FUNCTION f_schema_empty_status(_sch text DEFAULT 'public')
RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
LANGUAGE plpgsql AS
$func$
DECLARE
_tbl regclass;
BEGIN
FOR _tbl IN
SELECT c.oid
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = _sch -- 'public' by default
-- AND c.relname LIKE 'my_pattern%' -- optionally filter table names
AND c.relkind = 'r' -- regular tables only
ORDER BY relname
LOOP
-- Debug
-- RAISE NOTICE 'table: %', _tbl;
RETURN QUERY
SELECT * FROM f_tbl_empty_status(_tbl);
END LOOP;
END
$func$;Call:
SELECT * FROM f_schema_empty_status(); -- defaults to 'public' without parameterReturns:
tbl | col | empty_ct | not_null
------+------------+----------+---------
tbl1 | txt | 0 | f
tbl1 | vc | 3 | f
tbl1 | "oDD name" | 7 | f
tbl2 | some_text | 123 | t
...
db<>fiddle here
Old sqlfiddle
Code Snippets
CREATE OR REPLACE FUNCTION f_tbl_empty_status(_tbl regclass)
RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
LANGUAGE plpgsql AS
$func$
DECLARE
_typ CONSTANT regtype[] := '{text, bpchar, varchar}'; -- base char types
_sql text;
_col_arr text[];
_null_arr bool[];
BEGIN
-- Build command
SELECT INTO _col_arr, _null_arr, _sql
array_agg(s.col)
, array_agg(s.attnotnull)
, '
SELECT $1
, unnest($2)
, unnest(ARRAY [count('
|| string_agg(s.col, ' = '''' OR NULL), count(')
|| ' = '''' OR NULL)])
, unnest($3)
FROM ' || _tbl
FROM (
SELECT quote_ident(attname) AS col, attnotnull
FROM pg_attribute
WHERE attrelid = _tbl -- valid, visible, legal table name
AND attnum >= 1 -- exclude tableoid & friends
AND NOT attisdropped -- exclude dropped columns
-- AND NOT attnotnull -- include columns defined NOT NULL
AND atttypid = ANY(_typ) -- only character types
ORDER BY attnum
) AS s;
-- Debug
-- RAISE NOTICE '%', _sql;
-- Execute
IF _sql IS NULL THEN
-- do nothing, nothing to return
ELSE
RETURN QUERY EXECUTE _sql
USING _tbl::text, _col_arr, _null_arr;
END IF;
END
$func$;SELECT * FROM f_tbl_empty_status('tbl_name'); -- optionally schema-qualifiedtbl | col | empty_ct | not_null
------+------------+----------+---------
tbl1 | txt | 0 | f
tbl1 | vc | 3 | f
tbl1 | "oDD name" | 7 | fs.col::char = ''|| string_agg(s.col, '::char = '''' OR NULL), count(')
|| '::char = '''' OR NULL)])Context
StackExchange Database Administrators Q#81966, answer score: 4
Revisions (0)
No revisions yet.