HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Check whether empty strings are present in character-type columns

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
presentcolumnsareemptycharacterstringstypecheckwhether

Problem

I have an application that (as part of its logic) trims strings and replaces empty strings with NULL prior to insertion in the database. I guess one way to ensure that this is enforced would be to write a CHECK on every table that has a VARCHAR, TEXT (or similar) column.

Assuming that one can't or does not want to do that is there a way to write a simple, generic, SQL query (obtaining table and column names from the database's metadata) that would check if any textual columns in the database contain empty strings?

Solution

Function for a single table

Returns all character-type columns of the given table with a count of empty values ('') and whether they are defined NOT NULL.

CREATE OR REPLACE FUNCTION f_tbl_empty_status(_tbl regclass)
  RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _typ      CONSTANT regtype[] := '{text, bpchar, varchar}';  -- base char types
   _sql      text;
   _col_arr  text[];
   _null_arr bool[];
BEGIN
   -- Build command
   SELECT INTO _col_arr, _null_arr, _sql
          array_agg(s.col)
        , array_agg(s.attnotnull)
        , '
   SELECT $1
        , unnest($2)
        , unnest(ARRAY [count('
                 || string_agg(s.col, ' = '''' OR NULL), count(')
                                   || ' = '''' OR NULL)])
        , unnest($3)
   FROM   ' || _tbl
   FROM  (
      SELECT quote_ident(attname) AS col, attnotnull
      FROM   pg_attribute
      WHERE  attrelid = _tbl           -- valid, visible, legal table name 
      AND    attnum >= 1               -- exclude tableoid & friends
      AND    NOT attisdropped          -- exclude dropped columns
   -- AND    NOT attnotnull            -- include columns defined NOT NULL
      AND    atttypid = ANY(_typ)      -- only character types
      ORDER  BY attnum
      ) AS s;

   -- Debug
   -- RAISE NOTICE '%', _sql;

   -- Execute
   IF _sql IS NULL THEN
      -- do nothing, nothing to return
   ELSE
      RETURN QUERY EXECUTE _sql
      USING  _tbl::text, _col_arr, _null_arr;
   END IF;
END
$func$;


Call:

SELECT * FROM f_tbl_empty_status('tbl_name'); -- optionally schema-qualified


Returns:

tbl   | col        | empty_ct | not_null
------+------------+----------+---------
tbl1  | txt        | 0        | f
tbl1  | vc         | 3        | f
tbl1  | "oDD name" | 7        | f


Works for Postgres 9.1 or later.

Output table names are automatically schema-qualified if needed, according to the current search_path.

Output table names and column names are automatically escaped if needed.

empty_ct is the count of rows where the value of the column is the empty string

not_null reports whether the column is defined NOT NULL (so you cannot convert possible empty strings to NULL!)

Input table name can optionally be schema-qualified, else defaults to the current search_path.

Your role needs privileges to actually read from the given table.

The function is highly optimized and only runs a single scan on the given table to check on all relevant columns.

Should be safe against SQL injection.

Using parallel unnest() to simplify the complex code somewhat:

  • Unnest multiple arrays into rows



You will be interested in this related answer on SO to actually replace empty strings - with more explanation:

  • Replace blank spaces with null values



Match strings of only space characters, too

As you commented, trim(s.col, ' ') = '' does the job just fine. But here's a shortcut:

s.col::char = ''


How?

char is an alias for character(1), the rarely useful blank-padded type. Values are padded with space characters to the right up to the length specifier (which is 1 in this case, but irrelevant). Trailing spaces are effectively insignificant for this type. So ' ' is the same as '' or ' '. Voilá. And yes, it is faster, too, I tested.

To find strings of only space characters, too (not other white space!), add the cast to these lines above:

|| string_agg(s.col, '::char = '''' OR NULL), count(')
                                    || '::char = '''' OR NULL)])


Wrapper function to report on a whole schema

CREATE OR REPLACE FUNCTION f_schema_empty_status(_sch text DEFAULT 'public')
  RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
  LANGUAGE plpgsql AS
$func$
DECLARE 
   _tbl regclass;
BEGIN
   FOR _tbl IN
      SELECT c.oid
      FROM   pg_class c
      JOIN   pg_namespace n ON n.oid = c.relnamespace
      WHERE  n.nspname = _sch  -- 'public' by default
   -- AND    c.relname LIKE 'my_pattern%'  -- optionally filter table names
      AND    c.relkind = 'r'  -- regular tables only
      ORDER  BY relname
   LOOP
   -- Debug
   -- RAISE NOTICE 'table: %', _tbl;

      RETURN QUERY
      SELECT * FROM f_tbl_empty_status(_tbl);
   END LOOP;
END
$func$;


Call:

SELECT * FROM f_schema_empty_status();  -- defaults to 'public' without parameter


Returns:
tbl | col | empty_ct | not_null
------+------------+----------+---------
tbl1 | txt | 0 | f
tbl1 | vc | 3 | f
tbl1 | "oDD name" | 7 | f
tbl2 | some_text | 123 | t
...


db<>fiddle here

Old sqlfiddle

Code Snippets

CREATE OR REPLACE FUNCTION f_tbl_empty_status(_tbl regclass)
  RETURNS TABLE (tbl text, col text, empty_ct bigint, not_null bool)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _typ      CONSTANT regtype[] := '{text, bpchar, varchar}';  -- base char types
   _sql      text;
   _col_arr  text[];
   _null_arr bool[];
BEGIN
   -- Build command
   SELECT INTO _col_arr, _null_arr, _sql
          array_agg(s.col)
        , array_agg(s.attnotnull)
        , '
   SELECT $1
        , unnest($2)
        , unnest(ARRAY [count('
                 || string_agg(s.col, ' = '''' OR NULL), count(')
                                   || ' = '''' OR NULL)])
        , unnest($3)
   FROM   ' || _tbl
   FROM  (
      SELECT quote_ident(attname) AS col, attnotnull
      FROM   pg_attribute
      WHERE  attrelid = _tbl           -- valid, visible, legal table name 
      AND    attnum >= 1               -- exclude tableoid & friends
      AND    NOT attisdropped          -- exclude dropped columns
   -- AND    NOT attnotnull            -- include columns defined NOT NULL
      AND    atttypid = ANY(_typ)      -- only character types
      ORDER  BY attnum
      ) AS s;

   -- Debug
   -- RAISE NOTICE '%', _sql;

   -- Execute
   IF _sql IS NULL THEN
      -- do nothing, nothing to return
   ELSE
      RETURN QUERY EXECUTE _sql
      USING  _tbl::text, _col_arr, _null_arr;
   END IF;
END
$func$;
SELECT * FROM f_tbl_empty_status('tbl_name'); -- optionally schema-qualified
tbl   | col        | empty_ct | not_null
------+------------+----------+---------
tbl1  | txt        | 0        | f
tbl1  | vc         | 3        | f
tbl1  | "oDD name" | 7        | f
s.col::char = ''
|| string_agg(s.col, '::char = '''' OR NULL), count(')
                                    || '::char = '''' OR NULL)])

Context

StackExchange Database Administrators Q#81966, answer score: 4

Revisions (0)

No revisions yet.