HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Count NULL values per row

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
pernullvaluescountrow

Problem

I would like to count the number of null's present per row in a table without enumerating column names. For example:

WITH t as (VALUES
  (NULL ,'hi',2,NULL,'null'),
  ('' ,'hi',2,3,'test'),
  (NULL ,'hi',2,3,'null')
)
SELECT countnulls(t)
FROM t;


Would result in:

numnulls
2
0
1


The closest I can get is with the following hack of row_to_json():

select 
(CHAR_LENGTH(row_to_json(t)::text)
 - CHAR_LENGTH(REPLACE(row_to_json(t)::text, 'null', '')))/4 from t;


Which is... quite the hack (not in a good way). It works, sort of, but it counts the string 'null' as a NULL when it is present in the actual data or in the column names. So it is incorrect in the above case.

Solution


  1. You know the column names



For Postgres 9.6 or later, use num_nulls()

SELECT id, num_nulls(a,b,c,d,e)
FROM   tbl;


Returns your desired result exactly, for any mix of data types.

The manual:

num_nulls(VARIADIC "any") ... returns the number of null arguments

For Postgres 9.5 or older, convert to text[], array_remove(arr, null) and use the remaining array length for an exact count:

SELECT 5 - cardinality(array_remove(ARRAY[a::text,b::text,c::text,d::text,e::text], null))
FROM   tbl;


Any type can be cast to text. The cast is redundant for text columns, of course.

array_remove() requires Postgres 9.3 or later.

cardinality() requires Postgres 9.4 or later. Substitute with array_length(arr, 1) in older versions.
  1. You don't know column names, but Postgres does



When building on actual tables (or other registered objects like a view or a materialized view), we can retrieve column names from the system catalog pg_attribute to fully automate with dynamic SQL. Like:

CREATE OR REPLACE FUNCTION f_num_nulls(_tbl regclass)
  RETURNS SETOF int
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE format(
      'SELECT num_nulls(%s) FROM %s'
    , (SELECT string_agg(quote_ident(attname), ', ')  -- column list
       FROM   pg_attribute
       WHERE  attrelid = _tbl
       AND    NOT attisdropped    -- no dropped (dead) columns
       AND    attnum > 0)         -- no system columns
    , _tbl
   );
END
$func$;


Call:

SELECT * FROM f_num_nulls('myschema.tbl');


Returns the count for for each row in current physical order. Nothing else, to be absolutely generic.

Related:

  • Table name as a PostgreSQL function parameter



  • psql: SELECT * ... except one column



We could also pass each row to return a single count for it using a polymorphic function. Related:

  • Refactor a PL/pgSQL function to return the output of various SELECT queries



  1. You don't know anything: anonymous records



In the unlikely event that column names are unknown even to Postgres (like from a VALUES expression in your example), convert to a document type (json, jsonb, xml, hstore) to get a handle, like demonstrated by ypercube (comment deleted by now) and Evan.

But anonymous records do not have primary keys or any other unique attribute per definition. Count within each LATERAL subquery to defend against false aggregates. Demo with jsonb:

SELECT *
FROM  (
   VALUES
      (NULL ,'hi',2,NULL,'null')
    , (NULL ,'hi',2,NULL,'null')       -- duplicate row !!!
    , ('' ,'hi',2,3,'test')
    , (NULL ,'hi',2,3,'null')
   ) t                                 -- column names unknown
, LATERAL (
   SELECT count(*) FILTER (WHERE j.value = jsonb 'null') AS num_nulls
   FROM   jsonb_each(to_jsonb(t)) j
   ) c;


Or with json: probably a bit faster because the conversion is cheaper.

Demonstrating 3 different ways:

SELECT *
FROM  (
   VALUES
      (NULL ,'hi',2,NULL,'null')
    , (NULL ,'hi',2,NULL,'null')
    , ('' ,'hi',2,3,'test')
    , (NULL ,'hi',2,3,'null')
   ) t   -- column names unknown
, to_json(t) j
, LATERAL (
   SELECT count(*) FILTER (WHERE j1.value::text = 'null') AS num_nulls1
   FROM   json_each(to_json(t)) j1
   ) c1
, LATERAL (
   SELECT count(*) FILTER (WHERE j->>k IS NULL) AS num_nulls2
   FROM   json_object_keys(j) k
   ) c2
, LATERAL (
   SELECT count(*) - count(j->>k ) AS num_nulls3
   FROM   json_object_keys(j) k
   ) c3;


db<>fiddle here

Code Snippets

SELECT id, num_nulls(a,b,c,d,e)
FROM   tbl;
SELECT 5 - cardinality(array_remove(ARRAY[a::text,b::text,c::text,d::text,e::text], null))
FROM   tbl;
CREATE OR REPLACE FUNCTION f_num_nulls(_tbl regclass)
  RETURNS SETOF int
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE format(
      'SELECT num_nulls(%s) FROM %s'
    , (SELECT string_agg(quote_ident(attname), ', ')  -- column list
       FROM   pg_attribute
       WHERE  attrelid = _tbl
       AND    NOT attisdropped    -- no dropped (dead) columns
       AND    attnum > 0)         -- no system columns
    , _tbl
   );
END
$func$;
SELECT * FROM f_num_nulls('myschema.tbl');
SELECT *
FROM  (
   VALUES
      (NULL ,'hi',2,NULL,'null')
    , (NULL ,'hi',2,NULL,'null')       -- duplicate row !!!
    , ('' ,'hi',2,3,'test')
    , (NULL ,'hi',2,3,'null')
   ) t                                 -- column names unknown
, LATERAL (
   SELECT count(*) FILTER (WHERE j.value = jsonb 'null') AS num_nulls
   FROM   jsonb_each(to_jsonb(t)) j
   ) c;

Context

StackExchange Database Administrators Q#205887, answer score: 14

Revisions (0)

No revisions yet.