HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Count items on condition

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
countitemscondition

Problem

In an application where a user can have copies of a mobile application installed on multiple devices I'd like to know how many iphone and how many android apps each user has.

I tried a query such as

gv2=> SELECT userid, 
       COUNT(ALL version_identifier LIKE '%ios%'), 
       COUNT(ALL version_identifier LIKE '%android%') 
FROM gl.user_device 
GROUP BY userid;
                userid                | count | count 
--------------------------------------+-------+-------
 46d0f5b7-42b0-4aad-9162-1390c32cb06e |     7 |     7
 5d519794-abfe-4863-82d4-6da33db7637b |     7 |     7
 a81cff6b-30f2-4b6e-a5bf-b1a933904473 |     1 |     1
 b65f0708-0cd1-11e7-878b-06fa189da46b |     4 |     4
 94b91b02-ff43-4a9a-b317-037fa2a347d3 |     1 |     1
 a4cacd98-1216-4801-b058-b28b8fa632a9 |     8 |     8
(6 rows)


Which is clearly wrong.

For example

gv2=> SELECT userid FROM gl.user_device WHERE version_identifier LIKE '%ios%';
                userid                
--------------------------------------
 5d519794-abfe-4863-82d4-6da33db7637b
(1 row)


In case it matters I need to support Postgres 10.1 but preferably also 9.5.10 as I have one old Production DB still on that version.

Solution

Use the modern aggregate FILTER syntax in Postgres 9.4 or later:

SELECT userid,
       COUNT(*) FILTER (WHERE version_identifier LIKE '%ios%') AS nr_ios,
       COUNT(*) FILTER (WHERE version_identifier LIKE '%android%') AS nr_android
FROM   gl.user_device
GROUP  BY userid;


It's shorter, clearer and a bit faster. See:

  • For absolute performance, is SUM faster or COUNT?



But it still results in a sequential scan on the whole table.

If ...

  • the table is big and performance is important,



  • a substantial percentage of rows does not contribute to either count,



  • and you have a matching index - a trigram index on version_identifier in your case, see:



  • Full Text Search With PostgreSQL



... then run two separate queries instead to tap the full potential of the index. Example with two subqueries in a FULL OUTER JOIN to make it a 100% equivalent drop-in replacement:

SELECT userid
     , COALESCE(nr_ios    , 0) AS nr_ios
     , COALESCE(nr_android, 0) AS nr_android
FROM  (
   SELECT userid, COUNT(*) AS nr_ios
   FROM   gl.user_device
   WHERE  version_identifier LIKE '%ios%'
   GROUP  BY 1
   ) i
FULL JOIN (
   SELECT userid, COUNT(*) AS nr_android
   FROM   gl.user_device
   WHERE  version_identifier LIKE '%android%'
   GROUP  BY 1
   ) a USING (userid);


Related:

  • PostgreSQL merge two queries with COUNT and GROUP BY in each

Code Snippets

SELECT userid,
       COUNT(*) FILTER (WHERE version_identifier LIKE '%ios%') AS nr_ios,
       COUNT(*) FILTER (WHERE version_identifier LIKE '%android%') AS nr_android
FROM   gl.user_device
GROUP  BY userid;
SELECT userid
     , COALESCE(nr_ios    , 0) AS nr_ios
     , COALESCE(nr_android, 0) AS nr_android
FROM  (
   SELECT userid, COUNT(*) AS nr_ios
   FROM   gl.user_device
   WHERE  version_identifier LIKE '%ios%'
   GROUP  BY 1
   ) i
FULL JOIN (
   SELECT userid, COUNT(*) AS nr_android
   FROM   gl.user_device
   WHERE  version_identifier LIKE '%android%'
   GROUP  BY 1
   ) a USING (userid);

Context

StackExchange Database Administrators Q#227603, answer score: 8

Revisions (0)

No revisions yet.