HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

how to chain postgres RULEs?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
chainrulespostgreshow

Problem

I have implemented data denormalization strategy using postgresql RULEs. I picked rules instead of triggers for performance reasons.

Schema is structured like this:

  • Application has many clients



  • Client has many projects



  • Project has many users



One part of the system is storing hits for every user in stats table. Hit is an imaginary metric, it is not really relevant. System can collect many of these metrics. There are a lot of records in stats table (> 1,000,000 per day).

I want to know how many hits are per user, per project, per client and per application for given day.

To make it work fast, I've groupped stats by day and stored the output into user_hits table. During this process, also the application_id, client_id and project_id has been added (as columns), and appropriate indexes created.

I want to further optimise the process by grouping things by project_id, client_id and finally application_id. The data pipeline is like this:

stats -> user_hits -> project_hits -> client_hits -> application_hits

I want to make sure when I delete the data from user_hits for given day, that the data in project_hits for that same date is also deleted. This process should propagate to last table in chain.

I defined these simple rules:

CREATE RULE delete_children AS ON DELETE TO user_hits
  DO ALSO
  DELETE FROM project_hits WHERE day = OLD.day;

CREATE RULE delete_children AS ON DELETE TO project_hits
  DO ALSO
  DELETE FROM client_hits WHERE day = OLD.day;

CREATE RULE delete_children AS ON DELETE TO client_hits
  DO ALSO
  DELETE FROM application_hits WHERE day = OLD.day;


However, when I issue statement like this:

DELETE FROM user_hits WHERE day = current_date;


I expect it to run these 3 queries in return:

DELETE FROM project_hits WHERE day = current_date;
DELETE FROM client_hits WHERE day = current_date;
DELETE FROM application_hits WHERE day = current_date;


However, it doesn't.

It completes the operation, but it takes c

Solution

Next time, please include the EXPLAIN output rather than making us dig for it in your scripts. There's no guarantee my system is using the same plan as yours (although with your test data it is likely).

The rule system here is working properly. First, the I want to include my own diagnostic queries (note I did not run EXPLAIN ANALYSE since I was just interested in what query plan was generated):

rulestest=# explain DELETE FROM user_hits WHERE day = '2013-03-16';
                                              QUERY PLAN                        

--------------------------------------------------------------------------------
----------------------
 Delete on application_hits  (cost=0.00..3953181.85 rows=316094576 width=24)
   ->  Nested Loop  (cost=0.00..3953181.85 rows=316094576 width=24)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..128.53 rows=6352 width=22)
               ->  Nested Loop  (cost=0.00..96.78 rows=6352 width=22)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 wi
dth=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..2.49 rows=16 width=16)
                           ->  Nested Loop  (cost=0.00..2.41 rows=16 width=16)
                                 ->  Seq Scan on application_hits  (cost=0.00..1
.10 rows=4 width=10)
                                       Filter: (day = '2013-03-16'::date)
                                 ->  Materialize  (cost=0.00..1.12 rows=4 width=
10)
                                       ->  Seq Scan on client_hits  (cost=0.00..
1.10 rows=4 width=10)
                                             Filter: (day = '2013-03-16'::date)

 Delete on client_hits  (cost=0.00..989722.41 rows=79023644 width=18)
   ->  Nested Loop  (cost=0.00..989722.41 rows=79023644 width=18)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..43.83 rows=1588 width=16)
               ->  Nested Loop  (cost=0.00..35.89 rows=1588 width=16)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 wi
dth=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..1.12 rows=4 width=10)
                           ->  Seq Scan on client_hits  (cost=0.00..1.10 rows=4 
width=10)
                                 Filter: (day = '2013-03-16'::date)

 Delete on project_hits  (cost=0.00..248851.80 rows=19755911 width=12)
   ->  Nested Loop  (cost=0.00..248851.80 rows=19755911 width=12)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..16.91 rows=397 width=10)
               ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10
)
                     Filter: (day = '2013-03-16'::date)

 Delete on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
   ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
         Filter: (day = '2013-03-16'::date)
(39 rows)

rulestest=# select distinct day from application_hits;
    day     
------------
 2013-03-15
 2013-03-16
(2 rows)

rulestest=# select count(*), day from application_hits group by day;
 count |    day     
-------+------------
     4 | 2013-03-15
     4 | 2013-03-16
(2 rows)

rulestest=# select count(*), day from client_hits group by day;
 count |    day     
-------+------------
     4 | 2013-03-15
     4 | 2013-03-16
(2 rows)

rulestest=# select count(*), day from project_hits group by day;
 count |    day     
-------+------------
   397 | 2013-03-15
   397 | 2013-03-16
(2 rows)


If your data is anything like your existing data, neither rules nor triggers will work very well. Better will be a stored procedure which you pass a value and it deletes everything you want.

First let's note that indexes here will get you nowhere because in all cases you are pulling half of the tables (I did add indexes on day on all tables to help the planner but this made no real difference).

You need to start with what you are doing with RULEs. RULEs basically rewrite queries and they do so using ways that are as robust as possible. Your code also doesn't match your example though it matches your question better. You have rules on tables which cascade to rules on other tables which cascade to rules on other tables

Therefore when you delete from user_hits where [criteria], the rules transform this into a set of queries:

DELETE FROM application_hits 
 WHERE day IN (SELECT day FROM client_hits 
               WHERE day IN (SELECT day FROM user_hits WHERE [condition]));
DELETE FROM client_hits
  WHERE day IN (SELECT day FROM user_hits WHERE [condition]);
DELETE FROM user_hits WHERE [condition];


Now, you might think we cou

Code Snippets

rulestest=# explain DELETE FROM user_hits WHERE day = '2013-03-16';
                                              QUERY PLAN                        

--------------------------------------------------------------------------------
----------------------
 Delete on application_hits  (cost=0.00..3953181.85 rows=316094576 width=24)
   ->  Nested Loop  (cost=0.00..3953181.85 rows=316094576 width=24)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..128.53 rows=6352 width=22)
               ->  Nested Loop  (cost=0.00..96.78 rows=6352 width=22)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 wi
dth=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..2.49 rows=16 width=16)
                           ->  Nested Loop  (cost=0.00..2.41 rows=16 width=16)
                                 ->  Seq Scan on application_hits  (cost=0.00..1
.10 rows=4 width=10)
                                       Filter: (day = '2013-03-16'::date)
                                 ->  Materialize  (cost=0.00..1.12 rows=4 width=
10)
                                       ->  Seq Scan on client_hits  (cost=0.00..
1.10 rows=4 width=10)
                                             Filter: (day = '2013-03-16'::date)

 Delete on client_hits  (cost=0.00..989722.41 rows=79023644 width=18)
   ->  Nested Loop  (cost=0.00..989722.41 rows=79023644 width=18)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..43.83 rows=1588 width=16)
               ->  Nested Loop  (cost=0.00..35.89 rows=1588 width=16)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 wi
dth=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..1.12 rows=4 width=10)
                           ->  Seq Scan on client_hits  (cost=0.00..1.10 rows=4 
width=10)
                                 Filter: (day = '2013-03-16'::date)

 Delete on project_hits  (cost=0.00..248851.80 rows=19755911 width=12)
   ->  Nested Loop  (cost=0.00..248851.80 rows=19755911 width=12)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..16.91 rows=397 width=10)
               ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10
)
                     Filter: (day = '2013-03-16'::date)

 Delete on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
   ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
         Filter: (day = '2013-03-16'::date)
(39 rows)

rulestest=# select distinct day from application_hits;
    day     
------------
 2013-03-15
 2013-03-16
(2 rows)

rulestest=# select count(*), day from appl
DELETE FROM application_hits 
 WHERE day IN (SELECT day FROM client_hits 
               WHERE day IN (SELECT day FROM user_hits WHERE [condition]));
DELETE FROM client_hits
  WHERE day IN (SELECT day FROM user_hits WHERE [condition]);
DELETE FROM user_hits WHERE [condition];
CREATE OR REPLACE FUNCTION delete_stats_at_date(in_day date) RETURNS BOOL 
LANGUAGE SQL AS
$$
DELETE FROM application_hits WHERE day = $1;
DELETE FROM project_hits WHERE day = $1;
DELETE FROM client_hits WHERE day  = $1;
DELETE FROM user_hits WHERE day = $1;
SELECT TRUE;
$$;

Context

StackExchange Database Administrators Q#36842, answer score: 7

Revisions (0)

No revisions yet.