patternsqlMajor
Refresh materalized view incrementally in PostgreSQL
Viewed 0 times
postgresqlincrementallyrefreshviewmateralized
Problem
Is it possible to refresh a materialized view incrementally in PostgreSQL i.e. only for the data that is new or has changed?
Consider this table & materialized view:
Periodically, new values are added to
What's a good way to keep track of updated & new values and only refresh the view partially?
Consider this table & materialized view:
CREATE TABLE graph (
xaxis integer NOT NULL,
value integer NOT NULL,
);
CREATE MATERIALIZED VIEW graph_avg AS
SELECT xaxis, AVG(value)
FROM graph
GROUP BY xaxisPeriodically, new values are added to
graph or an existing value is updated. I want to refresh the view graph_avg every couple of hours only for the values that have updated. However in PostgreSQL 9.3, the whole table is refreshed. This is quite time consuming. The next version 9.4 allows CONCURRENT update but it still refreshes the entire view. With 100s of millions of rows, this takes a few minutes. What's a good way to keep track of updated & new values and only refresh the view partially?
Solution
You can always implement your own table serving as "materialized view". That's how we did it before
You can create a plain
And materialize the result once or whenever you need to start over:
(Or use the
Then, depending on undisclosed details of your use case, you can
A basic DML statement with data-modifying CTEs for your table as is:
Assuming nobody else tries to write to
Basic recipe
-
Create a tiny table to remember the timestamp of your latest snapshot. Let's call it
-
Create this partial, multicolumn index:
-
Use the timestamp of the last snapshot as predicate in your queries to refresh the snapshot with perfect index usage.
-
At the end of the transaction, drop the index and recreate it with the transaction timestamp replacing the timestamp in the index predicate (initially
-
Note that the partial index is great to cover
MATERIALIZED VIEW was implemented in Postgres 9.3. You can create a plain
VIEW:CREATE VIEW graph_avg_view AS
SELECT xaxis, AVG(value) AS avg_val
FROM graph
GROUP BY xaxis;And materialize the result once or whenever you need to start over:
CREATE TABLE graph_avg AS
SELECT * FROM graph_avg_view;(Or use the
SELECT statement directly, without creating a VIEW.)Then, depending on undisclosed details of your use case, you can
DELETE / UPDATE / INSERT changes manually.A basic DML statement with data-modifying CTEs for your table as is:
Assuming nobody else tries to write to
graph_avg concurrently (reading is no problem):WITH del AS (
DELETE FROM graph_avg t
WHERE NOT EXISTS (SELECT FROM graph_avg_view WHERE xaxis = t.xaxis)
)
, upd AS (
UPDATE graph_avg t
SET avg_val = v.avg_val
FROM graph_avg_view v
WHERE t.xaxis = v.xaxis
AND t.avg_val <> v.avg_val
-- AND t.avg_val IS DISTINCT FROM v.avg_val -- alt if avg_val can be NULL
)
INSERT INTO graph_avg t -- no target list, whole row
SELECT v.*
FROM graph_avg_view v
WHERE NOT EXISTS (SELECT FROM graph_avg WHERE xaxis = v.xaxis);Basic recipe
- Add a
timestampcolumn with defaultnow()to your base table. Let's call itts.
- If you have updates, add a trigger to set the current timestamp with every update that changes either
xaxisorvalue.
-
Create a tiny table to remember the timestamp of your latest snapshot. Let's call it
mv:CREATE TABLE mv (
tbl text PRIMARY KEY
, ts timestamp NOT NULL DEFAULT '-infinity'
); -- possibly more details-
Create this partial, multicolumn index:
CREATE INDEX graph_mv_latest ON graph (xaxis, value)
WHERE ts >= '-infinity';-
Use the timestamp of the last snapshot as predicate in your queries to refresh the snapshot with perfect index usage.
-
At the end of the transaction, drop the index and recreate it with the transaction timestamp replacing the timestamp in the index predicate (initially
'-infinity'), which you also save to your table. Everything in one transaction.-
Note that the partial index is great to cover
INSERT and UPDATE operations, but not DELETE. To cover that, you need to consider the entire table. It all depends on exact requirements.Code Snippets
CREATE VIEW graph_avg_view AS
SELECT xaxis, AVG(value) AS avg_val
FROM graph
GROUP BY xaxis;CREATE TABLE graph_avg AS
SELECT * FROM graph_avg_view;WITH del AS (
DELETE FROM graph_avg t
WHERE NOT EXISTS (SELECT FROM graph_avg_view WHERE xaxis = t.xaxis)
)
, upd AS (
UPDATE graph_avg t
SET avg_val = v.avg_val
FROM graph_avg_view v
WHERE t.xaxis = v.xaxis
AND t.avg_val <> v.avg_val
-- AND t.avg_val IS DISTINCT FROM v.avg_val -- alt if avg_val can be NULL
)
INSERT INTO graph_avg t -- no target list, whole row
SELECT v.*
FROM graph_avg_view v
WHERE NOT EXISTS (SELECT FROM graph_avg WHERE xaxis = v.xaxis);CREATE TABLE mv (
tbl text PRIMARY KEY
, ts timestamp NOT NULL DEFAULT '-infinity'
); -- possibly more detailsCREATE INDEX graph_mv_latest ON graph (xaxis, value)
WHERE ts >= '-infinity';Context
StackExchange Database Administrators Q#86779, answer score: 35
Revisions (0)
No revisions yet.