patternsqlMinor
Refresh Materialized View impact on the DB
Viewed 0 times
impactthematerializedrefreshview
Problem
Hi we are running an PostgreSQL 9.6 database in Amazon RDS under a m4.large(2cpu 8gb) and a provisioned IOPS of 1000. The use case is the following: We have a table with several million of registries (4M more or less) and we have created a materialized view with a subset of this table (2M aprox) changing some columns types to be more efficient in querying. Our pg_conf has not been changed, being RDS Postgres' default.
This is our view definition:
```
CREATE MATERIALIZED VIEW public.customers_mv as
SELECT
id,
gender,
contact_info,
location,
social,
categories,
(social ->> 'follower_count')::integer AS social_follower_count,
(social ->> 'following_count')::integer AS social_following_count,
(social ->> 'peemv')::float AS social_emv,
(social ->> 'engagement')::float AS social_engagement,
(social ->> 'v')::boolean AS social_validated,
search_vector,
flags,
to_tsvector('english',concat_ws(' ','aal0_'||(customers.location ->> 'aal0'),
'aal1_'||(customers.location ->> 'aal1'),
'aal2_'||(customers.location ->> 'aal2'),
'frequent_location_aal0_'||(customers.location -> 'frequent_location' ->> 'aal0'),
'frequent_location_aal1_'||(customers.location -> 'frequent_location' ->> 'aal1'),
'frequent_location_aal2_'||(customers.location -> 'frequent_location' ->> 'aal2'),
'last_post_location_aal0_'||(customers.location -> 'last_post_location' ->> 'aal0'),
'last_post_location_aal1_'||(customers.location -> 'last_post_location' ->> 'aal1'),
'last_post_location_aal2_'||(customers.location -> 'last_post_location' ->> 'aal2'),
'admin_location_aal0_'||(customers.location -> 'admin_location' ->> 'aal0'),
'bio_location_aal0_'||(customers.location -> 'bio_location' ->> 'aal0'))) as loc_vector
FROM public.customers
WHERE (customers.social -> 'follower_count') > '5000'
AND customers.social ? 'last_posts'
AND (customers.flags IS NULL OR NOT customers.flags @> '{"destroy": true
This is our view definition:
```
CREATE MATERIALIZED VIEW public.customers_mv as
SELECT
id,
gender,
contact_info,
location,
social,
categories,
(social ->> 'follower_count')::integer AS social_follower_count,
(social ->> 'following_count')::integer AS social_following_count,
(social ->> 'peemv')::float AS social_emv,
(social ->> 'engagement')::float AS social_engagement,
(social ->> 'v')::boolean AS social_validated,
search_vector,
flags,
to_tsvector('english',concat_ws(' ','aal0_'||(customers.location ->> 'aal0'),
'aal1_'||(customers.location ->> 'aal1'),
'aal2_'||(customers.location ->> 'aal2'),
'frequent_location_aal0_'||(customers.location -> 'frequent_location' ->> 'aal0'),
'frequent_location_aal1_'||(customers.location -> 'frequent_location' ->> 'aal1'),
'frequent_location_aal2_'||(customers.location -> 'frequent_location' ->> 'aal2'),
'last_post_location_aal0_'||(customers.location -> 'last_post_location' ->> 'aal0'),
'last_post_location_aal1_'||(customers.location -> 'last_post_location' ->> 'aal1'),
'last_post_location_aal2_'||(customers.location -> 'last_post_location' ->> 'aal2'),
'admin_location_aal0_'||(customers.location -> 'admin_location' ->> 'aal0'),
'bio_location_aal0_'||(customers.location -> 'bio_location' ->> 'aal0'))) as loc_vector
FROM public.customers
WHERE (customers.social -> 'follower_count') > '5000'
AND customers.social ? 'last_posts'
AND (customers.flags IS NULL OR NOT customers.flags @> '{"destroy": true
Solution
Your problem is likely the indexes.
IIRC, refreshing a materialized view drops the existing data and create a new "table" with the current data.
What this does for your indexes is re-index the entire subset of data, which based on your indexes send like a significant workload.
My recommendation would be to not use the built-in materialized views and to instead roll your own equivalent, incrementally updating data so your are only writing stale data/indexes.
The trade off is that this will be much more complex to build out than simply creating a materialized view, and you'll need to consider how you want to track invalidation of stale data.
IIRC, refreshing a materialized view drops the existing data and create a new "table" with the current data.
What this does for your indexes is re-index the entire subset of data, which based on your indexes send like a significant workload.
My recommendation would be to not use the built-in materialized views and to instead roll your own equivalent, incrementally updating data so your are only writing stale data/indexes.
The trade off is that this will be much more complex to build out than simply creating a materialized view, and you'll need to consider how you want to track invalidation of stale data.
Context
StackExchange Database Administrators Q#184672, answer score: 2
Revisions (0)
No revisions yet.