HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Refresh Materialized View impact on the DB

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
impactthematerializedrefreshview

Problem

Hi we are running an PostgreSQL 9.6 database in Amazon RDS under a m4.large(2cpu 8gb) and a provisioned IOPS of 1000. The use case is the following: We have a table with several million of registries (4M more or less) and we have created a materialized view with a subset of this table (2M aprox) changing some columns types to be more efficient in querying. Our pg_conf has not been changed, being RDS Postgres' default.

This is our view definition:

```
CREATE MATERIALIZED VIEW public.customers_mv as
SELECT
id,
gender,
contact_info,
location,
social,
categories,
(social ->> 'follower_count')::integer AS social_follower_count,
(social ->> 'following_count')::integer AS social_following_count,
(social ->> 'peemv')::float AS social_emv,
(social ->> 'engagement')::float AS social_engagement,
(social ->> 'v')::boolean AS social_validated,
search_vector,
flags,
to_tsvector('english',concat_ws(' ','aal0_'||(customers.location ->> 'aal0'),
'aal1_'||(customers.location ->> 'aal1'),
'aal2_'||(customers.location ->> 'aal2'),
'frequent_location_aal0_'||(customers.location -> 'frequent_location' ->> 'aal0'),
'frequent_location_aal1_'||(customers.location -> 'frequent_location' ->> 'aal1'),
'frequent_location_aal2_'||(customers.location -> 'frequent_location' ->> 'aal2'),
'last_post_location_aal0_'||(customers.location -> 'last_post_location' ->> 'aal0'),
'last_post_location_aal1_'||(customers.location -> 'last_post_location' ->> 'aal1'),
'last_post_location_aal2_'||(customers.location -> 'last_post_location' ->> 'aal2'),
'admin_location_aal0_'||(customers.location -> 'admin_location' ->> 'aal0'),
'bio_location_aal0_'||(customers.location -> 'bio_location' ->> 'aal0'))) as loc_vector
FROM public.customers
WHERE (customers.social -> 'follower_count') > '5000'
AND customers.social ? 'last_posts'
AND (customers.flags IS NULL OR NOT customers.flags @> '{"destroy": true

Solution

Your problem is likely the indexes.

IIRC, refreshing a materialized view drops the existing data and create a new "table" with the current data.

What this does for your indexes is re-index the entire subset of data, which based on your indexes send like a significant workload.

My recommendation would be to not use the built-in materialized views and to instead roll your own equivalent, incrementally updating data so your are only writing stale data/indexes.

The trade off is that this will be much more complex to build out than simply creating a materialized view, and you'll need to consider how you want to track invalidation of stale data.

Context

StackExchange Database Administrators Q#184672, answer score: 2

Revisions (0)

No revisions yet.