patternsqlMinor
Adding an index to a system catalog in Postgres
Viewed 0 times
postgressystemaddingcatalogindex
Problem
I'm having a situation very similar to the one described here:
I've got a SaaS situation where I'm using 1000+ schemas in a single
database (each schema contains the same tables, just different data
per tenant). I used schemas so that the shared app servers could
share a connection to the single database for all schemas. Things are
working fine.
and, while the application itself appears to be working fine, some queries involving system catalogs are very slow. Also,
In particular, I need to calculate the on-disk size of each schema using something like this:
which is quite slow. Looking at the query plan, I see
```
Aggregate (cost=136903.16..136903.17 rows=1 width=4) (actual time=1024.420..1024.420 rows=1 loops=1)
-> Hash Join (cost=8.28..136902.86 rows=59 width=4) (actual time=143.247..1016.749 rows=60 loops=1)
Hash Cond: (c.relnamespace = n.oid)
-> Seq Scan on pg_class c (cost=0.00..133645.24 rows=866333 width=8) (actual time=0.045..943.029 rows=879788 loops=1) │
Filter: (relkind = 'r'::"char")
Rows Removed by Filter: 2610112
-> Hash (cost=8.27..8.27 rows=1 width=4) (actual time=0.032..0.032 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Index Scan using pg_namespace_nspname_index on pg_
I've got a SaaS situation where I'm using 1000+ schemas in a single
database (each schema contains the same tables, just different data
per tenant). I used schemas so that the shared app servers could
share a connection to the single database for all schemas. Things are
working fine.
and, while the application itself appears to be working fine, some queries involving system catalogs are very slow. Also,
psql's auto-completion is totally useless and \dt is very slow.In particular, I need to calculate the on-disk size of each schema using something like this:
SELECT sum(pg_total_relation_size(c.oid)) AS size
FROM pg_namespace n
JOIN pg_class c ON c.relnamespace = n.oid
WHERE n.nspname = 'abbiecarmer'
AND c.relkind = 'r';which is quite slow. Looking at the query plan, I see
```
Aggregate (cost=136903.16..136903.17 rows=1 width=4) (actual time=1024.420..1024.420 rows=1 loops=1)
-> Hash Join (cost=8.28..136902.86 rows=59 width=4) (actual time=143.247..1016.749 rows=60 loops=1)
Hash Cond: (c.relnamespace = n.oid)
-> Seq Scan on pg_class c (cost=0.00..133645.24 rows=866333 width=8) (actual time=0.045..943.029 rows=879788 loops=1) │
Filter: (relkind = 'r'::"char")
Rows Removed by Filter: 2610112
-> Hash (cost=8.27..8.27 rows=1 width=4) (actual time=0.032..0.032 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB
-> Index Scan using pg_namespace_nspname_index on pg_
Solution
The diagnosis wasn't so far off the point after all.
Obviously the catalogs are not well prepared to deal with thousands of schemas.
Unfortunately you are trying to use the views from the information schema, which can be excruciatingly slow. Those are complex views involving many tables to produce an exactly standard-compliant state. Just have a look at the output of
Use the catalog tables directly instead:
And, while there are no guarantees from the project, basic elements of
Try this query instead. Should be much faster out of the box:
Or maybe a bit faster:
-> SQLfiddle demo (including update)
If you are going to use indexes, make the first one on
Smaller, faster, less likely to cause problems, because the index itself is not included in the index.
Your second index in the question is probably a copy / paste artifact. You did not mention tablespaces playing a role, nor do your query or query plan show anything correxponding to it.
Would have to be:
However, I certainly won't claim to know nearly as much about the system catalog as Tom Lane does. If he says you'd be nuts to try this in a productive system, then you are nuts to do it anyway.
Then again, Tom wrote:
I still think you'd be nuts to try it on a production database, but ...
Bold emphasis mine. Which tells me, he is not completely against it, just not willing to guarantee for anything. Which makes it a lot less nutty.
I still wouldn't recommend it.
Obviously the catalogs are not well prepared to deal with thousands of schemas.
Unfortunately you are trying to use the views from the information schema, which can be excruciatingly slow. Those are complex views involving many tables to produce an exactly standard-compliant state. Just have a look at the output of
EXPLAIN ANALYZE or the graphic representation in pgAdmin to get an impression.Use the catalog tables directly instead:
And, while there are no guarantees from the project, basic elements of
pg_namespace or pg_class are not likely to change across major versions, either.Try this query instead. Should be much faster out of the box:
SELECT pg_total_relation_size(c.oid) AS size
FROM pg_namespace n
JOIN pg_class c ON c.relnamespace = n.oid
WHERE n.nspname = :schema_name
AND c.relkind = 'r';Or maybe a bit faster:
SELECT pg_total_relation_size(c.oid) AS size
FROM pg_class c
WHERE c.relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = :schema_name)
AND c.relkind = 'r';-> SQLfiddle demo (including update)
If you are going to use indexes, make the first one on
pg_class a partial index:CREATE INDEX pg_class_relnamespace_idx on pg_class(relnamespace)
WHERE relkind = 'r';Smaller, faster, less likely to cause problems, because the index itself is not included in the index.
Your second index in the question is probably a copy / paste artifact. You did not mention tablespaces playing a role, nor do your query or query plan show anything correxponding to it.
create index pg_class_reltablespace_index on pg_class(reltablespace);Would have to be:
CREATE INDEX pg_namespace_nspname_idx on pg_namespace(nspname);However, I certainly won't claim to know nearly as much about the system catalog as Tom Lane does. If he says you'd be nuts to try this in a productive system, then you are nuts to do it anyway.
Then again, Tom wrote:
I still think you'd be nuts to try it on a production database, but ...
Bold emphasis mine. Which tells me, he is not completely against it, just not willing to guarantee for anything. Which makes it a lot less nutty.
I still wouldn't recommend it.
Code Snippets
SELECT pg_total_relation_size(c.oid) AS size
FROM pg_namespace n
JOIN pg_class c ON c.relnamespace = n.oid
WHERE n.nspname = :schema_name
AND c.relkind = 'r';SELECT pg_total_relation_size(c.oid) AS size
FROM pg_class c
WHERE c.relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = :schema_name)
AND c.relkind = 'r';CREATE INDEX pg_class_relnamespace_idx on pg_class(relnamespace)
WHERE relkind = 'r';create index pg_class_reltablespace_index on pg_class(reltablespace);CREATE INDEX pg_namespace_nspname_idx on pg_namespace(nspname);Context
StackExchange Database Administrators Q#47119, answer score: 5
Revisions (0)
No revisions yet.