patternsqlMinor
An alternative to querying with multiple JOIN over the same table?
Viewed 0 times
samethealternativewithjoinqueryingmultipleovertable
Problem
I have a Postgresql 11 database.
Let's say I have a table called houses. It should have hundreds of thousands of records.
Now, my houses have features I want to register in the database. As the list of possible features will be quite long (several dozens) and will evolve over time, as I don't want add a long list of columns to the table houses and change the table constantly with 'ALTER TABLE', I thought of having a separate table for these features :
In average, each house record will have 10-20 records in the house_features table.
So far, this seems a simple efficient model : I can add as many different features, controlling the possible values of feature_name and feature_value in the upper layers (the applicative layer and/or the GUI). I don't have to alter the database each time the application evolves and I need a new type of feature.
For the example, let's say I have the following features :
Obviously, storing booleans, integers and floats as strings is not very efficient an
Let's say I have a table called houses. It should have hundreds of thousands of records.
CREATE TABLE houses (
pkid serial primary key,
address varchar(255) NOT NULL,
rent float NOT NULL
);Now, my houses have features I want to register in the database. As the list of possible features will be quite long (several dozens) and will evolve over time, as I don't want add a long list of columns to the table houses and change the table constantly with 'ALTER TABLE', I thought of having a separate table for these features :
CREATE TABLE house_features (
pkid serial primary key,
house_pkid integer NOT NULL,
feature_name varchar(255) NOT NULL,
feature_value varchar(255)
);
CREATE INDEX ON house_features (feature_name, feature_value);
ALTER TABLE house_features ADD CONSTRAINT features_fk FOREIGN KEY (house_pkid) REFERENCES houses (pkid) ON DELETE CASCADE;In average, each house record will have 10-20 records in the house_features table.
So far, this seems a simple efficient model : I can add as many different features, controlling the possible values of feature_name and feature_value in the upper layers (the applicative layer and/or the GUI). I don't have to alter the database each time the application evolves and I need a new type of feature.
For the example, let's say I have the following features :
- feature_name : 'rooftype' with possible feature_value : 'flat' or 'inclined'
- feature_name : 'wallcolors' with possible feature_value : 'white', 'beige', 'blue', 'green', etc.. (15 different possible values)
- feature_name : 'has_basement' with possible feature_value : 'True' or 'False'.
- feature_name : 'number_of_doors' with possible feature_value any integer coded as a string (so '0', '1', '2', ...).
- feature_name : 'floor_surface' with possible feature_value any given float coded as a string (e.g.: '155.2')
Obviously, storing booleans, integers and floats as strings is not very efficient an
Solution
You could try to aggregate the features into a JSON value, then searching for a combination of multiple features is quite easy:
Performance can be improved by adding a WHERE clause to the sub-select which repeats the feature names, e.g:
or even
The outer condition is still necessary, because the inner
This also has the advantage (in my eyes) that you only get one row with all the features, rather then one row for each feature.
Unless you remove, add and change features for a house very frequently, storing them as a single JSONB column on the
select h.*, hf.features
from houses
join (
select house_id, jsonb_object_agg(feature_name, feature_value) as features
from house_features
group by house_id
) hf on hf.house_pkid = h.pkid
where hf.features @> '{"rooftype": "flat", "has_basement", "true", "wallcolors": "white"}';Performance can be improved by adding a WHERE clause to the sub-select which repeats the feature names, e.g:
where feature_name in ('rooftype', 'has_basement', 'wallcolors')or even
where (feature_name, feature_value) in (('rooftype', 'flat') ('has_basement', 'true'), ('wallcolors', 'white'))The outer condition is still necessary, because the inner
where will include houses that don't have all the features. This also has the advantage (in my eyes) that you only get one row with all the features, rather then one row for each feature.
Unless you remove, add and change features for a house very frequently, storing them as a single JSONB column on the
house table (features) and getting rid of the house_features table, might be an alternative. In that case you could create an index on the column to speed up the search.Code Snippets
select h.*, hf.features
from houses
join (
select house_id, jsonb_object_agg(feature_name, feature_value) as features
from house_features
group by house_id
) hf on hf.house_pkid = h.pkid
where hf.features @> '{"rooftype": "flat", "has_basement", "true", "wallcolors": "white"}';where feature_name in ('rooftype', 'has_basement', 'wallcolors')where (feature_name, feature_value) in (('rooftype', 'flat') ('has_basement', 'true'), ('wallcolors', 'white'))Context
StackExchange Database Administrators Q#267840, answer score: 6
Revisions (0)
No revisions yet.