patternsqlMinor
Query optimization with multi-column variant matching
Viewed 0 times
variantmulticolumnwithqueryoptimizationmatching
Problem
TL;DR - I'm looking for advice on how to better write the query below.
Below is a pared down version of my table structure with some sample data. I don't have control over the data
structure at all so recommendations on schema changes unfortunately won't help me.
Problem
Given a
closest match from the
For example, if I used
An example structure of tables can be seen below:
-
tables but it gets the point across. This table is only really needed in the query so that the appropriate
culture/subculture can be referenced in relation to a given
-
building.
-
How matching works
Matching starts with finding the given
Each building level record will have at least one culture variant. Often there are several culture variants per building level but on average no more that 4. The most common culture variants is a "generic" one which means that the
Below is a pared down version of my table structure with some sample data. I don't have control over the data
structure at all so recommendations on schema changes unfortunately won't help me.
Problem
Given a
building_level_key and a faction_key I need to return a record from building_levels joined to itsclosest match from the
building_culture_variants table. For example, if I used
goblin_walls & fact_blue I would expect the goblin_walls record that joinsbuilding_culture_variant_key record 2.An example structure of tables can be seen below:
-
factions - is a compacted version of the real table as cultures/subculture records are stored in differenttables but it gets the point across. This table is only really needed in the query so that the appropriate
culture/subculture can be referenced in relation to a given
faction_key.-
building_levels - acts as a base record for every building in the system. There is only one record perbuilding.
-
building_culture_variants - acts as its name implies; there can be more than one record for each building_level_key and each variant record is matched against a building level using the building_level_key and a combination of faction_key, culture_key and subculture_key.How matching works
Matching starts with finding the given
building_level_key in the culture variants table. This is a hard match and is needed to join any two building level and culture variant.Each building level record will have at least one culture variant. Often there are several culture variants per building level but on average no more that 4. The most common culture variants is a "generic" one which means that the
faction_key, culture_key and subculture_key columns are all null so the building will match against any faction. However any combination of the faction columns could have a key so I need to match a given faction againSolution
The SQL code almost looks like you are trying to do things in a Procedural fashion. That won't be efficient on a Declarative language like SQL.
JOIN
When you
However, this trick will only work if you know that
Score and Rank
You have already provided a scoring function to score the matches. I advise that you place this into a function for easier maintenance.
Most databases can
CTE
I'm using CTEs to help the reader understand what is going on.
If you put the
Resulting SQL
I built this SQL so that you can move the
JOIN
When you
JOIN two data sets and you need to make one behave as if NULL value is a wild card, you can do the JOIN like thisSelect *
from TableA
join TableB
on TableA.col2match = coalesce( TableB.col2match, TableA.col2match )However, this trick will only work if you know that
TableA.col2match is always NOT NULL.Score and Rank
You have already provided a scoring function to score the matches. I advise that you place this into a function for easier maintenance.
Most databases can
RANK() your scores for you. This is an analytic function. You really should read up on them.CTE
I'm using CTEs to help the reader understand what is going on.
If you put the
WHERE clause within the CTE section, PostgreSQL will materialize the least number of rows (according the the dbfidle plan).Resulting SQL
I built this SQL so that you can move the
WHERE clause outside of the CTEs. By doing that, you can CREATE VIEW on the SQL. This will simplifying the SQL that the middle tier developer has to write.with "building_factions" as (
-- This is a subquery because here I would join a couple more tables
-- to collect all of the faction info
SELECT
"factions"."faction_key",
"factions"."culture_key",
"factions"."subculture_key"
FROM
"factions"
where "factions"."faction_key" = 'fact_blue'
) , "building_info" as (
select
"building_culture_variants"."building_culture_variant_key",
"building_levels"."building_level_key",
"building_levels"."create_time",
"building_levels"."create_cost",
"building_culture_variants"."name",
"building_culture_variants"."faction_key",
"building_culture_variants"."culture_key",
"building_culture_variants"."subculture_key"
from "building_levels"
join "building_culture_variants"
on "building_levels"."building_level_key" = "building_culture_variants"."building_level_key"
where "building_levels"."building_level_key" = 'goblin_walls'
), "scoreRanked_data" as (
select
"building_factions"."faction_key",
"building_factions"."culture_key",
"building_factions"."subculture_key",
"building_info"."building_culture_variant_key",
"building_info"."building_level_key",
"building_info"."create_time",
"building_info"."create_cost",
"building_info"."name",
rank() over (partition by "building_factions"."faction_key",
"building_factions"."culture_key",
"building_factions"."subculture_key",
"building_info"."building_level_key"
order by (
CASE WHEN "building_info"."faction_key" = "building_factions"."faction_key" THEN 1
WHEN "building_info"."faction_key" IS NULL THEN 0
ELSE NULL
END +
CASE WHEN "building_info"."culture_key" = "building_factions"."culture_key" THEN 1
WHEN "building_info"."culture_key" IS NULL THEN 0
ELSE NULL
END +
CASE WHEN "building_info"."subculture_key" = "building_factions"."subculture_key" THEN 1
WHEN "building_info"."subculture_key" IS NULL THEN 0
ELSE NULL
END
) desc nulls last ) match_rank
from "building_factions"
join "building_info"
on "building_factions"."faction_key" = coalesce( "building_info"."faction_key", "building_factions"."faction_key")
and "building_factions"."culture_key" = coalesce( "building_info"."culture_key", "building_factions"."culture_key")
and "building_factions"."subculture_key" = coalesce( "building_info"."subculture_key", "building_factions"."subculture_key")
)
select *
from "scoreRanked_data"
where match_rank = 1
limit 1Code Snippets
Select *
from TableA
join TableB
on TableA.col2match = coalesce( TableB.col2match, TableA.col2match )with "building_factions" as (
-- This is a subquery because here I would join a couple more tables
-- to collect all of the faction info
SELECT
"factions"."faction_key",
"factions"."culture_key",
"factions"."subculture_key"
FROM
"factions"
where "factions"."faction_key" = 'fact_blue'
) , "building_info" as (
select
"building_culture_variants"."building_culture_variant_key",
"building_levels"."building_level_key",
"building_levels"."create_time",
"building_levels"."create_cost",
"building_culture_variants"."name",
"building_culture_variants"."faction_key",
"building_culture_variants"."culture_key",
"building_culture_variants"."subculture_key"
from "building_levels"
join "building_culture_variants"
on "building_levels"."building_level_key" = "building_culture_variants"."building_level_key"
where "building_levels"."building_level_key" = 'goblin_walls'
), "scoreRanked_data" as (
select
"building_factions"."faction_key",
"building_factions"."culture_key",
"building_factions"."subculture_key",
"building_info"."building_culture_variant_key",
"building_info"."building_level_key",
"building_info"."create_time",
"building_info"."create_cost",
"building_info"."name",
rank() over (partition by "building_factions"."faction_key",
"building_factions"."culture_key",
"building_factions"."subculture_key",
"building_info"."building_level_key"
order by (
CASE WHEN "building_info"."faction_key" = "building_factions"."faction_key" THEN 1
WHEN "building_info"."faction_key" IS NULL THEN 0
ELSE NULL
END +
CASE WHEN "building_info"."culture_key" = "building_factions"."culture_key" THEN 1
WHEN "building_info"."culture_key" IS NULL THEN 0
ELSE NULL
END +
CASE WHEN "building_info"."subculture_key" = "building_factions"."subculture_key" THEN 1
WHEN "building_info"."subculture_key" IS NULL THEN 0
ELSE NULL
END
) desc nulls last ) match_rank
from "building_factions"
join "building_info"
on "building_factions"."faction_key" = coalesce( "building_info"."faction_key", "building_factions"."faction_key")
and "building_factions"."culture_key" = coalesce( "building_info"."culture_key", "building_factions"."culture_key")
and "building_factions"."subculture_key" = coalesce( "building_info"."subculture_key", "building_factions"."subculture_key")
)
select *
from "scoreRanked_data"
where match_rank = 1
limit 1Context
StackExchange Database Administrators Q#215812, answer score: 3
Revisions (0)
No revisions yet.