HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Query optimization with multi-column variant matching

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
variantmulticolumnwithqueryoptimizationmatching

Problem

TL;DR - I'm looking for advice on how to better write the query below.

Below is a pared down version of my table structure with some sample data. I don't have control over the data
structure at all so recommendations on schema changes unfortunately won't help me.

Problem

Given a building_level_key and a faction_key I need to return a record from building_levels joined to its
closest match from the building_culture_variants table.

For example, if I used goblin_walls & fact_blue I would expect the goblin_walls record that joins
building_culture_variant_key record 2.

An example structure of tables can be seen below:

-
factions - is a compacted version of the real table as cultures/subculture records are stored in different
tables but it gets the point across. This table is only really needed in the query so that the appropriate
culture/subculture can be referenced in relation to a given faction_key.

-
building_levels - acts as a base record for every building in the system. There is only one record per
building.

-
building_culture_variants - acts as its name implies; there can be more than one record for each building_level_key and each variant record is matched against a building level using the building_level_key and a combination of faction_key, culture_key and subculture_key.

How matching works

Matching starts with finding the given building_level_key in the culture variants table. This is a hard match and is needed to join any two building level and culture variant.

Each building level record will have at least one culture variant. Often there are several culture variants per building level but on average no more that 4. The most common culture variants is a "generic" one which means that the faction_key, culture_key and subculture_key columns are all null so the building will match against any faction. However any combination of the faction columns could have a key so I need to match a given faction again

Solution

The SQL code almost looks like you are trying to do things in a Procedural fashion. That won't be efficient on a Declarative language like SQL.

JOIN

When you JOIN two data sets and you need to make one behave as if NULL value is a wild card, you can do the JOIN like this

Select *
from TableA
  join TableB
    on TableA.col2match = coalesce( TableB.col2match, TableA.col2match )


However, this trick will only work if you know that TableA.col2match is always NOT NULL.

Score and Rank

You have already provided a scoring function to score the matches. I advise that you place this into a function for easier maintenance.

Most databases can RANK() your scores for you. This is an analytic function. You really should read up on them.

CTE

I'm using CTEs to help the reader understand what is going on.

If you put the WHERE clause within the CTE section, PostgreSQL will materialize the least number of rows (according the the dbfidle plan).

Resulting SQL

I built this SQL so that you can move the WHERE clause outside of the CTEs. By doing that, you can CREATE VIEW on the SQL. This will simplifying the SQL that the middle tier developer has to write.

with "building_factions" as (
  -- This is a subquery because here I would join a couple more tables
  -- to collect all of the faction info
  SELECT 
    "factions"."faction_key", 
    "factions"."culture_key", 
    "factions"."subculture_key"
  FROM 
    "factions"
  where "factions"."faction_key" = 'fact_blue'
) , "building_info" as (
  select
    "building_culture_variants"."building_culture_variant_key", 
    "building_levels"."building_level_key", 
    "building_levels"."create_time", 
    "building_levels"."create_cost", 
    "building_culture_variants"."name",
    "building_culture_variants"."faction_key", 
    "building_culture_variants"."culture_key", 
    "building_culture_variants"."subculture_key"
  from "building_levels" 
    join "building_culture_variants"
      on "building_levels"."building_level_key" = "building_culture_variants"."building_level_key"
  where "building_levels"."building_level_key" = 'goblin_walls'
), "scoreRanked_data" as (
  select 
    "building_factions"."faction_key", 
    "building_factions"."culture_key", 
    "building_factions"."subculture_key",
    "building_info"."building_culture_variant_key", 
    "building_info"."building_level_key", 
    "building_info"."create_time", 
    "building_info"."create_cost", 
    "building_info"."name",
    rank() over (partition by "building_factions"."faction_key", 
                              "building_factions"."culture_key", 
                              "building_factions"."subculture_key",
                              "building_info"."building_level_key"
                 order by (
      CASE WHEN "building_info"."faction_key" = "building_factions"."faction_key" THEN 1
           WHEN "building_info"."faction_key" IS NULL THEN 0
           ELSE NULL
      END + 
      CASE WHEN "building_info"."culture_key" = "building_factions"."culture_key" THEN 1
           WHEN "building_info"."culture_key" IS NULL THEN 0
           ELSE NULL
      END + 
      CASE WHEN "building_info"."subculture_key" = "building_factions"."subculture_key" THEN 1
           WHEN "building_info"."subculture_key" IS NULL THEN 0
           ELSE NULL
      END
      ) desc nulls last ) match_rank
  from "building_factions"
    join "building_info"
      on    "building_factions"."faction_key" = coalesce( "building_info"."faction_key", "building_factions"."faction_key")
        and "building_factions"."culture_key" = coalesce( "building_info"."culture_key", "building_factions"."culture_key")
        and "building_factions"."subculture_key" = coalesce( "building_info"."subculture_key", "building_factions"."subculture_key")
)
select *
from "scoreRanked_data"
where match_rank = 1
limit 1

Code Snippets

Select *
from TableA
  join TableB
    on TableA.col2match = coalesce( TableB.col2match, TableA.col2match )
with "building_factions" as (
  -- This is a subquery because here I would join a couple more tables
  -- to collect all of the faction info
  SELECT 
    "factions"."faction_key", 
    "factions"."culture_key", 
    "factions"."subculture_key"
  FROM 
    "factions"
  where "factions"."faction_key" = 'fact_blue'
) , "building_info" as (
  select
    "building_culture_variants"."building_culture_variant_key", 
    "building_levels"."building_level_key", 
    "building_levels"."create_time", 
    "building_levels"."create_cost", 
    "building_culture_variants"."name",
    "building_culture_variants"."faction_key", 
    "building_culture_variants"."culture_key", 
    "building_culture_variants"."subculture_key"
  from "building_levels" 
    join "building_culture_variants"
      on "building_levels"."building_level_key" = "building_culture_variants"."building_level_key"
  where "building_levels"."building_level_key" = 'goblin_walls'
), "scoreRanked_data" as (
  select 
    "building_factions"."faction_key", 
    "building_factions"."culture_key", 
    "building_factions"."subculture_key",
    "building_info"."building_culture_variant_key", 
    "building_info"."building_level_key", 
    "building_info"."create_time", 
    "building_info"."create_cost", 
    "building_info"."name",
    rank() over (partition by "building_factions"."faction_key", 
                              "building_factions"."culture_key", 
                              "building_factions"."subculture_key",
                              "building_info"."building_level_key"
                 order by (
      CASE WHEN "building_info"."faction_key" = "building_factions"."faction_key" THEN 1
           WHEN "building_info"."faction_key" IS NULL THEN 0
           ELSE NULL
      END + 
      CASE WHEN "building_info"."culture_key" = "building_factions"."culture_key" THEN 1
           WHEN "building_info"."culture_key" IS NULL THEN 0
           ELSE NULL
      END + 
      CASE WHEN "building_info"."subculture_key" = "building_factions"."subculture_key" THEN 1
           WHEN "building_info"."subculture_key" IS NULL THEN 0
           ELSE NULL
      END
      ) desc nulls last ) match_rank
  from "building_factions"
    join "building_info"
      on    "building_factions"."faction_key" = coalesce( "building_info"."faction_key", "building_factions"."faction_key")
        and "building_factions"."culture_key" = coalesce( "building_info"."culture_key", "building_factions"."culture_key")
        and "building_factions"."subculture_key" = coalesce( "building_info"."subculture_key", "building_factions"."subculture_key")
)
select *
from "scoreRanked_data"
where match_rank = 1
limit 1

Context

StackExchange Database Administrators Q#215812, answer score: 3

Revisions (0)

No revisions yet.