HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

UPDATE with join condition on matching words in columns of another table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
updateconditioncolumnswithtablewordsjoinanothermatching

Problem

I have 2 tables that looks like this:

Table A:

CREATE TEMP TABLE table_a (
  Column_1 text,
  ID_number int
);
INSERT INTO table_a VALUES
  ('foo,bar,baz', 123),
  ('qux,quux,quuz',456),
  ('corge,grault,garply',789),
  ('qux,bar,grault', 101);


Table B:

CREATE TEMP TABLE table_b (
  Column_1 text,
  Column_2 text,
  ID_number int
);
INSERT INTO table_b VALUES
  ('foo','baz',null),
  ('qux','quzz',null),
  ('corge','garply',null);


I'm trying to copy across values from the ID_number column in Table A, where the values in Column 1 & 2 of table B can be found in the same row of Column 1 in Table A.

This is the kind of thing I was thinking of:

UPDATE table_b AS B 
SET id_number = A.id_number 
FROM table_a AS A 
WHERE A.column_1 LIKE B.column_1
  AND A.column_1 LIKE B.column_2


.. but obviously this doesn't work.

How can I translate this into a proper query?

Additional info

table_a.Column_1 contains UK addresses, for example:

'47 BOWERS PLACE, GREAT YARMOUTH, NORFOLK, NR20 4AN'


In table_b I have the first line of the address in Column_1 (so, '47 BOWERS PLACE') and the postcode ('NR20 4AN') in Column_2.

I thought it would be best to simplify things, but maybe the actual data has some relevance in this situation.

table_a has about 30 million addresses. table_b has around 60k rows.

Performance is relevant, the faster this runs the better, and it will likely be repeated in the future.

Solution

The key here is that Column_1, represents three possible values for the JOIN. So what you want to use is string_to_array() (so long as those values are comma-separated and can not themselves include a comma).

Run this query,

SELECT id_number, string_to_array(column_1, ',') AS column_1
FROM table_a;
 id_number |       column_1        
-----------+-----------------------
       123 | {foo,bar,baz}
       456 | {qux,quux,quuz}
       789 | {corge,grault,garply}
       101 | {qux,bar,grault}


Now, we can run our UPDATE using = ANY(),

UPDATE table_b
SET id_number = A.id_number
FROM (
  SELECT id_number, string_to_array(column_1, ',') AS column_1
  FROM table_a
) AS A
WHERE table_b.column_1 = ANY(A.column_1)
  AND table_b.column_2 = ANY(A.column_1);


You can alternatively use <@

WHERE ARRAY[table_b.column_1, table_b.column_2] <@ A.column_1;


That even makes it a bit more compact..

UPDATE table_b
  SET id_number = A.id_number
FROM table_a AS A                              
  WHERE ARRAY[table_b.column_1, table_b.column_2] <@ string_to_array(A.column_1, ',');

Code Snippets

SELECT id_number, string_to_array(column_1, ',') AS column_1
FROM table_a;
 id_number |       column_1        
-----------+-----------------------
       123 | {foo,bar,baz}
       456 | {qux,quux,quuz}
       789 | {corge,grault,garply}
       101 | {qux,bar,grault}
UPDATE table_b
SET id_number = A.id_number
FROM (
  SELECT id_number, string_to_array(column_1, ',') AS column_1
  FROM table_a
) AS A
WHERE table_b.column_1 = ANY(A.column_1)
  AND table_b.column_2 = ANY(A.column_1);
WHERE ARRAY[table_b.column_1, table_b.column_2] <@ A.column_1;
UPDATE table_b
  SET id_number = A.id_number
FROM table_a AS A                              
  WHERE ARRAY[table_b.column_1, table_b.column_2] <@ string_to_array(A.column_1, ',');

Context

StackExchange Database Administrators Q#162449, answer score: 7

Revisions (0)

No revisions yet.