HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

MySQL Update a table based on the last data from another table while grouping with conditions

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
lasttheupdatewhilegroupingwithmysqlanotherbasedconditions

Problem

I need a query to UPDATE a series of fields based on another table

Table lp_plates_backup

|       uid | brand    | model    |
|         1 |          |          |
|         a | Old      | Error    |
|       ... | ...      | ...      |


Table lp_pictures_backup

| plate_uid | brand    | model    | date_validated      |
|         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | < Last one
|         1 | BMW      | 7-Series | 2014-08-04 11:21:18 |
|         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |
|         a | Tesla    | Model S  | 2014-12-17 11:00:00 | < Last one
|         a | BMW      | 3-Series | 2014-11-07 14:34:11 |


I need to update columns brand and model of lp_plates_backup based on the last date_validated entry of table lp_plates_backup while lp_plates_backup.uid match lp_pictures_backup.plate_uid for each lp_plates_backup.uid but only if:

  • lp_pictures_backup.brand is not empty



  • lp_plates_backup.brand is different from lp_pictures_backup.brand OR lp_plates_backup.model is different from lp_pictures_backup.model



The result should shown (lp_plates_backup) :

|       uid | brand    | model    |
|         1 | Fiat     | Panda    |
|         a | Tesla    | Model S  |


I have read these posts and made lot of tests but without getting the expected result:

  • https://stackoverflow.com/questions/1973246/how-to-specify-the-parent-query-field-from-within-a-subquery-in-mysql



  • https://stackoverflow.com/questions/17038193/select-row-with-most-recent-date-per-user



  • https://stackoverflow.com/questions/11156163/optimize-sub-query-selecting-last-record-of-each-group



Here is the last query I try:

```
UPDATE
lp_plates_backup AS t
LEFT JOIN (
SELECT
plate_uid, brand, model, date_validated
FROM
lp_pictures_backup as parent
WHERE
brand <> '' AND
date_validated = (SELECT MAX(date_validated)

Solution

The correct query is:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        SELECT
            plate_uid, brand, model, date_validated
        FROM
            lp_pictures_backup as parent
        WHERE
            brand <> '' AND
            date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY
                     plate_uid)
) AS m ON
    m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;


Just some little explanations. You need INNER JOIN because you must update row of lp_plates_backup only if plate_uid exists into lp_plates_backup. ORDER BY is useless because you are selecting all rows, order is not important.

You need the max date_validated of rows grouped by plate_uid, select MAX(date_validate) must individuate only a row for plate_uid through the date_validated field. So you need to add group by to select max(..).

The query select plate_uid returns a row for plate_uid so you don't need to aggregate here. The condition of single row is already builded into select max subquery.

I hope you did not get confused by me :-)

Updated

The previous works well if date_validate is the pair (plate_uid, date_validated) is unique.
If you have this kind of data:

| plate_uid | brand    | model    | date_validated      |
|         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | 
|         1 | BMW      | 7-Series | 2014-10-11 10:03:20 |   <- changed data
|         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |   
|         a | Tesla    | Model S  | 2014-12-17 11:00:00 | 
|         a | BMW      | 3-Series | 2014-11-07 14:34:11 |


The following query returns the first two rows for plate_uid 1.

SELECT plate_uid, brand, model, date_validated
FROM lp_pictures_backup as parent
WHERE
      brand <> '' AND
      date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY plate_uid)


The update will choice the values from the first or the second row. I think of the following alternatives:

  • using some other lp_pictures_backup fields to choise between rows with the same date_validated.



  • enforcing a unique constraint on table. I.e. alter table lp_pictures_backup add unique index (plate_id, date_validated). Rejecting invalid data.



  • detecting valid lp_pictures_backup pair of (plate_uid, date_validated). Updating lp_plates_backup only with valid pairs, review invalid pairs and correct them.



Maybe there are more alternatives. I prefer enforcing contraint on data so to have better data. I expand 3th alternative. Just create a view to define what a valid pair (plate_uid, date_validated) is:

CREATE VIEW lp_pictures_backup_valid as
SELECT plate_uid, date_validated
FROM lp_pictures_backup as parent
WHERE date_validated = (
        select max(date_validated) 
        from lp_pictures_backup t2 
        where t2.plate_uid =  parent.plate_uid GROUP BY plate_uid
      )
group by plate_uid, date_validated
having count(*) = 1;                    <-- you can change this to make an invalid row


A valid pair (plate_uid, date_validated) is a pair with the max date_validated only if there are a unique date_validated value.

I rewrite the update statements to consider only valid pair:

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        -- modification start
        SELECT p.plate_uid, p.brand, p.model, p.date_validated
        FROM lp_pictures_backup as p
             INNER JOIN lp_pictures_backup_valid valid 
                        ON p.plate_uid = valid.plate_uid and 
                           p.date_validated = valid.date_validated
        WHERE p.brand <> '')
        -- modification end
    ) 
    AS m ON m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;


Hope this make sense.

Update: 2014-03-20

In the first case:


Blockquote
1. using some other lp_pictures_backup fields to choise between rows with the same date_validated.

I have assumed your date is like this:

|id| plate_uid | brand    | model    | date_validated      |
|4 |         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | 
|3 |         1 | BMW      | 7-Series | 2014-10-11 10:03:20 |   
|2 |         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|1 |         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |    
|2 |         a | Tesla    | Model S  | 2014-12-17 11:00:00 | 
|1 |         a | BMW      | 3-Series | 2014-11-07 14:34:11 |


You can try this:

```
UPDATE
lp_plates_backup AS t
INNER JOIN (
SELECT t1.plate_uid, t1.brand, t1.model, t1.date_validated
FR

Code Snippets

UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        SELECT
            plate_uid, brand, model, date_validated
        FROM
            lp_pictures_backup as parent
        WHERE
            brand <> '' AND
            date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY
                     plate_uid)
) AS m ON
    m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;
| plate_uid | brand    | model    | date_validated      |
|         1 | Fiat     | Panda    | 2014-10-11 10:03:20 | 
|         1 | BMW      | 7-Series | 2014-10-11 10:03:20 |   <- changed data
|         1 | BMW      | 7-Series | 2014-07-28 19:14:02 |
|         1 | Mercedes | S-Class  | 2014-06-12 08:54:57 |   
|         a | Tesla    | Model S  | 2014-12-17 11:00:00 | 
|         a | BMW      | 3-Series | 2014-11-07 14:34:11 |
SELECT plate_uid, brand, model, date_validated
FROM lp_pictures_backup as parent
WHERE
      brand <> '' AND
      date_validated = (
                 SELECT MAX(date_validated)
                 FROM lp_pictures_backup as t2
                 WHERE t2.plate_uid = parent.plate_uid
                 GROUP BY plate_uid)
CREATE VIEW lp_pictures_backup_valid as
SELECT plate_uid, date_validated
FROM lp_pictures_backup as parent
WHERE date_validated = (
        select max(date_validated) 
        from lp_pictures_backup t2 
        where t2.plate_uid =  parent.plate_uid GROUP BY plate_uid
      )
group by plate_uid, date_validated
having count(*) = 1;                    <-- you can change this to make an invalid row
UPDATE
    lp_plates_backup AS t
    INNER JOIN  (
        -- modification start
        SELECT p.plate_uid, p.brand, p.model, p.date_validated
        FROM lp_pictures_backup as p
             INNER JOIN lp_pictures_backup_valid valid 
                        ON p.plate_uid = valid.plate_uid and 
                           p.date_validated = valid.date_validated
        WHERE p.brand <> '')
        -- modification end
    ) 
    AS m ON m.plate_uid = t.uid
SET
    t.brand = m.brand,
    t.model = m.model
WHERE
    t.brand <> m.brand
    OR
    t.model <> m.model;

Context

StackExchange Database Administrators Q#95532, answer score: 4

Revisions (0)

No revisions yet.