HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Replace value based on previous row / next row depending on conditions

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
previousreplacevaluenextbasedconditionsdependingrow

Problem

I have a price table, contain 2.9 million rows of price figures belong to a total of 941 of company_id for three months. 0.00 represents missing data. As an example:

+------------+---------------------+------------+
   | company_id | date_time           | price_open |
   +------------+---------------------+------------+
   |      4     | 13/12/2014 16:23:00 |    56.30   |
   |      4     | 13/12/2014 16:24:00 |    56.25   |
   |      4     | 13/12/2014 16:25:00 |    56.25   | <--- last price of `company_id` = `4`
   |      5     |  22/9/2014 08:00:00 |     0.00   | <--- first price of `company_id` = `5`
   |      5     |  22/9/2014 08:01:00 |     0.00   |
   |      5     |  22/9/2014 08:02:00 |     5.45   |
   |      5     |  22/9/2014 08:03:00 |     5.25   |
   |      5     |  22/9/2014 08:04:00 |     0.00   |
   |      5     |  22/9/2014 08:05:00 |     0.00   |
   +------------+---------------------+------------+


May I know if I can achieve the below by entirely using MySQL command? If yes, any guidance/help would be much appreciated.

  • Loop through each company_id and sort price_open by its


date_time, then

  • if found any price_open = 0.00,


replace this/these 0.00 value(s) with previous known price_open
figure, but

  • if the price_open = 0.00 is in the very


first row or first few rows of a company_id, do not replace with
previous known price_open figure. Instead, use the next known
price_open figure - because it shouldn't mix up with other
company_id.

Expected result will be as follow:

``
+------------+---------------------+------------+
| company_id | date_time | price_open |
+------------+---------------------+------------+
| 4 | 13/12/2014 16:23:00 | 56.30 |
| 4 | 13/12/2014 16:24:00 | 56.25 |
| 4 | 13/12/2014 16:25:00 | 56.25 | <--- last price of
company_id = 4
| 5 | 22/9/2014 08:00:00 | 5.45 | <--- first price of
company_id

Solution

Add this composite index if you don't already have it -- it should probably be the PRIMARY KEY.

INDEX(company_id, date_time)


Run this a few times (until it stops modifying any rows)

UPDATE    tbl a
    JOIN  tbl b
       ON  b.company_id = a.company_id
      AND  b.date_time  = a.date_time + INTERVAL 1 MINUTE
    SET a.price_open = b.price_open
    WHERE  a.price_open  = 0.00
      AND  b.price_open != 0.00;


In case I am wrong, suggest you copy the table over for testing:

CREATE TABLE tbl LIKE real_tbl;
ALTER TABLE tbl DROP PRIMARY KEY,
                ADD  PRIMARY KEY(company_id, date_time);  -- if needed
INSERT INTO tbl SELECT * FROM real_tbl;
run the update a few times
check the values
if satisfied, ...
RENAME TABLE real_tbl TO old, tbl TO real_tbl;
DROP TABLE old;


On the presumption that your SELECTs will be company-centric, it is better to have PRIMARY KEY(co, dt) than (dt, co).

Code Snippets

INDEX(company_id, date_time)
UPDATE    tbl a
    JOIN  tbl b
       ON  b.company_id = a.company_id
      AND  b.date_time  = a.date_time + INTERVAL 1 MINUTE
    SET a.price_open = b.price_open
    WHERE  a.price_open  = 0.00
      AND  b.price_open != 0.00;
CREATE TABLE tbl LIKE real_tbl;
ALTER TABLE tbl DROP PRIMARY KEY,
                ADD  PRIMARY KEY(company_id, date_time);  -- if needed
INSERT INTO tbl SELECT * FROM real_tbl;
run the update a few times
check the values
if satisfied, ...
RENAME TABLE real_tbl TO old, tbl TO real_tbl;
DROP TABLE old;

Context

StackExchange Database Administrators Q#129255, answer score: 2

Revisions (0)

No revisions yet.