HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Running out of disk space running MySQL query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
spacediskquerymysqlrunningout

Problem

I have been trying to create a "gaps and islands" identification routine in MySQL. See the previous question regarding setup here and computational resources required here.

I increased the RAM of the database server to 4 Gb. The query was running ok, but now I am having an issue where the database server runs out of disk space while running the query.

The query looks at the entire database and inserts islands that it finds into a table called shutdown_events. I use INSERT IGNORE and look at the whole database so that I don't cut off islands that span some sort of a timestamp constraint (I try adding this in, read on..).

I created the view with the following SQL (there are ~30 dr.*_sd values to include)

create view sd_pivot as
  select dr.wellsite_id, 'EngOilP_sd' as sd, dr.timestamp, dr.EngOilP_sd as val from datarecords dr
  union all
  select dr.wellsite_id, 'Stg1ScrbLVL_sd', dr.timestamp, dr.Stg1ScrbLVL_sd from datarecords dr
  union all
  select dr.wellsite_id, 'Stg2ScrbLVL_sd', dr.timestamp, dr.Stg2ScrbLVL_sd from datarecords dr
  union all
...


and then the SQL that populates the shutdownevents table is here:

INSERT IGNORE INTO shutdownevents (wellsite_id, sd_name, start, end)
SELECT t.*
FROM (
  SELECT wellsite_id, sd, MIN(timestamp) AS starttime, MAX(timestamp) AS endtime
  FROM (
         SELECT
           sd_p.*
           , @val_change := IF(@prev_val != sd_p.val, @val_change + 1, @val_change) AS vc
           , @prev_val := sd_p.val
         FROM
           sd_pivot sd_p
           , (SELECT @prev_val := NULL, @val_change := 0) var_init_subquery
         ORDER BY wellsite_id, sd, timestamp
       ) sq
  WHERE val = 1
  # AND timestamp > '{two_weeks_ago}'
  GROUP BY sd, vc
) t
ORDER BY wellsite_id, sd, starttime


In my testing database, I added the line (commented out in the above SQL) AND timestamp > '{two_weeks_ago}' to only find islands that occur after that date. Note that I sacrifice the beginning of islands that span

Solution

Your problem is due to 2 reported bugs:

  • When using UNION or UNION ALL MySQL creates a temporary working table. While it is justifiable for UNION, it is unnecessary for UNION ALL




mysqld uses a temp table for both UNION and UNION ALL processing. The
temp table for UNION ALL is not needed as results can be immediately
returned to the client. That change would save the cost of writing the
temp table and possibly spilling it to disk. It would also return the
first-N rows to the client much faster.

Bug #50674: Do not create temporary tables for UNION ALL

This bug was scheduled to be fixed in ver. 5.7.3.

  • Furthermore, self-join tables are reopened for every call:




In self joins, unions, etc. - tables are re-opened for each mention.


If a UNION (or self-joins) have e.g. 1000 mentions of same table, then
1000 internal structures will be created, with additional file
descriptors and buffers per each.


This leads to a simple 100k-sized query to allocate 500MB of memory
just by mentioning same (even empty) table in it (e.g. SELECT * FROM
table UNION SELECT * FROM table ...).

Bug #44626: Tables are reopened multiple times if used in same query, buffers allocated too

So in your case it creates 38 temporary tables consuming lots of resources.

One possible solution

is to use dynamic SQL and CURSOR to go through every of 38 columns. This way you don't need a view with 38 UNION ALL statements.

Context

StackExchange Database Administrators Q#122499, answer score: 3

Revisions (0)

No revisions yet.