HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

MySQL: Optimize UNION with "ORDER BY" in inner queries

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
orderwithunionmysqloptimizequeriesinner

Problem

I just set up a logging system which consists of multiple tables with the same layout.

There is one table for each data source.

For the log viewer, I want to

  • UNION all the log tables,



  • filter them by account,



  • add a pseudo column for identification of the source,



  • sort them by time,



  • and limit them for pagination.



All tables contain a field called zeitpunkt that is an indexed date/time column.

My first attempt was:

(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt AS zeit,
 'hp' AS source FROM is_log AS l WHERE l.account_id = 730)

UNION

(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt,
 'ig' AS source FROM ig_is_log AS l WHERE l.account_id = 730)

ORDER BY zeit DESC LIMIT 10;


The optimizer cannot use the indexes here because all rows from both tables are returned by the subqueries and sorted after the UNION.

My workaround was the following:

(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt AS zeit,
 'hp' AS source FROM is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10)

UNION

(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt,
 'ig' AS source FROM ig_is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10)

ORDER BY zeit DESC LIMIT 10;


I was expecting the query engine would use the indexes here since both subqueries should be sorted and limited already prior to the UNION, which then merges and sorts the rows.

I really thought this would be it, but running EXPLAIN on the query tells me the subqueries still search both tables.

EXPLAINing the subqueries themselves shows me the desired optimization but UNIONing them together it does not.

Did I miss something?

I know that ORDER BY clauses inside UNION subqueries are ignored without a LIMIT, but there is a limit.

Edit:

Actually, there probably will also be queries without the account_id condition.

The tables already exist and are filled with data. There

Solution

Just out of curiosity, can you try this version? It may trick the optimizer to use the same indices that the subqueries would use separately:

SELECT *
FROM
(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt AS zeit,
 'hp' AS source FROM is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10) 
    AS a

UNION ALL

SELECT *
FROM
(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt,
 'ig' AS source FROM ig_is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10)
    AS b

ORDER BY zeit DESC LIMIT 10;


I still think that the best index you could have is the compound (account_id, zeitpunkt). It would yield the 10 rows fast, and no tricks would ne needed.

Code Snippets

SELECT *
FROM
(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt AS zeit,
 'hp' AS source FROM is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10) 
    AS a

UNION ALL

SELECT *
FROM
(SELECT l.id, l.account_id, l.vnum, l.count, l.preis, l.zeitpunkt,
 'ig' AS source FROM ig_is_log AS l WHERE l.account_id = 730
 ORDER BY l.zeitpunkt DESC LIMIT 10)
    AS b

ORDER BY zeit DESC LIMIT 10;

Context

StackExchange Database Administrators Q#24780, answer score: 10

Revisions (0)

No revisions yet.