HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

"ORDER BY column=value" using indexes (no full table scan)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
orderfullscancolumnindexesvalueusingtable

Problem

I'm trying to hack a FLOSS application called Phabricator / Phorge

Let's take this simple MySQL table storing some questions by ID and their status (open, closed, invalid etc.):

CREATE TABLE `ponder_question` (
  `id`         int(10) unsigned NOT NULL AUTO_INCREMENT,
  `status` varchar(32)          NOT NULL,
  PRIMARY KEY   (`id`),
  KEY `status`  (`status`),


I want to order by a specific status first, then the others. So:

SELECT * FROM ponder_question
  ORDER BY status='open' DESC
  LIMIT 5


It works but consider this DESCRIBE. That query is apparently examining 5000 rows which is probably too much / it is doing a full table scan:

id
select_type
table
partitions
type
possible_keys
key
key_len
ref
rows
filtered
Extra

1
SIMPLE
ponder_question
NULL
index
NULL
status
130
NULL
5000
100.00
Using index; Using filesort

(Instead, if you EXPLAIN a simple ORDER BY status DESC LIMIT 5 it examines just 5 rows)

To reduce the number of examined rows and avoid that Using filesort I also tried a FORCE INDEX:

SELECT * FROM ponder_question
  FORCE INDEX (status)
  ORDER BY status='open' DESC
  LIMIT 5


I tested in MySQL 5.7 and MariaDB 10.3. Anyway, my question is not version-specific.

I guess I need to avoid this approach.

I maybe need to change the schema to have a simpler ordering clause, but maybe not.

To be honest, the final goal is to ORDER BY status='open' DESC, id, so showing all open questions by creation, and then all closed questions by creation.

I think this approach is bad, but I was just trying to be nice with this application, without proposing a schema change. So feel free to tell me that my question is stupid:

Question: how do you implement an order by a value first, in a more efficient way?

Some related similar approaches that apparently do not describe this problem:

  • How does ORDER BY FIELD() in MySQL work internally

Solution

"Using index" means that the index it used is "covering". That means that all the columns needed (status and id) are in the one index. The index is declared INDEX(status), but implicitly the PRIMARY KEY is tacked on.

Virtually any function, and in many cases any "operator" (eg '=') makes the ORDER BY not sargable . So it cannot use the index to stop after the given LIMIT. However, because of "covering", it does use the index's BTree instead of the data's BTree.

For a small table (under, say, 1K rows), I would not worry about performance.

If the table is big enough (over, say, 1M rows), then there is a more complex way to efficiently do what you asked for:

( SELECT  1 AS seq,
          *
      FROM ponder_question
      WHERE status >= 'open'
      LIMIT 5
) UNION ALL
( SELECT  2 AS seq,
          *
      FROM ponder_question
      WHERE status < 'open'
      LIMIT 5
) ORDER BY seq
  LIMIT 5   -- Yet again


Each of the SELECTs will use INDEX(status), but stop after 5 rows.

The UNION will have up to 10 rows, which is then trivially sorted.

Code Snippets

( SELECT  1 AS seq,
          *
      FROM ponder_question
      WHERE status >= 'open'
      LIMIT 5
) UNION ALL
( SELECT  2 AS seq,
          *
      FROM ponder_question
      WHERE status < 'open'
      LIMIT 5
) ORDER BY seq
  LIMIT 5   -- Yet again

Context

StackExchange Database Administrators Q#323345, answer score: 5

Revisions (0)

No revisions yet.