HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Poor performance of JOIN, WHERE, ORDER BY combination (MySQL)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
combinationorderwherejoinpoormysqlperformance

Problem

Goal: select 10 most recent rows from posts which are related to a specific user.

There are two tables: posts (~5,000,000 rows) and relations (~8,000 rows).

posts columns:

--------------------------------------------------------------------------------------------
|  id (int)  |  source_id (int)  |  title (varchar)  |  content (longtext)  |  date (int)  |
--------------------------------------------------------------------------------------------


relations columns:

----------------------------------------------------
|  id (int)  |  source_id (int)  |  user_id (int)  |
----------------------------------------------------


I tried using JOIN for retrieving 10 most recent rows from posts which are related to a specific user:

SELECT      p.id, p.title, p.content, r.id AS rid
FROM        posts AS p
JOIN        relations AS r
ON          r.source_id = p.source_id
WHERE       r.user_id = 1
ORDER BY    p.date DESC
LIMIT       10


However, it takes ~30 seconds to execute it (SDD hosting!). I tried lots of index combinations for both tables, including indexes for single and multiple columns – none of that interfered the time of execution anyhow. Is there any way to speed up the selection?

For user_id=1 there are ~1,000 source_id's and ~450,000 rows in posts.

SHOW CREATE TABLE... results:

``
CREATE TABLE
relations (
id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
user_id bigint(20) unsigned NOT NULL,
source_id bigint(20) unsigned DEFAULT NULL,
PRIMARY KEY (
id),
KEY
user_id (user_id),
KEY
source_id (source_id),
KEY
source_id_2 (source_id,user_id),
) ENGINE=InnoDB AUTO_INCREMENT=7692 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE
posts (
id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
source_id bigint(20) unsigned NOT NULL,
title varchar(512) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
content longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
date` int(10

Solution

With your datasets, MySQL has to obtain those 450,000 records from posts (in 1000 little chunks from each matching source_id), sort it, and then return the top 10. It is a costly exercise.

You could resort to using a stored procedure, and accumulate results going back in time, say daily or weekly, looping until obtaining at least 10 records, and then returning the 10 most recent ones. You'll need an index on posts by (date, source_id). It would return quickly for the most recently active users, but take much longer for users without recent posts. Something like the following:

DELIMITER ;;
CREATE DEFINER=CURRENT_USER PROCEDURE stored_procedure_name(u_id int)
BEGIN
  DECLARE fd DATE;
  DECLARE d DATE;

  SELECT MIN(date), MAX(date) INTO fd, d FROM posts;

  CREATE TEMPORARY TABLE last_posts (id int);

  WHILE d > fd AND (SELECT COUNT(*) FROM last_posts)  date_sub(d, interval 7 day) AND p.date <= d)
    WHERE r.user_id = u_id
    ORDER BY p.date DESC
    LIMIT 10;

    SET d = date_sub(d, interval 7 day);
  END WHILE;

  SELECT p.id, p.title, p.content, r.id AS rid
  FROM posts p
  JOIN relations AS r ON (r.source_id = p.source_id)
  WHERE p.id IN (SELECT * FROM last_posts)
  ORDER BY p.date DESC
  LIMIT 10;

  DROP TABLE last_posts;
END;;
DELIMITER ;


You could play with the intervals, reducing them to 1 day (more cycles with faster queries). Ensure you have an index on posts (date).

Code Snippets

DELIMITER ;;
CREATE DEFINER=CURRENT_USER PROCEDURE stored_procedure_name(u_id int)
BEGIN
  DECLARE fd DATE;
  DECLARE d DATE;

  SELECT MIN(date), MAX(date) INTO fd, d FROM posts;

  CREATE TEMPORARY TABLE last_posts (id int);

  WHILE d > fd AND (SELECT COUNT(*) FROM last_posts) < 10 DO
    INSERT INTO last_posts (id)
    SELECT p.id
    FROM relations AS r
    JOIN posts AS p ON (p.source_id = r.source_id AND
                        p.date > date_sub(d, interval 7 day) AND p.date <= d)
    WHERE r.user_id = u_id
    ORDER BY p.date DESC
    LIMIT 10;

    SET d = date_sub(d, interval 7 day);
  END WHILE;

  SELECT p.id, p.title, p.content, r.id AS rid
  FROM posts p
  JOIN relations AS r ON (r.source_id = p.source_id)
  WHERE p.id IN (SELECT * FROM last_posts)
  ORDER BY p.date DESC
  LIMIT 10;

  DROP TABLE last_posts;
END;;
DELIMITER ;

Context

StackExchange Database Administrators Q#137344, answer score: 2

Revisions (0)

No revisions yet.