HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Slow performance when joining a small table and filtering out on a non-key column in MariaDB (MySQL)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
keynoncolumnmariadbslowmysqlsmallperformancewhenand

Problem

I am fairly new to MariaDB and I am struggling with one issue that I cannot get to the bottom of it. This is the query:

SELECT SQL_NO_CACHE STRAIGHT_JOIN
    `c`.`Name` AS `CategoryName`, 
    `c`.`UrlSlug` AS `CategorySlug`, 
    `n`.`Description`, 
    IF(n.OriginalImageUrl IS NOT NULL, n.OriginalImageUrl, s.LogoUrl) AS `ImageUrl`, 
    `n`.`Link`, 
    `n`.`PublishedOn`, 
    `s`.`Name` AS `SourceName`, 
    `s`.`Url` AS `SourceWebsite`, 
   s.UrlSlug AS SourceUrlSlug,
    `n`.`Title`
FROM `NewsItems` AS `n`
INNER JOIN `NewsSources` AS `s` ON `n`.`NewsSourceId` = `s`.`Id`
LEFT JOIN `Categories` AS `c` ON `n`.`CategoryId` = `c`.`CategoryId`
WHERE s.UrlSlug = 'slug'
#WHERE s.Id = 52
ORDER BY `n`.`PublishedOn` DESC
LIMIT 50


NewsSources is a table with about 40 rows and NewsItems has ~1 million. Each news item belongs to one source and one source can have many items. I'm trying to get all items for a source identified by URL slug of the source.

-
In case when I use STRAIGHT_JOIN and when I query for a source that has lots of news items, the query returns immediately.
However, if I query for a source that has low number of items (~100) OR if I query for a URL slug that doesn't belong to any source (result set is 0 rows), the query runs for 12 seconds.

-
In case when I remove STRAIGHT_JOIN, I see the opposite performance from the first case - it runs really slow when I query for a news source with many items and returns immediately for sources with low number of items or result set is 0, because the URL slug doesn't belong to any news source.

-
In case when I query by news source ID (the commented out WHERE s.Id = 52), the result comes immediately, regardless of whether there are lots of items for that source or 0 items for that source.

I want to point out again that the NewsSources table contains only about 40 rows.

Here is the analyzer results for the query above: Explain Analyzer

What can I do to make this query to run fast always?

Here are

Solution

According to POINT - 3 :

In case when I query by news source ID (the commented out WHERE s.Id = 52), the result comes immediately, regardless of whether there are lots of items for that source or 0 items for that source.

This is possible because on using WHERE s.Id = 52 it using index from NewSources & NewITems table do check explain plan might be different then given.

Try Creating Below Index :

create index IDX_UrlSlug on NewsSources(UrlSlug);


And optimize all three tables :

OPTIMIZE TABLE NewsSources;
OPTIMIZE TABLE NewsItems;
OPTIMIZE TABLE Categories;

Code Snippets

create index IDX_UrlSlug on NewsSources(UrlSlug);
OPTIMIZE TABLE NewsSources;
OPTIMIZE TABLE NewsItems;
OPTIMIZE TABLE Categories;

Context

StackExchange Database Administrators Q#268532, answer score: 2

Revisions (0)

No revisions yet.