HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Searching text, starring, flagging and joins

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
flaggingstarringsearchingtextandjoins

Problem

I have an interesting performance concern that I would like to address before it becomes a serious issue. I created a SQL Fiddle to demonstrate the query, and the explain statement can be seen on Depesz.

I have a collection phrases in different languages, and 'translations' that link phrase ids together (source and destination). The phrases can be searched by keyword in a particular source language, and find translations of that phrase in the destination language. The translations can also be starred and flagged, and the user id and translation id are held in respective tables for this functionality.

If a keyword and source language matches, those phrase ids are joined on the translations table; then, that set is joined on new_phrases again where the destination language matches. That set is also joined on favorite_translations and translation_flags, and if the user id matches in those sets, favorited and flagged are true, otherwise false.

```
SELECT P1.phrase_id as src_phrase_id,
P1.author as src_author,
P1.language as src_language,
P1.text as src_text,
P2.phrase_id as dst_phrase_id,
P2.author as dst_author,
P2.language as dst_language,
P2.text as dst_text,
P1.translation_id,
P1.community_rating,
CASE WHEN P1.favoriter_id = $1 THEN true
ELSE false
END
as favorited,
CASE WHEN P1.flagger_id = $1 THEN true
ELSE false
END
as flagged
FROM (
SELECT P.phrase_id, P.author, P.language, P.text,
T.translation_id, T.destination_id, T.community_rating,
FT.user_id as favoriter_id, TF.user_id as flagger_id
FROM new_phrases P
INNER JOIN translations T ON P.phrase_id = T.source_id
LEFT JOIN favorite_translations FT ON T.translation_id = FT.translation_id
LEFT JOIN translation_flags TF ON T.translation_id = TF.translation_id
WHERE P.tsv_

Solution

The way you join is the problem.

Okay now that's out the way let me explain. These left joins you're doing are bloating your result set:

LEFT JOIN favorite_translations FT ON T.translation_id = FT.translation_id



The LEFT JOIN keyword returns all rows from the left table (table1),
with the matching rows in the right table (table2). The result is NULL
in the right side when there is no match. - (w3schools)

what this means is that for two favorites on the same post you get two results into your set. This bloats your result set significantly as soon as you get more and more flags and favorites.

There is a rather simple fix for that.

Since in the end you only want to know about wether the user favorited / flagged or not, the "correct" way to go about this is a subselect to the count, something like this:

((SELECT COUNT(*) FROM favorite_translations 
         WHERE user_id = $1 AND translation_id = T.translation_id) > 0) AS favorited


This also eliminates the less than nice-looking Case-Statement you got there, removes two joins and nets us a smaller execution plan, that even prevents multiple results for the same translation pair.

So this is the new execution plan on depesz

Code Snippets

LEFT JOIN favorite_translations FT ON T.translation_id = FT.translation_id
((SELECT COUNT(*) FROM favorite_translations 
         WHERE user_id = $1 AND translation_id = T.translation_id) > 0) AS favorited

Context

StackExchange Code Review Q#96612, answer score: 5

Revisions (0)

No revisions yet.