HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Query SteamIDs for presence of an avatar

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
steamidsavatarqueryforpresence

Problem

I have a table vanrust_corev2.tblPlayerLog that has approx 6.7k records,
I also have a table in a different database vanrust_forum.tblPlayerAvatars which has approx 6.7k records.

The purpose of this script is to pull the SteamID of players who don't have an avatar stored for them yet. However the issue here is, the execution time of this script is often 29-31 seconds which seemed a little strange to me as I'm sure there are much more sophisticated and larger databases that would not have this issue.

SELECT
*
FROM
vanrust_corev2.tblPlayerLog log
WHERE
not exists(
SELECT 
a.SteamID
FROM vanrust_forum.tblPlayerAvatars a
WHERE a.Steamid = log.SteamID)


This is for the same user, just to a different database.

This script works perfectly the only issue here is execution time so I'm hoping for some pointers on where I could be going wrong.

Here is the execution plan:

Solution

I normally wouldn't recommend this, but...

According to your query execution plan, it is doing 22719171 table scans in order to fetch the next row. What I think the problem stems from is querying across databases, which may have to open a connection for each Handler_read_rnd_next, then close the connection each time. One way or the other, if your columns are properly indexed, something like this shouldn't take near that long. I think something about the setup is causing the optimizer to go into a loop across two databases, and SQL is not good at looping. To make this more set-based, I'd suggest pulling your data sets into one of the two databases with a construct like a temp table:

START TRANSACTION;
  DROP TEMPORARY TABLE IF EXISTS vanrust_corev2.users_with_avatars;
  CREATE TEMPORARY TABLE vanrust_corev2.users_with_avatars (SteamID INT NOT NULL);
  INSERT INTO vanrust_corev2.users_with_avatars (SteamId) VALUES
    SELECT 
      a.SteamID
    FROM vanrust_forum.tblPlayerAvatars a
COMMIT;


Then, you could simply join the record sets within the vanrust_corev2 database. This would increase I/O for a bit, but likely remove the need to connect between 2 databases millions of times while it does table scans:

SELECT
*
FROM
vanrust_corev2.tblPlayerLog log
WHERE
NOT EXISTS (
  SELECT 
  a.SteamID
  FROM vanrust_corev2.users_with_avatars a
  WHERE a.Steamid = log.SteamID
);


And of course it's a good idea to DROP TEMPORARY TABLE vanrust_corev2.users_with_avatars; when you are done, although it will otherwise automatically drop when the session is ended with the database.

On another note, I strongly recommend you start naming aliases in a way that describes what they represent. log is OK, but a doesn't seem to mean anything.

Code Snippets

START TRANSACTION;
  DROP TEMPORARY TABLE IF EXISTS vanrust_corev2.users_with_avatars;
  CREATE TEMPORARY TABLE vanrust_corev2.users_with_avatars (SteamID INT NOT NULL);
  INSERT INTO vanrust_corev2.users_with_avatars (SteamId) VALUES
    SELECT 
      a.SteamID
    FROM vanrust_forum.tblPlayerAvatars a
COMMIT;
SELECT
*
FROM
vanrust_corev2.tblPlayerLog log
WHERE
NOT EXISTS (
  SELECT 
  a.SteamID
  FROM vanrust_corev2.users_with_avatars a
  WHERE a.Steamid = log.SteamID
);

Context

StackExchange Code Review Q#93712, answer score: 6

Revisions (0)

No revisions yet.