snippetsqlMinor
How to optimize multiple self-JOINs?
Viewed 0 times
optimizemultiplehowjoinsself
Problem
I'm looking for advice on either optimizing multiple self-joins, or a better table/DB design.
One of the tables looks as follows (relevant cols only):
The
With all values in a single column, self-joins seemed like the way to go. So I have experimented with various suggestions to speed those up, including indexing and creating new (temp) tables. At 9 self-joins, the query takes a little under 1 min. Beyond that, it spins forever.
The new table from where the self-joins take place has only about 1,000 rows, indexed on what seem to be the relevant variables - creation takes about 0.5 sec:
The
```
SELECT
Countries.countryName AS Country,
em01.em,
em02.em,
em03.em
...
emX.em
FROM
(SELECT
em1.data AS em,
em1.countryID
FROM Growth AS em1
WHERE
em1.dataID = 523) as em01
JOIN
(SELECT
em2.data AS em,
em2.countryID
FROM Growth AS em2
WHERE
em2.dataID = 524) as em02
USING (countryID)
JOIN
(SELECT
em3.data AS em,
em3.countryID
FROM Growth AS em3
WHERE
em3.dataID = 525) as em03
USING (count
One of the tables looks as follows (relevant cols only):
CREATE TABLE IF NOT EXISTS CountryData (
countryDataID INT PRIMARY KEY AUTO_INCREMENT,
dataID INT NOT NULL REFERENCES DataSources (dataID),
dataCode VARCHAR(30) NULL,
countryID INT NOT NULL REFERENCES Countries (countryID),
year INT NOT NULL ,
data DEC(20,4) NULL,
INDEX countryDataYear (dataID, countryID, year));The
data column has values for a few hundred indicators, 90 countries, and 30 years for ~1mn rows total. A standard query requires selecting N indicators for a particular year and C countries, yielding a CxN table for 90 rows max.With all values in a single column, self-joins seemed like the way to go. So I have experimented with various suggestions to speed those up, including indexing and creating new (temp) tables. At 9 self-joins, the query takes a little under 1 min. Beyond that, it spins forever.
The new table from where the self-joins take place has only about 1,000 rows, indexed on what seem to be the relevant variables - creation takes about 0.5 sec:
CREATE TABLE Growth
SELECT dataID, countryID, year, data
FROM CountryData
WHERE dataID > 522 AND year = 2017;
CREATE INDEX growth_ix
ON Growth (dataID, countryID);The
SELECT query then arranges up to XX indicators in the results table, with XX unfortunately <10: ```
SELECT
Countries.countryName AS Country,
em01.em,
em02.em,
em03.em
...
emX.em
FROM
(SELECT
em1.data AS em,
em1.countryID
FROM Growth AS em1
WHERE
em1.dataID = 523) as em01
JOIN
(SELECT
em2.data AS em,
em2.countryID
FROM Growth AS em2
WHERE
em2.dataID = 524) as em02
USING (countryID)
JOIN
(SELECT
em3.data AS em,
em3.countryID
FROM Growth AS em3
WHERE
em3.dataID = 525) as em03
USING (count
Solution
I don't think there is a need for denormalizing the table. Self-joining will work fine if you have indexes that will be used effectively. For the specific query, I'd add an index on
The only issue you'll have is that MySQL has a hard limit of maximum 61 joins in a query. So, you won't be able to have 90 columns with the above query.
Another observation is that you don't seem to need any aggregated data but just a tiny subset of the (small or big table, doesn't matter). With the suggested index, you could just write a query like this:
and have the pivot transformation done in your application.
(year, dataID, countryID, data) and not use any derived tables:SELECT
c.countryName AS Country,
em01.data AS data01,
em02.data AS data02,
...
emXX.data AS dataXX
FROM
Countries AS c
JOIN CountryData AS em01
ON em01.year = 2017
AND em01.dataID = 523
AND em01.countryID = c.countryID
JOIN CountryData AS em02
ON em02.year = 2017
AND em02.dataID = 524
AND em02.countryID = c.countryID
...
JOIN CountryData AS emXX
ON emXX.year = 2017
AND emXX.dataID = YYY
AND emXX.countryID = c.countryID
;The only issue you'll have is that MySQL has a hard limit of maximum 61 joins in a query. So, you won't be able to have 90 columns with the above query.
Another observation is that you don't seem to need any aggregated data but just a tiny subset of the (small or big table, doesn't matter). With the suggested index, you could just write a query like this:
SELECT
cd.countryID,
c.countryName AS Country,
cd.dataID,
cd.data
FROM
Countries AS c
JOIN CountryData AS cd
ON cd.countryID = c.countryID
WHERE
cd.year = 2017
AND cd.dataID IN (522, 523, ..., YYY)
ORDER BY
cd.countryID,
cd.dataID ;and have the pivot transformation done in your application.
Code Snippets
SELECT
c.countryName AS Country,
em01.data AS data01,
em02.data AS data02,
...
emXX.data AS dataXX
FROM
Countries AS c
JOIN CountryData AS em01
ON em01.year = 2017
AND em01.dataID = 523
AND em01.countryID = c.countryID
JOIN CountryData AS em02
ON em02.year = 2017
AND em02.dataID = 524
AND em02.countryID = c.countryID
...
JOIN CountryData AS emXX
ON emXX.year = 2017
AND emXX.dataID = YYY
AND emXX.countryID = c.countryID
;SELECT
cd.countryID,
c.countryName AS Country,
cd.dataID,
cd.data
FROM
Countries AS c
JOIN CountryData AS cd
ON cd.countryID = c.countryID
WHERE
cd.year = 2017
AND cd.dataID IN (522, 523, ..., YYY)
ORDER BY
cd.countryID,
cd.dataID ;Context
StackExchange Database Administrators Q#43651, answer score: 4
Revisions (0)
No revisions yet.