Large INSERT INTO SELECT [..] FROM gradually gets slower
Problem
I've written a program that does a large INSERT in batches of 100,000 and shows its progress.

The source table contains 2.5GB of data:
CREATE TABLE wikt.text (
old_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
old_text MEDIUMBLOB NOT NULL,
old_flags TINYBLOB NOT NULL,
PRIMARY KEY (old_id),
KEY old_id (old_id)
) ENGINE=INNODB
AUTO_INCREMENT=23565544
DEFAULT CHARSET=binary;
CREATE INDEX old_id ON text (old_id);

This is the destination table:
CREATE TABLE domains.dictionary_language (
text_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
english TINYINT(1) UNSIGNED NOT NULL,
PRIMARY KEY (text_id),
KEY english (english)
) ENGINE=INNODB
AUTO_INCREMENT=23565544;

This is the query being run in batches of 100k:
INSERT INTO domains.dictionary_language
SELECT old_id,
IF(old_text LIKE '%==English==%', 1, 0)
FROM wikt.text
LIMIT {batch}, 100000;

The query is getting slower and slower. The first 1 million records inserted in 8 minutes. After that, records #1.2m - #1.3m alone took 7 mins. Now #2.3m - #2.4m just finished in 15 mins.
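The driver program itself isn't shown in the question. A minimal runnable sketch of the OFFSET-based batching pattern it describes, with SQLite standing in for MySQL so the example is self-contained (the row count, batch size, and data here are illustrative, not the question's):

```python
import sqlite3

# SQLite stand-in for the MySQL tables; names mirror the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT)")
conn.executemany(
    "INSERT INTO text VALUES (?, ?)",
    [(i, "==English==" if i % 2 else "==French==") for i in range(1, 251)],
)
conn.execute(
    "CREATE TABLE dictionary_language (text_id INTEGER PRIMARY KEY, english INTEGER)"
)

BATCH = 100
total = conn.execute("SELECT COUNT(*) FROM text").fetchone()[0]
for offset in range(0, total, BATCH):
    # LIMIT offset, count: the engine still reads and discards `offset`
    # rows on every pass, so later batches do more work than earlier ones.
    conn.execute(
        "INSERT INTO dictionary_language "
        "SELECT old_id, old_text LIKE '%==English==%' FROM text "
        "LIMIT ?, ?",
        (offset, BATCH),
    )
    conn.commit()

copied = conn.execute("SELECT COUNT(*) FROM dictionary_language").fetchone()[0]
english = conn.execute(
    "SELECT COUNT(*) FROM dictionary_language WHERE english = 1"
).fetchone()[0]
print(copied, english)  # 250 125
```

SQLite accepts MySQL's `LIMIT offset, count` form, so the query shape carries over directly; only the `IF(...)` expression is replaced by the boolean result of `LIKE`, which SQLite returns as 0 or 1.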
Here's the log of how long each batch of 100k takes to import.
As you can see, the first 12 batches (1.2 million records) insert in < 1 minute each. After that, the performance drops, with each batch taking a bit longer than the last!
```
24/11/2013 19:18:40 Ready
24/11/2013 19:18:42 Dictionary import started from Wiktionary
24/11/2013 19:18:42 Records: 3,729,613
24/11/2013 19:18:42 Batches of: 100,000
24/11/2013 19:19:11 Batch 1 finished in 00:00:29.3146767
24/11/2013 19:19:33 Batch 2 finished in 00:00:22.2142706
24/11/2013 19:19:41 Batch 3 finished in 00:00:07.6104353
24/11/2013 19:19:53 Batch 4 finished in 00:00:12.7057267
24/11/2013 19:20:08 Batch 5 finished in 00:00:14.9248537
24/11/2013 19:20:25 Batch 6 finished in 00:00:16.9849715
24/11/2013 19:20:43 Batch 7 finished in 00:00:17.7930177
```
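The slowdown is consistent with the offset cost: each `LIMIT {batch}, 100000` pass scans and discards `{batch}` rows before copying any, so the total work grows roughly quadratically with the batch number. A back-of-envelope count for this 3,729,613-row table:

```python
rows, batch = 3_729_613, 100_000

# Rows the engine must touch per pass: the `offset` rows it scans and
# discards, plus the (up to) `batch` rows it actually copies.
scanned = sum(offset + min(batch, rows - offset)
              for offset in range(0, rows, batch))

print(scanned)  # 74029613 rows touched across 38 batches, ~20x the table size
```

This counts logical row visits only; it ignores I/O and buffer-pool effects, but it shows why later batches take far longer than early ones.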
Solution
Large offsets can have this effect: to satisfy LIMIT {batch}, 100000, MySQL must scan and discard the first {batch} rows before returning any, so each successive batch reads more of the table than the last. I would remove the offset and instead seek past the rows already copied, keeping LIMIT 100000:

INSERT INTO domains.dictionary_language
(text_id, english)
SELECT t.old_id,
IF(t.old_text LIKE '%==English==%', 1, 0)
FROM wikt.text AS t
JOIN ( SELECT COALESCE(MAX(text_id), 0) AS offset
FROM domains.dictionary_language
) AS m
ON t.old_id > m.offset
ORDER BY t.old_id
LIMIT 100000;
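This query is self-restarting: because it seeks to MAX(text_id) on each pass, the driving loop no longer tracks an offset at all; it just repeats the same statement until no new rows appear. A runnable sketch of that loop, again with SQLite standing in for MySQL (the loop structure and data are illustrative, not from the answer; `IF(...)` becomes a bare `LIKE` since SQLite lacks `IF()`):

```python
import sqlite3

# SQLite stand-in for the MySQL tables; names mirror the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE text (old_id INTEGER PRIMARY KEY, old_text TEXT)")
conn.executemany(
    "INSERT INTO text VALUES (?, ?)",
    [(i, "==English==" if i % 3 == 0 else "stub") for i in range(1, 251)],
)
conn.execute(
    "CREATE TABLE dictionary_language (text_id INTEGER PRIMARY KEY, english INTEGER)"
)

BATCH = 100
done = 0
while True:
    # Each pass continues from the highest text_id already copied,
    # so no rows are ever scanned and discarded.
    conn.execute(
        """INSERT INTO dictionary_language (text_id, english)
           SELECT t.old_id, t.old_text LIKE '%==English==%'
           FROM text AS t
           JOIN (SELECT COALESCE(MAX(text_id), 0) AS off
                 FROM dictionary_language) AS m
             ON t.old_id > m.off
           ORDER BY t.old_id
           LIMIT ?""",
        (BATCH,),
    )
    conn.commit()
    n = conn.execute("SELECT COUNT(*) FROM dictionary_language").fetchone()[0]
    if n == done:  # no rows above the high-water mark: finished
        break
    done = n

print(done)  # 250
```

Each batch now reads only the 100,000 rows it inserts (plus one MAX lookup on the destination's primary key), so batch time stays flat instead of growing with the offset.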
Context
StackExchange Database Administrators Q#53831, answer score: 12