HiveBrain v1.2.0

Inserting a large quantity of data into a MySQL table

Submitted by: @import:stackexchange-dba

Problem

I need to insert about 400,000,000 rows of test data into an InnoDB table.
The table is quite simple:

CREATE TABLE IF NOT EXISTS `tgt` (
  `id_1` int(10) unsigned NOT NULL,
  `id_2` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id_1`,`id_2`),
  KEY `id_1` (`id_1`),
  KEY `id_2` (`id_2`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


Data is coming from another table:

CREATE TABLE IF NOT EXISTS `src` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  # skip some columns
  PRIMARY KEY (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;


I need to insert into tgt all possible combinations of different ids from table src, so the naive method is to execute:

INSERT INTO tgt (id_1, id_2) (SELECT a.id, b.id FROM `src` a INNER JOIN src b WHERE a.id <> b.id)


And, of course, it takes hours or even days... So I am searching for a faster method of doing this.
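As a rough sanity check on the 400,000,000-row figure: a self-join of n distinct ids, excluding the diagonal, produces n·(n−1) ordered pairs, so the source table holds roughly 20,000 ids. A quick sketch in plain awk (n = 20000 is an assumed example value, not taken from the question):

```shell
# n distinct ids yield n*(n-1) ordered (id_1, id_2) pairs with id_1 <> id_2.
# n = 20000 is an assumed example value, close to the 400M target.
awk 'BEGIN { n = 20000; print n * (n - 1) }'
```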

Solution

- Dump the source data into a file:

SELECT  a.id,  b.id INTO OUTFILE '/tmp/result.txt' FROM `src` a INNER JOIN src b  WHERE a.id <> b.id
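Once the dump completes, it is worth checking its line count against the expected n·(n−1) before loading. A small self-contained demonstration of that check, using a synthetic file with n = 5 ids (the real file would be /tmp/result.txt; names here are illustrative):

```shell
# Build a synthetic dump with n = 5 ids: every (a, b) pair with a <> b.
n=5
: > /tmp/result_demo.txt
for a in $(seq 1 "$n"); do
  for b in $(seq 1 "$n"); do
    [ "$a" -ne "$b" ] && printf '%s\t%s\n' "$a" "$b" >> /tmp/result_demo.txt
  done
done
# Expect n*(n-1) = 20 lines.
wc -l < /tmp/result_demo.txt
```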


- Tune MySQL for large writes:

innodb_flush_log_at_trx_commit=2
innodb_log_file_size=2G


See the MySQL documentation for how to change the InnoDB log file size.
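For reference, both settings live in the [mysqld] section of my.cnf; note that on older MySQL versions changing innodb_log_file_size takes effect only after a clean shutdown and restart (and, before 5.6.8, the old ib_logfile* files had to be removed first):

```ini
[mysqld]
# Trade a little durability for write throughput during the bulk load:
# flush the redo log to disk once per second instead of at every commit.
innodb_flush_log_at_trx_commit = 2
# A larger redo log means fewer checkpoint stalls during sustained writes.
innodb_log_file_size = 2G
```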

- Start a pipe to split result.txt into chunks:

pt-fifo-split --lines 1000000 /tmp/result.txt


(pt-fifo-split is a part of Percona Toolkit)
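The point of the chunking is to feed LOAD DATA fixed-size pieces instead of one 400M-row file. If Percona Toolkit is not available, the same idea can be sketched with plain coreutils split, which writes real chunk files rather than serving them through a FIFO (file names below are illustrative):

```shell
# Chunk a 25-line stand-in for result.txt into 10-line pieces.
# (pt-fifo-split serves the same chunks through a FIFO instead of
# writing them to disk.)
rm -f /tmp/chunk_demo_*
seq 1 25 > /tmp/result_demo2.txt
split -l 10 /tmp/result_demo2.txt /tmp/chunk_demo_
ls /tmp/chunk_demo_* | wc -l    # 25 lines / 10 per chunk -> 3 chunks
```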

- Load the dump:

while [ -e /tmp/pt-fifo-split ]; do mysql -u <user> -p<pass> -e "LOAD DATA INFILE '/tmp/pt-fifo-split' INTO TABLE tgt" <tgt db>; done


(If you hit a "File '/tmp/pt-fifo-split' not found" error, try the solution linked in the original answer.)
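The loop's shape can also be demonstrated without a MySQL server or a FIFO, by consuming plain chunk files one at a time with a line count standing in for the LOAD DATA call (all paths and names below are illustrative):

```shell
# Produce three chunk files, then consume them one at a time,
# as the while-loop above consumes successive FIFO chunks.
rm -f /tmp/load_demo_chunk_*
seq 1 30 > /tmp/load_demo_input.txt
split -l 10 /tmp/load_demo_input.txt /tmp/load_demo_chunk_
total=0
for f in /tmp/load_demo_chunk_*; do
  # Stand-in for: mysql -e "LOAD DATA INFILE '$f' INTO TABLE tgt"
  total=$(( total + $(wc -l < "$f") ))
  rm "$f"                      # chunk fully loaded; discard it
done
echo "loaded $total rows"
```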

Context

StackExchange Database Administrators Q#65932, answer score: 4
