principlesqlMinor
SELECT INTO OUTFILE vs INSERT INTO ... SELECT
Viewed 0 times
selectintoinsertoutfile
Problem
During my cursory research I have not been able to find a definitive answer on the benefits that
sets an exclusive index record without a gap lock on each row inserted into T. If the transaction isolation level is READ COMMITTED or innodb_locks_unsafe_for_binlog is enabled, and the transaction isolation level is not SERIALIZABLE, InnoDB does the search on S as a consistent read (no locks). Otherwise, InnoDB sets shared next-key locks on rows from S.
To avoid the lock with
However, I have not been able to find any authoritative answer regarding locks and using
My goal is to avoid locking the source table while the query runs to avoid connection stacking.
SELECT INTO OUTFILE provides over INSERT INTO ... SELECT. Upon reading the docs related to INSERT INTO ... SELECT concerning locks on InnoDB tables it states:sets an exclusive index record without a gap lock on each row inserted into T. If the transaction isolation level is READ COMMITTED or innodb_locks_unsafe_for_binlog is enabled, and the transaction isolation level is not SERIALIZABLE, InnoDB does the search on S as a consistent read (no locks). Otherwise, InnoDB sets shared next-key locks on rows from S.
To avoid the lock with
INSERT INTO ... SELECT it seems that I must ensure the isolation level is READ COMMITTED to avoid locks on the source table during the query.However, I have not been able to find any authoritative answer regarding locks and using
SELECT INTO OUTFILE, not even the MySQL docs reference lock info.My goal is to avoid locking the source table while the query runs to avoid connection stacking.
Solution
You should use SELECT ... LOCK IN SHARE MODE. Why ?
SELECT ... LOCK IN SHARE MODE sets a shared mode lock on any rows that are read. Other sessions can read the rows, but cannot modify them until your transaction commits. If any of these rows were changed by another transaction that has not yet committed, your query waits until that transaction ends and then uses the latest values.
In your case, you could attempt this
This would do two
Personally, I do not think you have to be this heavy-handed. Transaction isolation should be smart enough to pull off this atomic
Whether you do
GIVE IT A TRY !!!
UPDATE 2014-12-10 15:12 EST
Your comment
Thx for the answer, and it does help, but the main point of the OP was to determine if there is a benefit for using SELECT INTO OUTFILE over INSERT INTO ... SELECT?
They are operationally different
UPDATE 2014-12-11 12:21 EST
The only thing I can think of in this context is the point-in-time of the data and when you are using it. With both types of operations, there will some implicit shared locking.
With
With
Therefore, if I was looking for a performance bonus, I would have give the edge to
UPDATE 2014-12-17 00:00 EST
Your Comment
I wasn't notified that you updated your answer, so I did bounty assuming it wasn't. Your explanation does make sense, in theory, however I am looking for a more authoritative response, in hopes that the overhead of the separate file (as you correctly mentioned) may be worth the trade-off in complexity for improved performance.
The only authoritative response would to come from the MySQL Documentation.
First, what does the MySQL Documentation LOAD DATA INFILE say ?
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE. (See Section 13.2.9.1, “SELECT ... INTO Syntax”.) To write data from a table to a file, use SELECT ... INTO OUTFILE. To read the file back into a table, use LOAD DATA INFILE.
Two paragraphs later, it says
For more information about the efficiency of INSERT versus LOAD DATA INFILE and speeding up LOAD DATA INFILE, see Section 8.2.2.1, “Speed of INSERT Statements”.
When you look at Speed of INSERT Statements, it says this:
To optimize insert speed, combine many small operations into a single
large operation. Ideally, you make a single connection, send the data
for many new rows at once, and delay all index updates and consistency
checking until the very end.
The time required for inserting a row is determined by the following
factors, where the numbers indicate approximate proportions:
Connecting: (3)
Sending query to server: (2)
Parsing query: (2)
Inserting row: (1 × size of row)
Inserting indexes: (1 × number of indexes)
Closing: (1)
This does not take into consideration the initial overhead to open
tables, which is done once for each concurrently running query.
The size of the table slows down the insertion of indexes by log N,
assuming B-tree indexes.
You can use the following methods to speed up inserts:
If you are inserting many rows from the same client at the same time,
use INSERT statements with multiple VALUES lists to insert several
rows at a time. This is considerably faster (many times faster in some
cases) than using separate single-row INSERT statements. If you are
adding data to a nonempty table, you can tune the
bulk_insert_buffer_size varia
SELECT ... LOCK IN SHARE MODE sets a shared mode lock on any rows that are read. Other sessions can read the rows, but cannot modify them until your transaction commits. If any of these rows were changed by another transaction that has not yet committed, your query waits until that transaction ends and then uses the latest values.
In your case, you could attempt this
START TRANSACTION;
SELECT ... LOCK IN SHARE MODE;
SELECT ... INTO OUTFILE;
ROLLBACK;This would do two
SELECT queries- First
SELECTto lock the rows in the table you wish
- Second
SELECTto performSELECT ... INTO OUTFILE
Personally, I do not think you have to be this heavy-handed. Transaction isolation should be smart enough to pull off this atomic
SELECT and use the same rows for the INSERT. I know I said should be which is why your are asking you question in the first place.Whether you do
SELECT ... INTO OUTFILE as one command or in the heavy-handed manner I am proposing, the row data of the source table will be fully readable.GIVE IT A TRY !!!
UPDATE 2014-12-10 15:12 EST
Your comment
Thx for the answer, and it does help, but the main point of the OP was to determine if there is a benefit for using SELECT INTO OUTFILE over INSERT INTO ... SELECT?
They are operationally different
SELECT INTO OUTFILEcreates a text file
INSERT INTO SELECTloads one table from the results of theSELECT
UPDATE 2014-12-11 12:21 EST
The only thing I can think of in this context is the point-in-time of the data and when you are using it. With both types of operations, there will some implicit shared locking.
With
SELECT INTO OUTFILE, you are preparing a result and saving it externally. Loading that data into a table using LOAD DATA INFILE will not involve any shared locking during the load process. Keep in mind that SELECT INTO OUTFILE will incur disk I/O and still impose some caching along the way.With
INSERT INTO SELECT, the shared locks would probably have to live longer in InnoDB because you are locking rows and using those same rows to INSERT into another table.Therefore, if I was looking for a performance bonus, I would have give the edge to
INSERT INTO SELECT because you are doing the same amount of shared row locking, The disk I/O for the single operation would have to be less than the separate SELECT INTO OUTFILE and subsequent LOAD DATA INFILE. Of course, you would have to compare the two methods against your dataset. What could a performance bonus for one dataset might be a performance cost for another dataset.UPDATE 2014-12-17 00:00 EST
Your Comment
I wasn't notified that you updated your answer, so I did bounty assuming it wasn't. Your explanation does make sense, in theory, however I am looking for a more authoritative response, in hopes that the overhead of the separate file (as you correctly mentioned) may be worth the trade-off in complexity for improved performance.
The only authoritative response would to come from the MySQL Documentation.
First, what does the MySQL Documentation LOAD DATA INFILE say ?
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE. (See Section 13.2.9.1, “SELECT ... INTO Syntax”.) To write data from a table to a file, use SELECT ... INTO OUTFILE. To read the file back into a table, use LOAD DATA INFILE.
Two paragraphs later, it says
For more information about the efficiency of INSERT versus LOAD DATA INFILE and speeding up LOAD DATA INFILE, see Section 8.2.2.1, “Speed of INSERT Statements”.
When you look at Speed of INSERT Statements, it says this:
To optimize insert speed, combine many small operations into a single
large operation. Ideally, you make a single connection, send the data
for many new rows at once, and delay all index updates and consistency
checking until the very end.
The time required for inserting a row is determined by the following
factors, where the numbers indicate approximate proportions:
Connecting: (3)
Sending query to server: (2)
Parsing query: (2)
Inserting row: (1 × size of row)
Inserting indexes: (1 × number of indexes)
Closing: (1)
This does not take into consideration the initial overhead to open
tables, which is done once for each concurrently running query.
The size of the table slows down the insertion of indexes by log N,
assuming B-tree indexes.
You can use the following methods to speed up inserts:
If you are inserting many rows from the same client at the same time,
use INSERT statements with multiple VALUES lists to insert several
rows at a time. This is considerably faster (many times faster in some
cases) than using separate single-row INSERT statements. If you are
adding data to a nonempty table, you can tune the
bulk_insert_buffer_size varia
Code Snippets
START TRANSACTION;
SELECT ... LOCK IN SHARE MODE;
SELECT ... INTO OUTFILE;
ROLLBACK;Context
StackExchange Database Administrators Q#84825, answer score: 6
Revisions (0)
No revisions yet.