patternsqlCritical
Is it possible to mysqldump a subset of a database required to reproduce a query?
Viewed 0 times
querymysqldumpdatabasepossiblereproducesubsetrequired
Problem
Background
I would like to provide the subset of my database required to reproduce a
Question
Is there a way that I can incorporate this select statement into a script that dumps the queried data into a new database, such that the database could be installed on a new mysql server, and the statement would work with the new database. The new database should not contain records in addition to those that have been used in the query.
Update:
For clarification, I am not interested in a csv dump of query results. What I need to be able to do is to dump the database subset so that it can be installed on another machine, and then the query itself can be reproducible (and modifiable with respect to the same dataset).
Example
For example, my analysis might query a subset of data that requires records from multiple (in this example 3) tables:
I would like to provide the subset of my database required to reproduce a
select query. My goal is to make my computational workflow reproducible (as in reproducible research).Question
Is there a way that I can incorporate this select statement into a script that dumps the queried data into a new database, such that the database could be installed on a new mysql server, and the statement would work with the new database. The new database should not contain records in addition to those that have been used in the query.
Update:
For clarification, I am not interested in a csv dump of query results. What I need to be able to do is to dump the database subset so that it can be installed on another machine, and then the query itself can be reproducible (and modifiable with respect to the same dataset).
Example
For example, my analysis might query a subset of data that requires records from multiple (in this example 3) tables:
select table1.id, table1.level, table2.name, table2.level
from table1 join table2 on table1.id = table2.table1_id
join table3 on table3.id = table2.table3_id
where table3.name in ('fee', 'fi', 'fo', 'fum');Solution
mysqldump has the --where option to execute a WHERE clause for a given table.
Although it is not possible to mysqldump a join query, you can export specific rows from each table so that every row fetched from each table will be involved in the join later on.
For your given query, you would need to mysqldump three times:
First, mysqldump all table3 rows with name in ('fee','fi','fo','fum'):
Next, mysqldump all table2 rows that have matching table3_id values from the first mysqldump:
Then, mysqldump all table1 rows that have matching table1_id values from the second mysqldump:
Note: Since the second and third mysqldumps require using more than one table, --lock-all-tables must be used.
Create your new database:
Finally, load the three mysqldumps into another database and attempt the join there in the new database.
In mysql client, run your join query
Give it a Try !!!
WARNING : If not indexed correctly, the second and third mysqldumps may take forever !!!
Just in case, index the following columns:
I'll assume id is the primary key of table3.
Although it is not possible to mysqldump a join query, you can export specific rows from each table so that every row fetched from each table will be involved in the join later on.
For your given query, you would need to mysqldump three times:
First, mysqldump all table3 rows with name in ('fee','fi','fo','fum'):
mysqldump -u... -p... --where="name in ('fee','fi','fo','fum')" mydb table3 > table3.sqlNext, mysqldump all table2 rows that have matching table3_id values from the first mysqldump:
mysqldump -u... -p... --lock-all-tables --where="table3_id in (select id from table3 where name in ('fee','fi','fo','fum'))" mydb table2 > table2.sqlThen, mysqldump all table1 rows that have matching table1_id values from the second mysqldump:
mysqldump -u... -p... --lock-all-tables --where="id in (select table1_id from table2 where table3_id in (select id from table3 where name in ('fee','fi','fo','fum')))" mydb table1 > table1.sqlNote: Since the second and third mysqldumps require using more than one table, --lock-all-tables must be used.
Create your new database:
mysqladmin -u... -p... mysqladmin create newdbFinally, load the three mysqldumps into another database and attempt the join there in the new database.
mysql -u... -p... -D newdb < table1.sql
mysql -u... -p... -D newdb < table2.sql
mysql -u... -p... -D newdb < table3.sqlIn mysql client, run your join query
mysql> use newdb
mysql> select table1.id, table1.level, table2.name, table2.level
from table1 join table2 on table1.id = table2.table1_id
join table3 on table3.id = table2.table3_id
where table3.name in ('fee', 'fi', 'fo', 'fum');Give it a Try !!!
WARNING : If not indexed correctly, the second and third mysqldumps may take forever !!!
Just in case, index the following columns:
ALTER TABLE table2 ADD INDEX (table1_id);
ALTER TABLE table2 ADD INDEX (table3_id);
ALTER TABLE table3 ADD INDEX (name,id);I'll assume id is the primary key of table3.
Code Snippets
mysqldump -u... -p... --where="name in ('fee','fi','fo','fum')" mydb table3 > table3.sqlmysqldump -u... -p... --lock-all-tables --where="table3_id in (select id from table3 where name in ('fee','fi','fo','fum'))" mydb table2 > table2.sqlmysqldump -u... -p... --lock-all-tables --where="id in (select table1_id from table2 where table3_id in (select id from table3 where name in ('fee','fi','fo','fum')))" mydb table1 > table1.sqlmysqladmin -u... -p... mysqladmin create newdbmysql -u... -p... -D newdb < table1.sql
mysql -u... -p... -D newdb < table2.sql
mysql -u... -p... -D newdb < table3.sqlContext
StackExchange Database Administrators Q#4654, answer score: 67
Revisions (0)
No revisions yet.