How to add an index to a big table with 60M records without downtime?
Problem
We have been struggling with one issue for the past few days: we want to add an index to a huge table with 60M records. At first we tried adding it with the basic MySQL ALTER TABLE syntax, but it clogged our production DB. That table is used very frequently in production queries, so everything suffered.
Our DB is hosted on AWS RDS. It's MySQL 5.7, and we are using Laravel as our PHP framework.
The next thing we read about was copying the current table into a new one, adding the index to the new table, and then pointing the Laravel model at the new table. We thought it made sense and would be easy enough.
But copying the data from the old table to the new one was taking a lot of time; our calculations showed it would take days. We tried using Laravel as well as raw SQL commands, but it was too slow either way.
Then we tried exporting the data as CSV and importing it, but again it was too slow. The first few million records would insert quickly, but then inserts into the table would become extremely slow.
Finally we tried mysqldump, and we realised it also locks the new table while inserting, which is maybe why it is fast enough. It took around 6 hours to copy the table into the new one, BUT we were missing 2M records with this method. We also checked how many records came into the existing table while we were exporting/importing; it was only around 100K. So the export/import was missing 1.9M records, and we couldn't figure out why.

After going through all these different ways, we have decided to put the app into downtime and add the index on the huge table.
I wanted to know: do others face this issue as well? Is there a way to add indexes to a huge table without causing downtime in production? Or is there a faster way to copy a big MySQL table without losing data?
Solution
For this example, let's say you have the following table in the database mydb:
CREATE TABLE mytable (
id INT NOT NULL AUTO_INCREMENT,
num INT DEFAULT 0,
dat VARCHAR(32),
PRIMARY KEY (id)
);

and you want to create two indexes as follows:
ALTER TABLE mytable
ADD INDEX num_ndx (num)
,ADD INDEX dat_ndx (dat)
;

There are two methods you really need to look into.
METHOD #1 : Use Online DDL
You can launch your change with ALTER TABLE using the following syntax
ALTER TABLE mytable
ADD INDEX num_ndx (num)
,ADD INDEX dat_ndx (dat)
,ALGORITHM=INPLACE,LOCK=NONE
;

Since you are only adding indexes and not changing any column datatypes, the row data just has to be scanned to generate the index pages; with ALGORITHM=INPLACE and LOCK=NONE, reads and writes to the table can continue while the indexes are built.
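As a quick sanity check (not part of the original answer), you can confirm the new indexes exist once the ALTER finishes. Because ALGORITHM and LOCK are spelled out explicitly, MySQL will fail the statement with an error rather than silently fall back to a blocking table copy if it cannot honor them.

SHOW INDEX FROM mytable;       -- should now list num_ndx and dat_ndx
SHOW CREATE TABLE mytable;     -- full definition, including the new indexes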
METHOD #2 : Use pt-online-schema-change
This tool has been a lifesaver for many over the years.
You need to execute it in two phases.
PHASE #1 : Dry Run
This phase, which is basically a syntax check, takes less than 5 seconds to run
MYSQL_HOST=...
MYSQL_USER=...
MYSQL_PASS=...
ALTER_TABLE_CLAUSE="ADD INDEX num_ndx (num),ADD INDEX dat_ndx (dat)"
RUNMODE="--dry-run"
PTOSC_OPTIONS="--alter-foreign-keys-method=auto"
PTOSC_OPTIONS="${PTOSC_OPTIONS} --print"
PTOSC_OPTIONS="${PTOSC_OPTIONS} --check-interval 10"
PTOSC_OPTIONS="${PTOSC_OPTIONS} --max-lag 300"
PTOSC_OPTIONS="${PTOSC_OPTIONS} ${RUNMODE}"
DB=mydb
TB=mytable
pt-online-schema-change --alter "${ALTER_TABLE_CLAUSE}" \
h=${MYSQL_HOST},u=${MYSQL_USER},p=${MYSQL_PASS},D=${DB},t=${TB} \
${PTOSC_OPTIONS} >dryrun.log 2>&1

PHASE #2 : Live Run
MYSQL_HOST=...
MYSQL_USER=...
MYSQL_PASS=...
ALTER_TABLE_CLAUSE="ADD INDEX num_ndx (num),ADD INDEX dat_ndx (dat)"
RUNMODE="--execute"
PTOSC_OPTIONS="--alter-foreign-keys-method=auto"
PTOSC_OPTIONS="${PTOSC_OPTIONS} --print"
PTOSC_OPTIONS="${PTOSC_OPTIONS} --check-interval 10"
PTOSC_OPTIONS="${PTOSC_OPTIONS} --max-lag 300"
PTOSC_OPTIONS="${PTOSC_OPTIONS} ${RUNMODE}"
DB=mydb
TB=mytable
pt-online-schema-change --alter "${ALTER_TABLE_CLAUSE}" \
h=${MYSQL_HOST},u=${MYSQL_USER},p=${MYSQL_PASS},D=${DB},t=${TB} \
${PTOSC_OPTIONS} >liverun.log 2>&1

Here is what pt-online-schema-change will do:
CREATE TABLE _mytable_new LIKE mytable;
ALTER TABLE _mytable_new ADD INDEX num_ndx (num),ADD INDEX dat_ndx (dat);

pt-online-schema-change creates three triggers (sketched below):
- an AFTER INSERT trigger from mytable to _mytable_new
- an AFTER UPDATE trigger from mytable to _mytable_new
- an AFTER DELETE trigger from mytable to _mytable_new
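To make the mechanism concrete, here is a rough hand-written sketch of what such triggers look like. It is only an illustration: the real triggers generated by pt-online-schema-change use different names and more careful bodies.

-- Hypothetical trigger names, for illustration only
CREATE TRIGGER mytable_ins AFTER INSERT ON mytable
FOR EACH ROW REPLACE INTO _mytable_new (id, num, dat) VALUES (NEW.id, NEW.num, NEW.dat);

CREATE TRIGGER mytable_upd AFTER UPDATE ON mytable
FOR EACH ROW REPLACE INTO _mytable_new (id, num, dat) VALUES (NEW.id, NEW.num, NEW.dat);

CREATE TRIGGER mytable_del AFTER DELETE ON mytable
FOR EACH ROW DELETE FROM _mytable_new WHERE id = OLD.id;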
Then, pt-online-schema-change will copy rows from mytable to _mytable_new in chunks. If any INSERTs, UPDATEs, or DELETEs happen during the run, the triggers will replay those changes into _mytable_new (a sketch of the chunked copy follows).
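For a rough idea of what one copy chunk looks like, here is a simplified sketch, not the exact statement the tool prints; the real statements pick chunk boundaries dynamically from the primary key and throttle themselves based on load and replica lag.

-- Illustrative chunk only; the boundary values are hypothetical
INSERT LOW_PRIORITY IGNORE INTO _mytable_new (id, num, dat)
SELECT id, num, dat
FROM mytable
WHERE id >= 1 AND id < 10001
LOCK IN SHARE MODE;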
After every row has been copied into _mytable_new, pt-online-schema-change will do this:
ANALYZE TABLE _mytable_new;
RENAME TABLE mytable TO _mytable_old,_mytable_new TO mytable;
DROP TABLE _mytable_old;

The RENAME TABLE swap is atomic, so the application never sees the table go missing. The three triggers are dropped towards the end.
You can see these steps in the dryrun.log file
You can see the progress in liverun.log
NOTE : Please run the liverun in a screen session or a background process
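Beyond tailing liverun.log, one way (not from the original answer) to get a rough sense of copy progress from another MySQL session is to compare the estimated row counts of the old and new tables. TABLE_ROWS is only an estimate for InnoDB tables, so treat it as a ballpark figure; this assumes the default _mytable_new name shown above.

SELECT TABLE_NAME, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME IN ('mytable', '_mytable_new');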
StackExchange Database Administrators Q#334094, answer score: 8
Revisions (0)
No revisions yet.