patternsqlMinor

MySQL Index Creation Internals

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

indexmysqlcreationinternals

Problem

Right now there are two ways to build an index of a table in MySQL:

Create the table structure first, then import the data and add indexes later.

Create the table structure with indexes and then import the data.

In the first process we will have contiguous data (all fields) pages and then we have index pages. So when we query using index, MySQL has to first load the index pages and find out the matching keys and have to look up those primary key on the data pages. For this it has to then again load the data pages to get the data. This is useful when we have larger index scans as we have all index loaded contiguously.

In the second way of index creation the filtered index page will most probably contain the data page near to it as they were created at the same time. So I guess look up will be faster for a small range scans.

Is my understanding correct?

UPDATE:

I should have mentioned that the "PRIMARY KEY is enabled" (an auto increment id column) in the first way of importing data. So internal rowid is not generated and lot of IO is saved as we are not going to add a PRIMARY KEY.

As you noted there is fragmentation when we import data using the Second method.

Considering my requirements is for larger range scans (scanning ~100M rows) I guess I will go with the first way of importing data.

update on Jun 8 11:30

``


CREATE TABLE

table_dummy

 bigint(20) NOT NULL AUTO_INCREMENT,

column1

 bigint(20) DEFAULT NULL,

column2

 bigint(20) DEFAULT NULL,

column3

 bigint(20) DEFAULT NULL,

created_at

 datetime DEFAULT NULL,

column4

 tinyint(1) DEFAULT NULL,

column5

 tinyint(4) DEFAULT NULL,

column6

 bigint(20) DEFAULT NULL,

column6_created_at

 datetime DEFAULT NULL,

column7

 int(11) DEFAULT NULL,

column8

 tinyint(1) DEFAULT NULL,
  PRIMARY KEY (

),
  UNIQUE KEY

twtaccount_id_2 (column1,column2

),
  KEY

twtaccount_id (column1,created_at,column5

),
  KEY

twt_user_id (column3,created_at,column5`),
K

Solution

There are a few things I can clarify for you here:

-
Yes, it is a good practice to delay secondary index creation until after you import the data (starting from MySQL 5.5 - not before). Mysqlpump does this by default.

-
When you delay secondary index creation, internally MySQL will read, sort and then create the index (reducing fragmentation). For MySQL 5.7 there are additional optimizations here.

-
When you trickle load indexes, they may have more page splits and a lower page fill efficiency (fragmentation). This depends a little on the data, and if it is in order. You are not giving me many clues with column1, column2, but say for example column1 was a timestamp it would probably be in order.

-
Pages in an index are logically in order, not necessarily physically. This distinction may matter if you can not fit your working set for ranges in memory and are using spinning disks for backing storage. It's also a little hard to answer because with spinning disks you are probably using RAID with some stripe size, and the blocks at the filesystem level may also be non-contiguous.

-
Be aware also that the optimizer may not consider the range scan for large ranges over a tablescan (since prior to 5.7 the cost model assumed pages were not in memory). If you can rely on this being the case, you may want to FORCE INDEX and compare actual run times. (More info on new cost model.)

Context

StackExchange Database Administrators Q#102413, answer score: 3

Revisions (0)

No revisions yet.