HiveBrain v1.2.0

Implications of a second index on a timestamp field in a sequential table

Submitted by: @import:stackexchange-dba
Tags: field, implications, sequential, second, timestamp, index, table

Problem

I have a MySQL table of a few gigabytes with roughly ~100M rows. I store data sequentially, such that the timestamps increase as the ID also increases. Because ordering and filtering by the timestamp is so tremendously slow, I often simply use the ID to filter date ranges. If I know that Wednesday's data started at ID=87000000 and that Thursday's data started at ID=90000000, I can find all of Wednesday's data by filtering between those two IDs.

Everything works fast enough, except for the part where I need to find the ID range. This part requires filtering by the timestamp, and that is the slow part.

My question is this: If I add a second index on the timestamp field, what would be the implications? Would my insert rate be severely impacted? Would the size of my table increase dramatically? Would this improve performance enough to justify the action? If I were to add this index, would I even need to use the ID range trick anymore?

I've also been toying with the idea of writing a function that performs a binary search on the table to find an ID near a target time. My assumption is that MySQL scans most of the table contents, comparing each row's timestamp with the target; that means, on average, 50,000,000 rows scanned. A binary search would take, at worst, log2(100,000,000) ≈ 27 queries to find a single ID.
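A minimal sketch of that binary-search idea, shown in Python. The `get_time` callable is a stand-in for a hypothetical single-row query such as `SELECT time FROM tracker_snapshot WHERE id = %s`; the demo below uses a synthetic in-memory mapping instead of a real connection:

```python
def find_first_id_at_or_after(get_time, lo_id, hi_id, target_time):
    # Binary-search the id range [lo_id, hi_id] for the first id whose
    # timestamp is >= target_time.  Relies on the property from the
    # question: timestamps never decrease as the id increases.
    # Each loop iteration costs one get_time call (one query), so the
    # total is about log2(hi_id - lo_id) queries.  Note that a real
    # table may have gaps in its AUTO_INCREMENT ids, so a production
    # version would need to handle missing ids.
    while lo_id < hi_id:
        mid = (lo_id + hi_id) // 2
        if get_time(mid) < target_time:
            lo_id = mid + 1
        else:
            hi_id = mid
    return lo_id

# Demo with a synthetic monotonic table: id i has timestamp i * 10.
times = {i: i * 10 for i in range(1, 101)}
first_id = find_first_id_at_or_after(times.__getitem__, 1, 100, 500)
print(first_id)  # 50: the first id whose timestamp is >= 500
```

Running this once per day boundary yields the start/end IDs the question uses for range filtering.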

Any comments would be appreciated. I'll get to work on my binary search algorithm and post the results.

EDIT: My schema is:

```sql
CREATE TABLE tracker_snapshot (
  id int(11) NOT NULL AUTO_INCREMENT,
  tracker_id int(11) NOT NULL,
  time datetime NOT NULL,
  value1 decimal(13,4) NOT NULL,
  value2 decimal(13,4) NOT NULL,
  value3 decimal(13,4) NOT NULL,
  value4 decimal(13,4) NOT NULL,
  value5 decimal(13,4) NOT NULL,
  value6 decimal(13,4) NOT NULL,
  PRIMARY KEY (id),
  KEY tracker_snapshot_c9837659 (tracker_id),
  CONSTRAINT tracker_id_refs_id_68c52750 FOREIGN KEY (tracker_id) REFERENCES tracker_tracker (id)
) ENGINE=InnoDB AUTO_INCREMENT=...
```

Solution

The implication is that you need roughly 1144 MB of extra storage to index column time for ~100M rows.

Because InnoDB stores the primary key value inside every entry of a secondary (non-PRIMARY, non-UNIQUE) index, your secondary index will be larger than the indexed column alone.

How much larger? You can calculate it with this formula:

int = 4 bytes
datetime = 8 bytes

100,000,000 records * (4 + 8) bytes =
100,000,000 * 12 bytes ~ 1,200,000,000 bytes (~1144.41 MB) of extra storage (note: index record/page overhead is not included in the calculation)
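The same back-of-the-envelope calculation, written out in Python. Each secondary index entry is assumed to hold the indexed datetime (8 bytes) plus the primary key it points back to (a 4-byte int), with page/record overhead ignored:

```python
# Estimated size of a secondary index on `time` for ~100M rows:
# indexed datetime (8 bytes) + primary key int (4 bytes) per entry,
# ignoring InnoDB page and record overhead.
rows = 100_000_000
entry_bytes = 8 + 4
index_bytes = rows * entry_bytes
index_mb = index_bytes / (1024 * 1024)
print(index_bytes, round(index_mb, 2))  # 1200000000 bytes, ~1144.41 MB
```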


A larger index will slow down inserts and deletes, and updates only when you update a value that is indexed.
In theory a larger index can also slow down selects, because InnoDB index pages are 16 KB (see http://www.ovaistariq.net/733/).
In practice it still depends on the InnoDB configuration and how much of the data is cached in the InnoDB buffer pool.

Alternatively, you can keep your ID-range approach and back it with a lookup table:

CREATE TABLE tracker_snapshot_lookup (
    tracker_date DATE NOT NULL
  , tracker_snapshot_start_id INT UNSIGNED NOT NULL 
  , tracker_snapshot_end_id INT UNSIGNED NOT NULL

  , PRIMARY KEY(tracker_date)
  --   Covering index below is overkill... 
  -- , PRIMARY KEY(tracker_date, tracker_snapshot_start_id, tracker_snapshot_end_id)
) ENGINE = InnoDB;

insert into tracker_snapshot_lookup values('2013-11-13', 1, 10000);
insert into tracker_snapshot_lookup values('2013-11-14', 10001, 20000);


If you use a JOIN or a derived table, the MySQL optimizer needs, in the worst case:

1 index key lookup (random disk I/O) on tracker_snapshot_lookup.tracker_date (assuming WHERE tracker_date = '2013-11-13')
1 table record lookup (random disk I/O) to fetch tracker_snapshot_start_id and tracker_snapshot_end_id (not necessary if you make it a covering index)

Based on tracker_snapshot_start_id and tracker_snapshot_end_id, MySQL will most likely choose a range scan (sequential disk I/O, which has low I/O wait cost) on the tracker_snapshot table.
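A runnable sketch of the two-step lookup, using SQLite in place of MySQL purely for illustration (the lookup table and column names follow the answer's schema; the tiny `tracker_snapshot` here is synthetic, with just an id, a date string, and one value column):

```python
import sqlite3

# Build a toy version of the two tables in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tracker_snapshot (id INTEGER PRIMARY KEY, time TEXT, value1 REAL);
CREATE TABLE tracker_snapshot_lookup (
    tracker_date TEXT PRIMARY KEY,
    tracker_snapshot_start_id INTEGER NOT NULL,
    tracker_snapshot_end_id INTEGER NOT NULL
);
""")
con.executemany("INSERT INTO tracker_snapshot VALUES (?, ?, ?)",
                [(i, "2013-11-13" if i <= 10000 else "2013-11-14", 1.0)
                 for i in range(1, 20001)])
con.execute("INSERT INTO tracker_snapshot_lookup VALUES ('2013-11-13', 1, 10000)")
con.execute("INSERT INTO tracker_snapshot_lookup VALUES ('2013-11-14', 10001, 20000)")

# Step 1: one point lookup on the tiny lookup table.
row = con.execute(
    "SELECT tracker_snapshot_start_id, tracker_snapshot_end_id "
    "FROM tracker_snapshot_lookup WHERE tracker_date = '2013-11-13'").fetchone()

# Step 2: a range scan on the big table's primary key.
count = con.execute(
    "SELECT COUNT(*) FROM tracker_snapshot WHERE id BETWEEN ? AND ?",
    row).fetchone()[0]
print(count)  # 10000 rows for 2013-11-13
```

The expensive timestamp filter never runs against the big table; only the primary key range does.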


Your savings

The lookup table costs, per column:

DATE            3 bytes
INT NOT NULL    4 bytes

So in one year you spend on storage:

Table data

365 days * (3 + 4 + 4) bytes =
365 * 11 = 4015 bytes (~0.004 MB)

Index data

365 days * 3 bytes = 1095 bytes (~0.001 MB)
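The yearly arithmetic in Python, assuming 365 days per year and one lookup row per day with a 3-byte DATE plus two 4-byte INTs, overhead ignored as before:

```python
# Yearly footprint of the lookup table: one row per day,
# DATE (3 bytes) + two INTs (4 bytes each) per row; the
# PRIMARY KEY entries on the DATE column are counted separately.
days = 365
table_bytes = days * (3 + 4 + 4)
pk_index_bytes = days * 3
print(table_bytes, pk_index_bytes)  # 4015 and 1095 bytes, well under 0.01 MB
```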


The magic is that you keep that ~1144 MB of storage free for more important data.


Context

StackExchange Database Administrators Q#53277, answer score: 2

Revisions (0)

No revisions yet.