HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Optimizing a simple query on a large table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
simplequerylargeoptimizingtable

Problem

DB: MySQL 5.5.20 (WampServer, default configuration)

OS: Win 7

HDD: Western Digital 3TB Caviar Green, 3.5", IntelliPower, 64MB, Sata3 (WD30EZRX)

Memory: 8GB

MySQL my.ini: http://pastie.org/private/go9kaxlmlvirati2txbaa

Query in question:


SELECT name.id AS name_id, name.name, cast_info.id,

cast_info.role_id, cast_info.movie_id

FROM cast_info

LEFT JOIN name ON name.id = cast_info.person_id

WHERE cast_info.movie_id = 1000000

ORDER BY cast_info.movie_id ASC

It fetches all people who worked on a given movie. Problem is it can take anywhere from 0.1s to nearly 2.0s. That's too long. When the user needs to run it 10k times they might as well uninstall the application. Even I haven't had the patience to wait for it to finish.

edit: The time it takes to run the query is determined by the number of people who worked on it. Roughly 0.1s for each 10 people.

EXPLAIN:


1. row

id: 1

select_type: SIMPLE

table: cast_info

type: ref

possible_keys: idx_mid,mpi

key: idx_mid

key_len: 4

ref: const

rows: 15

Extra:

2. row

id: 1

select_type: SIMPLE

table: name

type: eq_ref

possible_keys: PRIMARY,id_name_idx

key: PRIMARY

key_len: 4

ref: imdb.cast_info.person_id

rows: 1

Extra:

Tables:


CREATE TABLE cast_info (

id int(11) NOT NULL AUTO_INCREMENT,

person_id int(11) NOT NULL,

movie_id int(11) NOT NULL,

person_role_id int(11) DEFAULT NULL,

note text,

nr_order int(11) DEFAULT NULL,

role_id int(11) NOT NULL,

PRIMARY KEY (id),

KEY idx_pid (person_id),

KEY idx_mid (movie_id),

KEY idx_cid (person_role_id),

KEY cast_info_role_id_exists (role_id),

KEY mpi (movie_id,

Solution

It would run faster with InnoDB,

-
The PRIMARY KEY is 'clustered'; this makes the lookup into name faster.

-
A suitable setting of innodb_buffer_pool_size of, say, 5G (for your 8GB machine) would cache a lot of stuff, thereby minimizing the I/O for 10K operations.

You could also shrink the data (and decrease the I/O) by using MEDIUMINT UNSIGNED or SMALLINT UNSIGNED where appropriate.

There are redundant indexes that could (should) be removed; this would speed up the load. Also name(6) is probably useless.

Context

StackExchange Database Administrators Q#124178, answer score: 5

Revisions (0)

No revisions yet.