Which columns should be indexed when all may be used in different search queries?
Problem
Background
I'm working on a website for a movie theater chain currently located in four different cities (it might expand in the future). They use the same single-database website for all cities, which means certain tables need a column that holds the ID of the city each row belongs to.
Right now I have three different tables:
- Cinemas: Contains each city's cinema (ID and name).
- Movies: Contains all movies that have been/will be shown at the cinema.
- Showtimes: Contains all showtimes for all movies in all cities.
The structure of the Showtimes table is the following:
Column Name | Column Type | Description
------------+-------------+------------
ID          | BIGINT      | (Primary) Unique ID for each showtime (perhaps unnecessary?)
CinemaID    | TINYINT     | Foreign key bound to Cinemas.ID
MovieID     | BIGINT      | Foreign key bound to Movies.ID
Showtime    | DATETIME    | At what date and time the movie will show
(will contain multiple rows for each movie, i.e. one row for each showtime)
How this table will be used
A user of the website must be able to:
- View all current/upcoming movies and showtimes (sorted by date) in the selected city.
  Example query (backend):
      SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? ORDER BY Showtime
- Select a single movie and view all showtimes for that specific title only (in the selected city).
  Example query:
      SELECT Showtime FROM Showtimes WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime
- Select a single day and view all movies and showtimes for that day only (in the selected city).
  Example query:
      SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 12:00 PM])
So naturally I decided that I needed to create indexes for the columns.
Problem
What I'm having trouble with is deciding/determining how to index the columns properly. One index for each column seems
Solution
Composite Primary Key
I would define the primary key as a composite key of (CinemaID, MovieID, Showtime).
These 3 columns uniquely identify each row, so a separate ID column is not necessary.
Composite (Secondary) Index
With this PK, the only additional index you will need for your queries is (CinemaID, Showtime).
Why these indexes?
A good way to think about how indexes are used is to think of them as the order of columns in a spreadsheet.
Imagine a spreadsheet with (CinemaID, MovieID, Showtime) that is sorted by each column successively.
All your queries have CinemaID present, which means you can quickly find the "section" of the spreadsheet for that CinemaID. Then, for your query that searches by MovieID, you can easily find the "subsection" in the 2nd column, where MovieID matches the searched-for value.
As the 3rd column, Showtime, is also sorted, you can imagine how quickly and easily all the showtimes for that movie, in that cinema, can be found. The DBMS does things in a similar way and can retrieve those results extremely quickly.
As for your other queries, they all start with CinemaID and then use Showtime in some manner. They also need the MovieID in their results.
So, the (CinemaID, Showtime) index has you covered there. Again, the CinemaID easily finds the "section" of the spreadsheet (in the analogy), and all possible showtimes (and there are going to be duplicates, assuming there is more than one screen) will be listed in order and are easily searched and/or sorted by those values.
Even better, since your primary key includes MovieID, that column is included in all secondary indexes after the defined columns (at least for MySQL InnoDB; other engines do this too, but not necessarily all).
Think of that as a 3rd column in the "spreadsheet" of our secondary index. The reason the column exists is to have all parts of the primary key available to do a lookup to the main table (aka the clustered index, in InnoDB), if necessary. In this simple case, no lookup is needed, so it is even more efficient as it doesn't require that double lookup.
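To make the spreadsheet analogy concrete, here is a minimal Python sketch that models both indexes as sorted lists of tuples and finds a "section" by binary search. It is only an illustration of the ordering idea, not of how any DBMS stores its B-trees; the sample rows and the `section` helper are made up for this example.

```python
from bisect import bisect_left

# Hypothetical sample rows: (CinemaID, MovieID, Showtime).
rows = [
    (2, 10, "2024-05-01 18:00"),
    (1, 20, "2024-05-01 21:00"),
    (1, 10, "2024-05-02 18:00"),
    (1, 10, "2024-05-01 18:00"),
    (1, 20, "2024-05-01 15:00"),
]

# "Spreadsheet" for the primary key: sorted by CinemaID, MovieID, Showtime.
pk_index = sorted(rows)

# "Spreadsheet" for the secondary index (CinemaID, Showtime), with the
# remaining primary-key column (MovieID) carried along as a trailing column.
secondary_index = sorted((cinema, showtime, movie) for cinema, movie, showtime in rows)


def section(index, *ids):
    """Return the contiguous slice whose leading columns equal the integer IDs."""
    lo = bisect_left(index, ids)                          # first row with this prefix
    hi = bisect_left(index, ids[:-1] + (ids[-1] + 1,))    # first row past the prefix
    return index[lo:hi]


# Query 2: all showtimes for movie 10 in cinema 1, already in Showtime order.
movie_showtimes = [showtime for _, _, showtime in section(pk_index, 1, 10)]

# Query 1: everything in cinema 1 ordered by Showtime, MovieID read from the index itself.
cinema_schedule = [(movie, showtime) for _, showtime, movie in section(secondary_index, 1)]
```

In both cases the matching rows sit next to each other and are already in the order the query asks for, which is the property the two indexes are chosen to provide.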
Using just this primary key and a single secondary index, you should obtain excellent performance on any of the queries you listed.
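As a rough, runnable sketch of this layout, the snippet below builds the table with Python's built-in sqlite3 module and runs the three queries from the question. SQLite is only a stand-in here so that the example is self-contained; the question targets MySQL, whose DDL and planner differ. The index name `idx_cinema_showtime`, the sample rows, the half-open date range, and the extra ORDER BY on the day query are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Showtimes (
    CinemaID INTEGER NOT NULL,
    MovieID  INTEGER NOT NULL,
    Showtime TEXT    NOT NULL,  -- ISO 'YYYY-MM-DD HH:MM:SS' so text order = time order
    PRIMARY KEY (CinemaID, MovieID, Showtime)
) WITHOUT ROWID;  -- store rows in primary-key order, the closest SQLite analogue to InnoDB
CREATE INDEX idx_cinema_showtime ON Showtimes (CinemaID, Showtime);
""")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO Showtimes (CinemaID, MovieID, Showtime) VALUES (?, ?, ?)",
    [
        (1, 10, "2024-05-01 18:00:00"),
        (1, 10, "2024-05-02 18:00:00"),
        (1, 20, "2024-05-01 15:00:00"),
        (1, 20, "2024-05-01 21:00:00"),
        (2, 10, "2024-05-01 18:00:00"),
    ],
)

q_city = "SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? ORDER BY Showtime"
q_movie = ("SELECT Showtime FROM Showtimes "
           "WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime")
# Half-open range [day start, next day start) instead of BETWEEN (an assumption).
q_day = ("SELECT MovieID, Showtime FROM Showtimes "
         "WHERE CinemaID = ? AND Showtime >= ? AND Showtime < ? ORDER BY Showtime")

upcoming = conn.execute(q_city, (1,)).fetchall()
one_movie = conn.execute(q_movie, (1, 10)).fetchall()
one_day = conn.execute(
    q_day, (1, "2024-05-01 00:00:00", "2024-05-02 00:00:00")
).fetchall()

# Print the plan SQLite picks; the exact wording varies between SQLite versions.
for row in conn.execute("EXPLAIN QUERY PLAN " + q_city, (1,)):
    print(row[-1])
```

On MySQL, the equivalent check would be to run EXPLAIN on each query and confirm which of the two indexes it uses.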
Afterthought
My assumption that this uniquely identifies each row may NOT be correct if you have a movie showing on multiple screens at the same time. If you want to be able to separately identify those screens, then my solution is not best (I can provide another solution for that situation, please just let me know.)
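The caveat can be seen directly in a small sketch, again using Python's sqlite3 as a stand-in for MySQL with made-up values: with the composite primary key, a second row for the same cinema, movie, and start time (for example, the same film on a second screen) is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Showtimes (
    CinemaID INTEGER NOT NULL,
    MovieID  INTEGER NOT NULL,
    Showtime TEXT    NOT NULL,
    PRIMARY KEY (CinemaID, MovieID, Showtime)
) WITHOUT ROWID
""")

# Hypothetical showing: movie 10 in cinema 1 at 18:00.
showing = (1, 10, "2024-05-01 18:00:00")
conn.execute("INSERT INTO Showtimes VALUES (?, ?, ?)", showing)

try:
    # Same movie, same cinema, same time, e.g. on a second screen.
    conn.execute("INSERT INTO Showtimes VALUES (?, ?, ?)", showing)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The composite primary key does not allow two identical triples.
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Showtimes").fetchone()[0]
```

If simultaneous showings on different screens must be stored as separate rows, the key would need something extra to tell them apart, which is the situation the afterthought refers to.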
Context
StackExchange Database Administrators Q#229064, answer score: 5