patternsqlMinor
StackExchange clone: where should I add my indexes?
Viewed 0 times
whereindexesstackexchangeshouldcloneadd
Problem
I'm creating an open source stack exchange clone and the following is my schema. What should I add indexes on for it to be optimal?
Here is the schema in Rails format (SQL format below as well):
```
create_table "comments", force: true do |t|
t.integer "id"
t.integer "post_id", null: false
t.integer "user_id", null: false
t.text "body", null: false
t.integer "score", default: 0, null: false
t.datetime "created_at"
t.datetime "updated_at"
end
create_table "post_types", force: true do |t|
t.integer "id"
t.string "name", null: false
end
create_table "posts", force: true do |t|
t.integer "id"
t.integer "post_type_id", limit: 2, null: false
t.integer "accepted_answer_id"
t.integer "parent_id"
t.integer "user_id", null: false
t.text "title", limit: 255, null: false
t.text "body", null: false
t.integer "score", default: 0, null: false
t.integer "views", default: 1, null: false
t.datetime "created_at"
t.datetime "updated_at"
end
create_table "posts_tags", force: true do |t|
t.integer "id"
t.integer "post_id", null: false
t.integer "tag_id", null: false
end
create_table "tag_synonyms", force: true do |t|
t.integer "id"
t.string "source_tag", null: false
t.string "synonym", null: false
end
create_table "tags", force: true do |t|
t.integer "id"
t.string "name", null: false
end
create_table "users", force: true do |t|
t.integer "id"
t.string "first_name", limit: 50
t.string "last_name", limit: 50
t.string "display_name", limit: 100, null: false
t.string "email", limit: 100, null: false
t.string "password",
Here is the schema in Rails format (SQL format below as well):
```
create_table "comments", force: true do |t|
t.integer "id"
t.integer "post_id", null: false
t.integer "user_id", null: false
t.text "body", null: false
t.integer "score", default: 0, null: false
t.datetime "created_at"
t.datetime "updated_at"
end
create_table "post_types", force: true do |t|
t.integer "id"
t.string "name", null: false
end
create_table "posts", force: true do |t|
t.integer "id"
t.integer "post_type_id", limit: 2, null: false
t.integer "accepted_answer_id"
t.integer "parent_id"
t.integer "user_id", null: false
t.text "title", limit: 255, null: false
t.text "body", null: false
t.integer "score", default: 0, null: false
t.integer "views", default: 1, null: false
t.datetime "created_at"
t.datetime "updated_at"
end
create_table "posts_tags", force: true do |t|
t.integer "id"
t.integer "post_id", null: false
t.integer "tag_id", null: false
end
create_table "tag_synonyms", force: true do |t|
t.integer "id"
t.string "source_tag", null: false
t.string "synonym", null: false
end
create_table "tags", force: true do |t|
t.integer "id"
t.string "name", null: false
end
create_table "users", force: true do |t|
t.integer "id"
t.string "first_name", limit: 50
t.string "last_name", limit: 50
t.string "display_name", limit: 100, null: false
t.string "email", limit: 100, null: false
t.string "password",
Solution
Let's go though a few things here... (now that you actually show the database structure instead of only the Rails 'view', we can see what's happening).....
"Relational Databases" are about "Relationships". Relationships are expressed by having queries 'join' two or more tables. The Joins require matching columns on both tables. For example, the
If you have some comments and want to find the details of the posts they are on, then you will want to select from the
When you select on a column, you often (normally) want that column to be indexed.
So, for each of your 'primary key' columns you will automatically also have an index. You need to index the 'other' side of the relationship as well.
Comments table
If you do queries that select the comments on a particular post, then you need an index on the
I suspect you may also have occasional queries for all the posts for a given user, which means you will probably want another index on the
post_types
No problems here.
Posts
You will want indexes on the following:
Should
Post-Tags
you will want two indexes here, and for performance reasons, you will probably want them duplicated. Explaining why is beyond this answer, but look for 'index coverage':
Tag Synonyms
Tags
Fine
Users
Recommend an index on:
(Are you sure you don't mind the users having no name)
Should
Vote_Types
fine
Votes
Should
Conclusion
Now you have some suggestions on what indexes you should start with, the next step is monitoring where your actual performance is poor, and targeting those areas for additional optimization. To do that, you need to actually be running your application, and finding out what your actual queries look like, and running those queries to see what the actual execution plans are, and where those plans look like they need help by adding an index.
- you do not have any primary keys on your database. Primary keys are part of the database's referential integrity, and ensure that you and your programs do the 'right thing'. Additionally, primary keys are implemented as an index, so they will ensure that primary-key-related access to your table is fast.
- you do not have a
- similarly, you do not have a
So, you have no keys, and as a result, you are missing what are normally the most critical indices. Set up each table to have a key and you will be most of the way there.
Most databases now contain tools that will recommend indexes for you based on queries that you often run.
"Relational Databases" are about "Relationships". Relationships are expressed by having queries 'join' two or more tables. The Joins require matching columns on both tables. For example, the
post_id on the comment table matches the id on the posts table.If you have some comments and want to find the details of the posts they are on, then you will want to select from the
posts table where the comment_id is a certain (set of) values.When you select on a column, you often (normally) want that column to be indexed.
So, for each of your 'primary key' columns you will automatically also have an index. You need to index the 'other' side of the relationship as well.
Comments table
created_at should not be nullable. Nullable columns typically have a small impact on performance. All comments are created, and thus should all have a date, and there is no need for it to be null.If you do queries that select the comments on a particular post, then you need an index on the
post_id.I suspect you may also have occasional queries for all the posts for a given user, which means you will probably want another index on the
user_idpost_types
No problems here.
Posts
You will want indexes on the following:
- if you want to select the post for a given parent,
parent_id
- if you want to select posts for a given user, then
user_id
- if you want to select posts for a given type, then
post_type_id
- you will also want to index the title, since this may make searches easier.
- look in to full-text indexing for the body.
Should
created_date be nullable?Post-Tags
you will want two indexes here, and for performance reasons, you will probably want them duplicated. Explaining why is beyond this answer, but look for 'index coverage':
- index on both
tag_idandpost_id
- index on both
post_idandtag_id
Tag Synonyms
source_tag should be source_id and should be an integer. Also with an index.synonym should be synonym_id and should be an integer. It should also have an index.Tags
Fine
Users
Recommend an index on:
display_name- so people can find themselves easily (and hopefully you have enough users for it to be needed).
(Are you sure you don't mind the users having no name)
Should
created_date be nullable?Vote_Types
fine
Votes
vote_type_id, post_id and user_id should each have their own index.Should
created_date be nullable?Conclusion
Now you have some suggestions on what indexes you should start with, the next step is monitoring where your actual performance is poor, and targeting those areas for additional optimization. To do that, you need to actually be running your application, and finding out what your actual queries look like, and running those queries to see what the actual execution plans are, and where those plans look like they need help by adding an index.
- you do not have any primary keys on your database. Primary keys are part of the database's referential integrity, and ensure that you and your programs do the 'right thing'. Additionally, primary keys are implemented as an index, so they will ensure that primary-key-related access to your table is fast.
- you do not have a
post_id column on your post table????? Really? This makes no sense.... unless parent_id is supposed to be the unique identifier.....- similarly, you do not have a
user_id on the users table. What gives?So, you have no keys, and as a result, you are missing what are normally the most critical indices. Set up each table to have a key and you will be most of the way there.
Most databases now contain tools that will recommend indexes for you based on queries that you often run.
Context
StackExchange Database Administrators Q#55338, answer score: 6
Revisions (0)
No revisions yet.