patternModerate
SQL INDEX - how it works?
Viewed 0 times
sqlindexworkshow
Problem
My knowledge of databases and SQL is based in most on university classes. Anyhow, I spent few monts (almost a year) in a company, where I was working with databases.
I have read few books and I have taken part in few trainings about databases such as
As well as I said, I am begginer, with a lot of lacks of knowledge but today, someone told something, what is totally against my begginer's knowledge.
Let me explain. Let's take SQL database and create simple table
Now, it's the part, I would like to focus on -
So far, I thought it works in this way: when a table is being created the
Grouping one by one:
so, for my example with
So, when I am using query
This approach works in time
I have read few books and I have taken part in few trainings about databases such as
MySQL, PostgreSQL, SQLite, Oracle and also few nonSQL dbs such us MongoDB, Redis, ElasticSearch etc.As well as I said, I am begginer, with a lot of lacks of knowledge but today, someone told something, what is totally against my begginer's knowledge.
Let me explain. Let's take SQL database and create simple table
Person with few records inside:id | name | age
-----------------
1 | Alex | 24
2 | Brad | 34
3 | Chris | 29
4 | David | 28
5 | Eric | 18
6 | Fred | 42
7 | Greg | 65
8 | Hubert | 53
9 | Irvin | 17
10 | John | 19
11 | Karl | 23Now, it's the part, I would like to focus on -
id is the INDEX.So far, I thought it works in this way: when a table is being created the
INDEX is empty. When I am adding new record to my table the INDEX is being recalculated based on some alghortims. For example:Grouping one by one:
1 ... N
N+1 ... 2N
...
XN+1 ... (X+1)Nso, for my example with
size = 11 elements and N = 3 it will be like this:id | name | age
-----------------
1 | Alex | 24 // group0
2 | Brad | 34 // group0
3 | Chris | 29 // group0
4 | David | 28 // group1
5 | Eric | 18 // group1
6 | Fred | 42 // group1
7 | Greg | 65 // group2
8 | Hubert | 53 // group2
9 | Irvin | 17 // group2
10 | John | 19 // group3
11 | Karl | 23 // group3So, when I am using query
SELECT * FROM Person WHERE id = 8 it will do some simple calculation 8 / 3 = 2, so we have to look for this object in group2 and then this row will be returned:8 | Hubert | 53This approach works in time
O(k) where k << size. Of course, an alghoritm to organise rows in groups Solution
You're basically describing a B-tree index and a hash index. They both have a place, but both are best suited for different jobs.
Advantages and disadvantages
B-tree (and B+-tree) indexes are usually balanced. This means that looking for a value will always take the same amount of time no matter where in the tree it falls (O(log n)). Generally, the number of levels in the tree is limited, so it tends to get "wider" not "deeper". For small data sets, the cost of maintaining and using the B-tree, however, can be more than just reading all the rows. B-tree indexes are good for large data sets, data sets with low selectivity, or data sets where you intend to select a range of objects not just one object.
Hash tables are great for small data sets. Hash indexes have a predefined number of hash buckets, depending on the hashing algorithm used. This is because a given hash algorithm can only produce so many unique hashes, so it only gets "deeper" not "wider". Once the database engine finds the right bucket, it then walks through all the objects in that bucket to find the one you want. With small, highly selective data sets each bucket contains a very small number of objects and is resolved pretty quickly. With larger data sets, the buckets get much more crowded. So, if the object you need is in a small bucket or is near the beginning of the bucket, it returns pretty quick. If it's at the end of a large bucket, it takes longer. The index is not balanced, so the performance is anywhere from O(1) to O(n).
Popularity
In general, I've run across B-trees the most. Bitmap indexes are also another option for values with a low cardinality (think booleans or maybe gender). This is going to vary depending on your database engine as to what index types are available.
NoSQL
NoSQL databases definitely support indexes. Most support B-tree or a variation on B-tree. Most seem to support hashed indexes as well.
Advantages and disadvantages
B-tree (and B+-tree) indexes are usually balanced. This means that looking for a value will always take the same amount of time no matter where in the tree it falls (O(log n)). Generally, the number of levels in the tree is limited, so it tends to get "wider" not "deeper". For small data sets, the cost of maintaining and using the B-tree, however, can be more than just reading all the rows. B-tree indexes are good for large data sets, data sets with low selectivity, or data sets where you intend to select a range of objects not just one object.
Hash tables are great for small data sets. Hash indexes have a predefined number of hash buckets, depending on the hashing algorithm used. This is because a given hash algorithm can only produce so many unique hashes, so it only gets "deeper" not "wider". Once the database engine finds the right bucket, it then walks through all the objects in that bucket to find the one you want. With small, highly selective data sets each bucket contains a very small number of objects and is resolved pretty quickly. With larger data sets, the buckets get much more crowded. So, if the object you need is in a small bucket or is near the beginning of the bucket, it returns pretty quick. If it's at the end of a large bucket, it takes longer. The index is not balanced, so the performance is anywhere from O(1) to O(n).
Popularity
In general, I've run across B-trees the most. Bitmap indexes are also another option for values with a low cardinality (think booleans or maybe gender). This is going to vary depending on your database engine as to what index types are available.
NoSQL
NoSQL databases definitely support indexes. Most support B-tree or a variation on B-tree. Most seem to support hashed indexes as well.
Context
StackExchange Database Administrators Q#62576, answer score: 12
Revisions (0)
No revisions yet.