snippetjavascriptTip
Exploring indexes and cardinality with JavaScript
Viewed 0 times
cardinalityjavascriptandwithexploringindexes
Problem
Recently, I had to work a little bit with MongoDB's databases, collections, and indexes. Part of the things I had to learn or, rather, relearn, was how indexes work and how cardinality makes a significant difference in performance. Naturally, as soon as I finished my work, I wanted to create a mental model to understand these concepts better. Hence, I hacked together a little learning aid that we're exploring today.
> [!IMPORTANT]
>
> As far as I know, MongoDB and SQL databases use B-Trees to store data. This implementation is a simplified approximation to help you understand how indexes work and where cardinality comes into play. It is by no means a representative implementation of how such complex systems work in real life.
Imagine you have a large collection of objects that you need to search through. You can think of this collection as a large array of objects, where each object has a unique identifier (
> [!IMPORTANT]
>
> As far as I know, MongoDB and SQL databases use B-Trees to store data. This implementation is a simplified approximation to help you understand how indexes work and where cardinality comes into play. It is by no means a representative implementation of how such complex systems work in real life.
Imagine you have a large collection of objects that you need to search through. You can think of this collection as a large array of objects, where each object has a unique identifier (
id) and some properties. Naturally, you want to be able to search through this collection quickly and efficiently. This is where indexes come into play.Solution
const collection = {
objects: [],
index: new Map(),
add(...objects) {
objects.forEach((obj) => {
if (obj.id === object.id)
throw new Error(`Object with ID ${object.id} already exists`);
this.objects.push(object);
this.index.set(object.id, object);
});
},
get(id) {
return this.index.get(id);
},
remove(id) {
const object = this.index.get(id);
if (object) {
this.objects.splice(this.objects.indexOf(object), 1);
this.index.delete(id);
}
},
};
collection.add(
{ id: 400, name: 'Apple' }, { id: 840, name: 'Banana' },
{ id: 243, name: 'Cherry' }, { id: 123, name: 'Apple' },
{ id: 456, name: 'Banana' }
);
// collection.index:
// 400 => { id: 400, name: 'Apple' }
// 840 => { id: 840, name: 'Banana' }
// 243 => { id: 243, name: 'Cherry' }
// 123 => { id: 123, name: 'Apple' }
// 456 => { id: 456, name: 'Banana' }
collection.get(243); // { id: 243, name: 'Cherry' }
collection.get(400); // { id: 400, name: 'Apple' }
collection.get(840); // { id: 840, name: 'Banana' }>
> As far as I know, MongoDB and SQL databases use B-Trees to store data. This implementation is a simplified approximation to help you understand how indexes work and where cardinality comes into play. It is by no means a representative implementation of how such complex systems work in real life.
Imagine you have a large collection of objects that you need to search through. You can think of this collection as a large array of objects, where each object has a unique identifier (
id) and some properties. Naturally, you want to be able to search through this collection quickly and efficiently. This is where indexes come into play.Instead of searching through the entire collection with methods like
Array.prototype.filter() or Array.prototype.find(), you can create indexes on properties of the collection. That way, you can quickly find the objects you're looking for without having to iterate through the entire collection. Let's look at a simple example, using a Map to store each object via its id.In this example, we have a collection of objects, each with a unique
id field. We use a Map to create an index on the id property. When we add an object to the collection, we also add it to the index. When we want to retrieve an object by id, we can look it up in the index, which is much faster than searching through the entire collection.Cardinality refers to the uniqueness of values in a field/column. In the context of databases, it is a measure of how many distinct values exist in a given field/column. High cardinality means that there are many unique values, while low cardinality means that there are few unique values.
Code Snippets
const collection = {
objects: [],
index: new Map(),
add(...objects) {
objects.forEach((obj) => {
if (obj.id === object.id)
throw new Error(`Object with ID ${object.id} already exists`);
this.objects.push(object);
this.index.set(object.id, object);
});
},
get(id) {
return this.index.get(id);
},
remove(id) {
const object = this.index.get(id);
if (object) {
this.objects.splice(this.objects.indexOf(object), 1);
this.index.delete(id);
}
},
};
collection.add(
{ id: 400, name: 'Apple' }, { id: 840, name: 'Banana' },
{ id: 243, name: 'Cherry' }, { id: 123, name: 'Apple' },
{ id: 456, name: 'Banana' }
);
// collection.index:
// 400 => { id: 400, name: 'Apple' }
// 840 => { id: 840, name: 'Banana' }
// 243 => { id: 243, name: 'Cherry' }
// 123 => { id: 123, name: 'Apple' }
// 456 => { id: 456, name: 'Banana' }
collection.get(243); // { id: 243, name: 'Cherry' }
collection.get(400); // { id: 400, name: 'Apple' }
collection.get(840); // { id: 840, name: 'Banana' }While this example isn't perfect, it goes to show you how cardinality works. When we create an index, we essentially **group objects by a property**. The higher the cardinality of the property, the more groups we have. More groups means that, when we search for a specific object, we have to look through fewer items. In the case of low cardinality, we have fewer groups, which means that we have to look through more items to find the one we're looking for.
> [!TIP]
>
> Both examples are pretty basic. The astute reader will notice that both of these indexes may have valid use cases, depending on the nature of the data and the queries you want to perform. For example, if you want to retrieve all objects with a specific `name`, the second index is more efficient. However, if you want to retrieve an object by its `id`, the first index is more efficient. The key takeaway here is that **indexes are not one-size-fits-all**; they should be tailored to the specific use cases of your application.
## Multi-key indexes
In many cases, indexes contain **two or more keys**, what we call **compound indexes**. In such scenarios, **the order of keys matters** a lot. Imagine we want to retrieve a specific object by its `name` and `id`. In this case, we can create a compound index on both properties. Let's extract the indexing creation logic into a function to make it reusable.In this example, we create a compound index on the `name` and `id` properties. When we add an object to the collection, we also add it to the compound index. When we want to retrieve an object by its `name` and `id`, we can simply look it up in the compound index. If we want to retrieve the object with `name` equal to `Apple` and `id` equal to `123`, we'll have to retrieve the `Apple` group and then look for the object with `id` equal to `123`. But, let's look at the reverse order of the keys.Context
From 30-seconds-of-code: indexed-object-collections
Revisions (0)
No revisions yet.