MongoDB Indexes - Part 1

In this post, I will try to explain what indexes in MongoDB are and how they help you improve the performance of your application. We will be talking about some indexes in MongoDB (Single Field, Compound Index, Multikey Index and Text Index). This is part one of a two-part series. Before we get started, although I try to explain everything as simple as I can, I assume you have a basic knowledge of what indexes are. So, if you are not familiar with them, it is better to familiarize yourself with the term index in the world of databases.
In this post, I will try to explain what indexes in MongoDB are and how they help you improve the performance of your application. We will be talking about some indexes in MongoDB (Single Field, Compound Index, Multikey Index and Text Index). This is part one of a two-part series. Before we get started, although I try to explain everything as simple as I can, I assume you have a basic knowledge of what indexes are. So, if you are not familiar with them, it is better to familiarize yourself with the term index in the world of databases.
Imagine we have a 5362-page phone book and we want to find our friend’s phone number. To make things simpler, let's call our friend, Tom Jones. How can we find Tom’s phone number? One simple, yet not efficient way, is to go through the book’s pages one by one to find his phone number. If the book is not sorted in any way, then finding his number would be a nightmare for us. It can take hours or even days to find our result. So, instead we can use the table of contents (TOC), which is probably sorted by name, to find the exact page (or a range of pages) which can point us to Tom’s phone number. Efficient, isn’t it? As you can see, the second way is not only simple, but also can save us a huge amount of time.
So how is this related to the world of databases? Imagine the phone book is not an actual book. Instead, we have an application in which it has stored hundreds of thousands of phone numbers. If we search for someone’s name, it can take a considerable amount of time for our database engine to look through each record and try to match our records with the condition we specified. As you all know, time is very valuable. We MUST NOT keep our clients waiting just to find someone’s phone number. If you keep them waiting, they will most likely stop using your service. Pretty bad, isn’t it? So indexes are our Superman (or Batman, if you like)!
Let’s get technical. MongoDB has different types of indexes - Single Field, Compound Index and Multikey Index to name but a few. When defining an index, we tell MongoDB that it should sort it in an Ascending or a Descending order. So MongoDB puts the indexed value into RAM in the right order. You can think of it as a sorted table. So, instead of trying to go through every document in the collection, it uses the values in RAM and it can find the needed document(s) in a very short time.
How can we understand if a query is using an index or it is scanning the collection? We can use the built-in explain function in mongo shell. Explain function, gives us lots of good information about our query and using these insights we can decide which fields to index. Other than using these indexes in our criteria, we can use them when sorting our documents.
Single Field Index
In a single field index, we create an index on just one field in our document. Like other databases, MongoDB, by default, creates an index on _id (the primary key field). This is a default singe field index. The following code shows how we can create an index on “name” field in our “phoneBook” collection.
db.phoneBook.createIndex( { name: -1 } )
The above code tells MongoDB to create this index in a descending order. To have an ascending index, we can change -1 to 1. If a document doesn’t contain the indexed field, null will be considered as the value for that field in that document. We can also use embedded fields when creating indexes.
db.phoneBook.createIndex( { “info.cardNumber”: -1 } )
The above code, creates a single field index on cardNumber field which is a field inside an embedded document called info of our phoneBook collection.
We can also use embedded documents when creating indexes.
db.phoneBook.createIndex( { info: -1 } )
The above code, creates a single field index on an embedded document called info of our phoneBook collection.
Compound Index
In a compound index, we create an index on multiple fields. The following example shows how to create a compound index on “name” descending and “email” ascending.
db.phoneBook.createIndex( { name: -1, email: 1 } )
As you can see, we can have one field indexed ascending and the other(s) descending. So the index sorts first by name descending and then, withing each name value, sorts by email (ascending). So if we have a person called John whose email is [email protected] and another person called John whose email is [email protected], the second person will appear first since his email address starts with the letter a.
When sorting, if sort keys correspond to index keys, MongoDB can use that index when sorting. However, in our above code, a sort operation on name descending and email descending will not use our index since the sort direction must be exactly the same orders or the exact opposite. So name descending, email ascending or name ascending and email descending will use the created index while sorting (The following queries will use the created index)
db.phoneBook.find().sort( { name: -1, email: 1 } ) db.phoneBook.find().sort( { name: 1, email: -1 } )
All other combinations of name and email, will not use the index to sort the results.
MongoDB supports something called index prefix. When having an index on multiple fields (a compound index) we can use any prefixes of that index in our queries. So, a prefix of a compound index is any subset that consists of at least one key (in the correct order of appearance).
db.phoneBook.createIndex( { name:1, email: 1, phoneNumber: 1, address: 1 } )
In the above code, the following prefixes are available:
{ name: 1 } { name: 1, email: 1 } { name: 1, email: 1, address: 1 }
As you can see, the order of appearance is important. The following examples are some invalid prefixes (prefixes which will not exist in our example):
*{ name: 1, email: -1 } { name: 1, address: 1 }
* We cannot change the sort order of our defined index in the prefix
Like queries, sort operations can use the prefix as well. So the following queries and sort operations can use the defined index using prefixes:
db.phoneBook.find().sort( { name: 1 } ) db.phoneBook.find().sort( { name: -1 } ) db.phoneBook.find().sort( { name: 1, email: 1 } ) db.phoneBook.find().sort( { name: -1, email: -1 } ) db.phoneBook.find().sort( { name: 1, email: 1, phoneNumber: 1 } ) db.phoneBook.find().sort( { name: -1, email: -1, phoneNumber: -1 } ) **db.phoneBook.find({name: “John”}).sort( { name: 1, email: 1 } )
** The index prefix in the query predicate can be different from the prefix in sort.
We can also use a non-prefix subset of the index in our sort, if and only if the query includes EQUALITY conditions on all prefix keys that precede the sort keys.
db.phoneBook.find({name: “John”}).sort( { email: 1, phoneNumber: 1 } ) db.phoneBook.find({email: “[email protected]”, name: “John”}).sort( { phoneNumber: 1 } ) ***db.phoneBook.find({ name: “[email protected]”, email: “[email protected]”, phoneNumber: {$lt: 555555555} }).sort( { phoneNumber: 1 } )
*** As you can see, only the index fields PRECEDING the sort subset must have equality conditions in the query document and the other index fields may have other conditions. Here because name and email are the only fields preceding phoneNumber, only these two fields must have equality conditions.
I know this post turned to be a long one, but I want to talk about two things before ending the post. These two things are called Partial Index and Sparse Index. These two directly affect the performance costs for index creation and maintenance.
Partial Index
Since indexed documents are also stored in RAM, there might be conditions in which lots of documents are stored in RAM and this can take a huge amount of RAM. Sometimes most of these documents will not match our conditions. So, instead of indexing all documents, we can index those documents that most likely show up in our results. For example, in a movies database application, there are times when only those movies with a rate of at least 3.5 are desired. Instead of indexing all movie documents, we can create a partial index to only index documents with a rate of at least 3.5.
db.movies.createIndex( { name: 1, duration: -1 }, { partialFilterExpression: { rate: { $gte: 3.5 } } } )
The above code will create a compound index that indexes only documents with rate field of greater than or equal to 3.5.
Sparse Index
Sparse index, indexes only those documents that have the indexed field, even if the indexed field contains a null value. Sparse indexes, will not index documents missing the indexed field.
db.phoneBook.createIndex( { city: 1 }, { sparse: true } )
The above example creates a sparse index on city field of phoneBook collection and will not index documents missing city field.
We can think of Partial Index as a superset of Sparse Index.