Working with MongoDB in On-Premises and Cloud environments

How to work with MongoDB in hybrid environments using On-Premises and Azure Cloud.

José Pereira
medialesson

--

Introduction

When creating a system that involves accessing a database, it is very common to use a local database environment during development. In some cases, a local database engine might not have the same characteristics as the production one. One example is an On-Premises MongoDB instance versus an Azure Cosmos DB account.

One of the main differences we’ll find between these two environments (On-Premises and Cloud) is the fact that in Azure Cosmos DB we need to take the throughput into account when creating databases and collections. Throughput determines the amount of resources that an Azure Cosmos DB instance will have available to operate. To make things a bit more dramatic, managing the throughput correctly is imperative to avoid having to pay for unnecessary resources at the end of the month. In an On-Premises environment, this is usually not something we need to be concerned about.

In this story I’ll show you an approach to handle both environments (On-Premises and Cloud), taking into account the throughput and configuration needed for each one.

Getting ready

The complete source code of this solution (including a small usage sample) can be found here.
To run it you need:
- VS2017+
- MongoDB 3.2 (for On-Premises installation click here)
- An Azure Cosmos DB account with MongoDB API v3.2 (to create a new account click here)
Important NuGet packages:
- Microsoft.Azure.DocumentDB.Core
- MongoDB.Driver

NOTE: If you’re creating a new Azure Cosmos DB account, be sure to select the correct API and Version

Azure Cosmos DB API and Version

Implementation

The full source code includes all the basic implementation expected in a database access service (create, delete, update), so feel free to explore it. In this section I’ll focus only on the main parts related to the handling of the different database environments.

Environments configuration
Since we are dealing with different environments, each with its own particularities, we’ll need to have a way to switch between them. For that we use a configuration class.

Database environment configuration

The configuration class holds several properties, some specific to a particular environment, but here are three properties that are common to all environments (a minimal sketch of such a class follows the list):
- DatabaseLocation: Defined by the DbLocation enum.
- DatabaseName: Name of the database to create/connect to.
- ConnectionString: Connection string to the database instance.
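
As a rough sketch, such a configuration class could look like the one below. Only the three common properties come from the list above; the class name, the DbLocation values and the InitialThroughput property are illustrative assumptions, not necessarily what the repository uses.

public enum DbLocation
{
    Local,
    Cloud
}

public class MongoDbConfiguration
{
    // Common to all environments.
    public DbLocation DatabaseLocation { get; set; }
    public string DatabaseName { get; set; }
    public string ConnectionString { get; set; }

    // Cloud-specific (illustrative): the shared throughput to assign when creating the database.
    public int InitialThroughput { get; set; } = 1000;
}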

Database environment configuration samples

NOTE: The connection string for the Azure Cosmos DB service can be obtained after the account is created by going to the menu option “Connection Strings”. There you’ll find the connection string that you need to include in the configuration for the Cloud environment.

How to get Azure Cosmos DB account ‘Connection String’

In the provided sample code we have a MongoDbFactory class that handles the database creation. This class receives the configuration as a constructor parameter and uses it to decide which code to execute, depending on the configured DatabaseLocation.
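
A trimmed-down sketch of what such a factory could look like is shown below; the real MongoDbFactory in the repository is more complete, and the method names used here (GetDatabaseAsync, GetLocalDatabaseAsync, GetCloudDatabaseAsync) are illustrative.

using System;
using System.Threading.Tasks;
using MongoDB.Driver;

public class MongoDbFactory
{
    private readonly MongoDbConfiguration _config;

    public MongoDbFactory(MongoDbConfiguration config)
    {
        _config = config;
    }

    public Task<IMongoDatabase> GetDatabaseAsync()
    {
        // Dispatch to the environment-specific creation code.
        switch (_config.DatabaseLocation)
        {
            case DbLocation.Local:
                return GetLocalDatabaseAsync();
            case DbLocation.Cloud:
                return GetCloudDatabaseAsync();
            default:
                throw new ArgumentOutOfRangeException(nameof(_config.DatabaseLocation));
        }
    }

    // GetLocalDatabaseAsync and GetCloudDatabaseAsync are sketched in the next sections.
}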

Read on to see how we handle database creation in the different environments!

Database Creation
With the configuration in place, the second step is the database creation.
For DbLocation.Local environments, the creation is fairly straightforward. In fact, creating a database in an On-Premises pure MongoDB instance is as simple as calling the GetDatabase method found in MongoDB.Driver.MongoClient. Check out this post for a more detailed explanation on how to use the MongoDB driver in On-Premises instances.

Create database in On-Premises MongoDb instance
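
For reference, a minimal sketch of that local path, written as a method of the factory above (the method name is illustrative), could look like this:

private Task<IMongoDatabase> GetLocalDatabaseAsync()
{
    // A pure MongoDB instance creates the database lazily, so obtaining a handle
    // through MongoClient.GetDatabase is all that is needed here.
    var client = new MongoClient(_config.ConnectionString);
    return Task.FromResult(client.GetDatabase(_config.DatabaseName));
}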

Creating a database in Azure Cosmos DB also involves defining the throughput value. Although this is not a mandatory step, we wanted to be sure of the throughput assigned to a new database.

Create database in Azure Cosmos DB v3.2 and set the throughput

There are two major steps in the above code to create a database on Azure Cosmos DB (a sketch combining them follows the list):

  • The first step is to create a connection to the Azure Cosmos DB service, using the DocumentClient class provided in the Microsoft.Azure.DocumentDB.Core NuGet package. This is done in the GetDocumentClient method.
  • The second step is to create the database, using the DocumentClient.CreateDatabaseIfNotExistsAsync method. The special part in this call is the RequestOptions class that allows you to define the throughput to assign to the database when creating it. More info on how to manually provision throughput in Azure Cosmos DB can be found here.
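
Putting those two steps together, a sketch of the Cloud path could look like the following. The CosmosEndpoint, CosmosAuthKey and InitialThroughput configuration values are assumptions (the DocumentClient needs the account’s management endpoint and key, not the MongoDB connection string), and GetCloudDatabaseAsync is an illustrative name.

// Namespaces: Microsoft.Azure.Documents and Microsoft.Azure.Documents.Client
// (from the Microsoft.Azure.DocumentDB.Core NuGet package).
private DocumentClient GetDocumentClient()
{
    // CosmosEndpoint/CosmosAuthKey are assumed configuration values.
    return new DocumentClient(new Uri(_config.CosmosEndpoint), _config.CosmosAuthKey);
}

private async Task<IMongoDatabase> GetCloudDatabaseAsync()
{
    var documentClient = GetDocumentClient();

    // RequestOptions.OfferThroughput assigns the shared throughput at creation time.
    await documentClient.CreateDatabaseIfNotExistsAsync(
        new Database { Id = _config.DatabaseName },
        new RequestOptions { OfferThroughput = _config.InitialThroughput });

    // Data access itself still goes through the regular MongoDB driver.
    return new MongoClient(_config.ConnectionString).GetDatabase(_config.DatabaseName);
}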

Creating Collections
So far so good! We have a database created and now we’ll add some collections. In our case, we used shared throughput, which means that we had to create sharded/partitioned collections.

In layman’s terms, creating a sharded collection means that you must explicitly define the partition key of the collection. You can do that by sending a command to the database. MongoDB commands and queries are all JSON documents, so we can use the driver’s BsonDocument class to construct the command (click here to get more info on this topic). The code to create a sharded collection looks like this:

Create a sharded collection

The shardCollection value is the fully qualified database and collection name, followed by the hashed partition key.
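
A sketch of that command, built with the driver’s BsonDocument, might look like this (it assumes an IMongoDatabase handle called database and the target collectionName; the hashed _id key is an illustrative choice):

// Namespaces: MongoDB.Bson and MongoDB.Driver.
var shardCommand = new BsonDocument
{
    // Fully qualified "<database>.<collection>" name.
    { "shardCollection", $"{_config.DatabaseName}.{collectionName}" },
    // Hashed partition key ("_id" is used here as an example).
    { "key", new BsonDocument { { "_id", "hashed" } } }
};

await database.RunCommandAsync(new BsonDocumentCommand<BsonDocument>(shardCommand));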

The more generic way of using the MongoDB driver to create a collection is by calling the CreateCollectionAsync method that takes the collection name. To be sure that we also support On-Premises pure MongoDB instances, we should ensure that it has a fallback to the shard collection creation. The code would look something like this:

Ensuring the support of collection creation for pure MongoDB instances
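
One possible shape of such a fallback, assuming a CreateShardedCollectionAsync helper that wraps the command above (the exact order and error handling in the repository may differ), is sketched below:

public async Task CreateCollectionWithFallbackAsync(IMongoDatabase database, string collectionName)
{
    try
    {
        // Generic path: enough for a pure On-Premises MongoDB instance.
        await database.CreateCollectionAsync(collectionName);
    }
    catch (MongoCommandException)
    {
        // A shared-throughput Azure Cosmos DB database requires partitioned collections and
        // rejects the plain creation, so fall back to the shardCollection command shown above.
        await CreateShardedCollectionAsync(database, collectionName);
    }
}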

Throughput management
And finally we reach the throughput management. Our use case forced us to have a higher throughput at startup to be able to import a large quantity of data. Once that data is imported, the throughput is scaled down to reduce unnecessary costs.

We calculate the final shared throughput based on the number of existing collections, so the first step is to query the database to get the count of existing collections. The code for that looks like this:

Get collection count from MongoDB database
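
Assuming an IMongoDatabase handle called database, that query can be as simple as this sketch:

// List the collections that currently exist in the database and count them.
var collectionCursor = await database.ListCollectionsAsync();
var collectionCount = (await collectionCursor.ToListAsync()).Count;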

To actually change the throughput, we use the ReplaceOfferAsync method. This method comes from the DocumentClient class provided in the Microsoft.Azure.DocumentDB.Core NuGet package. The code to make the throughput change looks like this:

Data shared throughput change request

Note that there’s a minimum limit for the defined throughput (in our case 400), so we must ensure that the value of the offer is not below that limit. To make that calculation we assign an arbitrary value of 100 for each existing collection and make sure that the final value is not below 400 (a sketch combining this calculation with the offer replacement follows):

  • Math.Max(400, (collectionCount * 100))
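
A sketch combining that calculation with the offer replacement, using the documented CreateOfferQuery/OfferV2/ReplaceOfferAsync pattern of the DocumentClient, could look like this (it reuses the GetDocumentClient method and the collectionCount value from the previous snippets):

// Namespaces: System.Linq, Microsoft.Azure.Documents and Microsoft.Azure.Documents.Client.
var documentClient = GetDocumentClient();

// 100 RU/s per existing collection, but never below the 400 RU/s minimum.
var targetThroughput = Math.Max(400, collectionCount * 100);

// Locate the database and the offer (throughput) currently attached to it.
Database cosmosDatabase = documentClient.CreateDatabaseQuery()
    .Where(d => d.Id == _config.DatabaseName)
    .AsEnumerable()
    .FirstOrDefault();

Offer offer = documentClient.CreateOfferQuery()
    .Where(o => o.ResourceLink == cosmosDatabase.SelfLink)
    .AsEnumerable()
    .SingleOrDefault();

// Replace the offer with one carrying the new shared throughput value.
await documentClient.ReplaceOfferAsync(new OfferV2(offer, targetThroughput));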

Summary

In the end we get an ‘abstraction’ layer that allows us to seamlessly switch between different MongoDB database environments. The environment swap is achieved by using an environment-specific configuration that provides the necessary information to decide how to interact with the database service.

This implementation targets MongoDB v3.2, but Azure Cosmos DB currently also supports version 3.6, meaning that a future implementation could expand this approach to also support v3.6 databases.
