Upload a Pandas DataFrame to MongoDB

smaxwell
Published in Analytics Vidhya
4 min read · Jan 17, 2020

A quick and simple guide

If you have already set up a MongoDB account and cluster, I recommend skipping to the “Uploading The Pandas DataFrame to MongoDB” section. You may also want to read the section just before it if you aren’t sure how to connect to and set up the database.

Setting up your MongoDB & Cluster

The first thing you will need to do is create a MongoDB account. You can do so here: Register for MongoDB

After registering, you should be prompted to create a cluster. Assuming you don’t want to pay, you can choose to create a cluster under the “Starter Clusters” option.

Next, you will need to select a provider and a region. In this case, it is easiest to select AWS and the region nearest to you.

On the same page, if you scroll down you will see that you can upgrade the storage and backup options as well as name your cluster. I will be naming mine “ClusterTest.” After choosing from these options, click the “Create Cluster” button on the right side of the page.

Now, you should see your cluster being created. It may take up to 3 minutes, so go ahead and grab some coffee while you wait! Once it has loaded, click the “Connect” button.

Setting Up a Connection to Your Cluster & Creating a Database

Next, you will be prompted to set up the connection security. In my experience, it is easiest to create a MongoDB user rather than whitelisting a single IP address, since this lets you access the database from any computer. To do this, create a username and password that you can remember. Then, click the “Create MongoDB User” and “Choose a connection method” buttons. Note that Atlas also enforces its IP access list, so if a connection is refused you may need to add your current IP address (or allow access from anywhere) in the network access settings.

Now, you get to choose how you connect. Since we are using a pandas DataFrame in Python, we will want to select “Connect Your Application.” From the list of drivers, select “Python” and “3.6 or later” for the version.

Click “copy.” The string that you have just copied is what you will use to connect to the database. Next, click the “Collections” tab and then the “Add my own data” button. Enter the names you want for the database and the collection, then click the “Create” button.
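One detail worth handling when you fill in the copied string: characters such as @, : or / in the username or password have to be percent-encoded, otherwise the URI will not parse. Below is a minimal sketch using only the standard library. The helper name build_mongo_uri and the sample credentials are made up for illustration, and the host is the example cluster address that appears in the upload code in this article; yours will differ.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, default_db="test"):
    """Build a mongodb+srv connection string with escaped credentials.

    Hypothetical helper for illustration; host and default_db are
    placeholders that should match the string copied from your cluster.
    """
    user = quote_plus(username)   # escape reserved characters in the username
    pwd = quote_plus(password)    # escape reserved characters in the password
    return (
        f"mongodb+srv://{user}:{pwd}@{host}/{default_db}"
        "?retryWrites=true&w=majority"
    )


# Sample values only; replace them with your own credentials and host.
uri = build_mongo_uri("smaxwell", "p@ss:word/1", "clustertest-icsum.mongodb.net")
print(uri)
```

The resulting string can be passed to MongoClient in place of a hand-edited one.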

Uploading The Pandas DataFrame to MongoDB

I recommend using a Python notebook, but you can just as easily use a regular .py file. You will want to fill in all the areas marked with <<CAPS>> with your relevant information. The call to “to_dict("records")” formats your dataframe into a list of JSON-like objects, one per row, which is the shape that “insert_many” expects.

# Imports
import pandas as pd
from pymongo import MongoClient

# Load the CSV dataset into a DataFrame
data = pd.read_csv('<<INSERT NAME OF DATASET>>.csv')

# Connect to MongoDB using the connection string copied earlier
client = MongoClient("mongodb+srv://<<YOUR USERNAME>>:<<PASSWORD>>@clustertest-icsum.mongodb.net/test?retryWrites=true&w=majority")
db = client['<<INSERT NAME OF DATABASE>>']
collection = db['<<INSERT NAME OF COLLECTION>>']

# Turn the index into a regular "index" column so it is uploaded too
data.reset_index(inplace=True)
# Convert each row into a dictionary keyed by column name
data_dict = data.to_dict("records")

# Insert all rows into the collection, one document per row
collection.insert_many(data_dict)

Congrats! If you replaced the <<>> placeholders with your information and ran this code, you should see the data from your pandas dataframe appear in the Collections tab within the database you set up on MongoDB.
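To see what the upload step actually sends, it can help to run the conversion on a tiny dataframe first. This is a small sketch with made-up sample data (the column names and values are not from any real dataset); it needs only pandas and no database connection.

```python
import pandas as pd

# Made-up sample data standing in for the CSV file
df = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 45]})

# Same two steps as in the upload code above
df.reset_index(inplace=True)        # adds an "index" column holding 0, 1, ...
records = df.to_dict("records")     # one dictionary per row

# Each record is keyed by column name, including the new "index" column
for record in records:
    print(record)
```

Each dictionary in the list becomes one document in the collection. If you do not want the extra “index” field in MongoDB, one option is to skip the reset_index call.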

From here you can even filter results in the database by a column name and the value within that column. One thing to note is that your dataframe is no longer stored as a table once it is uploaded: MongoDB is a non-relational database, so each row becomes a separate document in the collection.
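The same kind of filter can be run from Python. Here is a minimal sketch, assuming collection is the pymongo collection object from the upload code; the helper name find_rows is hypothetical, and the column name and value in the usage comment are placeholders.

```python
def find_rows(collection, column, value):
    """Return every document whose `column` equals `value`, as plain dicts."""
    query = {column: value}       # simple equality filter on one field
    projection = {"_id": 0}       # leave out MongoDB's auto-generated _id field
    return list(collection.find(query, projection))


# Example usage (requires a live connection to your cluster):
# rows = find_rows(collection, "<<COLUMN NAME>>", "<<VALUE>>")
# print(len(rows), "matching documents")
```

Returning a list keeps the example simple; for large collections you may prefer to iterate over the cursor that find returns instead of loading everything at once.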

This is one of the few articles I have written, so please forgive any mistakes. I hope you found this article helpful, and I would greatly appreciate any criticism or questions you have.
