Seeding 10 million data points into MongoDB

Joe Tam
Published in Glitter Guys
Mar 20, 2018

Yesterday, I began the first step of my SDC (server design capstone) project. My goal for the day was to insert 10 million data points into my database in under an hour. I started with MongoDB because the original author of this repo chose that database. Eventually, I will compare benchmarks with a NoSQL database and choose a primary database based on the results.

Using a simple 13-line schema filled with faker.js data, I started my seeding with Mongoose, an ODM (object document mapper) for MongoDB. I experimented with different batch sizes per insert and recorded the results in a table.
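A minimal sketch of the batching idea, not the code from this project. The record shape in `makeRecord` is hypothetical (the real schema has 13 lines and is filled by faker.js), and `insertBatch` is an injected function so the same loop could wrap either Mongoose's `Model.insertMany` or the native driver's `collection.insertMany`.

```javascript
// Hypothetical record shape, standing in for the faker.js-filled schema.
function makeRecord(id) {
  return { id, name: `item-${id}`, createdAt: new Date() };
}

// Yield arrays of at most `batchSize` records until `total` records exist.
function* batches(total, batchSize) {
  for (let start = 0; start < total; start += batchSize) {
    const size = Math.min(batchSize, total - start);
    yield Array.from({ length: size }, (_, i) => makeRecord(start + i));
  }
}

// Insert one batch at a time and wait for each insert before building the next,
// so only one batch is held in memory at once.
async function seed(total, batchSize, insertBatch) {
  let inserted = 0;
  for (const batch of batches(total, batchSize)) {
    await insertBatch(batch);
    inserted += batch.length;
  }
  return inserted;
}
```

With Mongoose, a call might look like `seed(10000000, 100, (batch) => Model.insertMany(batch))`, where `Model` is whatever model the schema compiles to.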

Looking at this table, you can see a roughly 30% performance boost from switching from Mongoose to the native MongoDB client. With Mongoose, the batch size affected the speed as well: inserting 100 documents at a time was the fastest, and inserting 10 at a time, the smallest batch size I tested, was the slowest.
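Timings like these could be collected with a small harness such as the one below. This is a sketch under assumptions, not the script behind the table: `timeRun` and `compare` are hypothetical helper names, and each candidate is any async function that performs one complete seeding run.

```javascript
// Time a single async run using Node's high-resolution clock.
async function timeRun(label, run) {
  const start = process.hrtime.bigint();
  await run();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { label, elapsedMs };
}

// Run the candidates one after another (so they do not compete for the
// database) and return the results sorted from fastest to slowest.
async function compare(candidates) {
  const results = [];
  for (const [label, run] of Object.entries(candidates)) {
    results.push(await timeRun(label, run));
  }
  return results.sort((a, b) => a.elapsedMs - b.elapsedMs);
}
```

A candidate map might pair labels such as `'mongoose, batch 100'` and `'native client, batch 100'` with functions that each seed the same number of records into an empty collection.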

Although I’m content with my seeding results, there is room for improvement. I learned about the npm package “cluster”, an extensible multi-core server manager that uses all of your computer’s cores to run scripts in parallel. That approach could give another big performance boost, but for now I want to focus on seeding a NoSQL database.
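For illustration, here is a rough sketch of splitting a seed across cores. Note that it uses Node's built-in `cluster` module rather than the npm package named above, and it assumes Node 16 or later for `cluster.isPrimary`. `seedShare` and the `SEED_SHARE` environment variable are hypothetical names.

```javascript
const cluster = require('cluster');
const os = require('os');

// Split `total` records into `workers` near-equal shares that sum to `total`.
// splitWork(10, 3) -> [4, 3, 3]
function splitWork(total, workers) {
  const base = Math.floor(total / workers);
  const remainder = total % workers;
  return Array.from({ length: workers }, (_, i) => base + (i < remainder ? 1 : 0));
}

// seedShare(count) is a hypothetical async function that inserts `count` records.
async function runSeed(total, seedShare) {
  if (cluster.isPrimary) {
    // Primary process: fork one worker per core and pass each its share via env.
    const shares = splitWork(total, os.cpus().length);
    for (const share of shares) {
      cluster.fork({ SEED_SHARE: String(share) });
    }
  } else {
    // Worker process: seed only its own share, then exit.
    await seedShare(Number(process.env.SEED_SHARE));
    process.exit(0);
  }
}
```

Calling `runSeed(10000000, seedShare)` at the bottom of the script would be enough, since each forked worker re-runs the same script and takes the worker branch.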
