MongoDB Aggregations — Part 2: Data Blends with Knowi

Exploring the new MongoDB Atlas aggregation pipeline and how to join Mongo data with other sources

Nate Hall
5 min read · Jul 22, 2019
@BigDataBlender

MongoDB is an open-source, NoSQL database built to simplify storage of large, document-based, unstructured data. This article is the second of a 3-part series on MongoDB analytics, with the purpose of showing how to blend data stored in MongoDB with other databases for unified data exploration using Knowi.

MongoDB Aggregations — Part 1 explored how to perform aggregations inside MongoDB — including examples of a few important operations to prepare data and learn proper syntax.

In part 2, we’ll dive into the new Mongo Atlas aggregation pipeline builder and how to blend MongoDB data with other sources using Knowi.

MongoDB aggregation pipeline builder

The MongoDB Atlas aggregation pipeline builder was released in early June 2019. It gives MongoDB users a new way to test and run aggregations in MongoDB Atlas. Testing aggregations before deploying them is key to maintaining application stability and avoiding “hours of trial and error”.

To start using the new Aggregation Pipeline Builder in the MongoDB Atlas cloud, open the Collections view and choose “Aggregation” next to the Find and Indexes tabs, as shown below:

From the drop-down menu, different aggregation “stages” can be tested, with auto-completion for the operators that perform the assigned aggregation at each stage. This simplifies testing and learning the 25+ aggregation stages and the syntax behind them.
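As a rough illustration of what a multi-stage pipeline looks like outside the builder, here is a minimal Python sketch. The field names (`amount`, `customer`) and the `orders` collection mentioned in the comment are hypothetical and not taken from the article's data set. The pipeline is only built and printed here, so no database connection is needed.

```python
import json


def build_pipeline(min_amount):
    """Return an aggregation pipeline as an ordered list of stage documents."""
    return [
        # Stage 1: keep only documents at or above the threshold.
        {"$match": {"amount": {"$gte": min_amount}}},
        # Stage 2: total the amounts and count the documents per customer.
        {"$group": {
            "_id": "$customer",
            "total": {"$sum": "$amount"},
            "orders": {"$sum": 1},
        }},
        # Stage 3: largest totals first.
        {"$sort": {"total": -1}},
        # Stage 4: keep the top five customers.
        {"$limit": 5},
    ]


pipeline = build_pipeline(10)
print(json.dumps(pipeline, indent=2))

# With a live connection, a driver such as PyMongo could run the same list:
#   db.orders.aggregate(pipeline)   # "orders" is a hypothetical collection
```

Each dictionary corresponds to one stage in the builder's drop-down, and the stages run in list order.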

Aggregating data across sources

Once data inside MongoDB has been aggregated, the next step of “data engineering” usually requires joining data in MongoDB with other structured & unstructured databases — aggregating data across sources. This is done to contextualize information across the tech stack through a variety of methods, such as ETL, connecting via ODBC drivers, and data warehousing.

Depending on the complexity of the data stack, these methods become increasingly time-intensive. They require teams of data engineers to select relevant data for downstream applications, make sure the data is in relational format by flattening nested, unstructured data (e.g. collections in MongoDB), and then load it into another data warehouse before analysis.
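To make the flattening step concrete, here is a minimal Python sketch of the kind of transformation a traditional ETL job performs before loading a document into a relational table. The sample document and its field names are invented for illustration. Arrays are left untouched in this sketch, whereas a real job would also need a strategy for them.

```python
def flatten(doc, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level of dotted column names."""
    flat = {}
    for key, value in doc.items():
        path = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict) and value:
            # Recurse into sub-documents and merge their flattened columns.
            flat.update(flatten(value, path, sep))
        else:
            # Scalars, lists, and empty dicts are kept as-is.
            flat[path] = value
    return flat


# Hypothetical marketing document with two levels of nesting.
doc = {
    "customer": "acme",
    "campaign": {"name": "spring", "metrics": {"opens": 120, "clicks": 14}},
}

row = flatten(doc)
print(row)
```

Each dotted key in `row` would become one column in the relational target.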

Knowi can be used to instantly explore data sets, cleanse messy data with SQL, blend multiple information stores using common join-keys, and build visualizations or downstream applications with Natural Language Intelligence, shortening analytics product development cycles.

Step 1 — Starting Knowi and connecting to MongoDB

The first step to joining data across databases with Knowi is to sign up for an account at www.knowi.com.

Once you’ve signed up, you’ll be taken to the front page of Knowi’s interface. Navigate to the “data sources” tab, click the “New Datasource” button, and select the option for MongoDB or MongoDB Atlas, depending on how your team deploys MongoDB.

To connect to a MongoDB instance, enter your host ID, port number, database name, and login credentials. The other properties (database properties, agent, and SSH tunnel) can be used to simplify integration alongside data security protocols.

For MongoDB Atlas, all that is needed to explore data in Knowi is the Atlas Connection String.
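For readers who have not assembled an Atlas connection string before, the sketch below shows the general `mongodb+srv://` shape. The user name, password, cluster host, and database name are placeholders, and the real values come from the Atlas console. Percent-encoding the credentials matters because characters such as `@` or `/` in a password would otherwise break the URI.

```python
from urllib.parse import quote_plus


def atlas_uri(user, password, host, database):
    """Assemble a mongodb+srv connection string with escaped credentials."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


# All four values below are placeholders, not real credentials or hosts.
uri = atlas_uri(
    "report_user", "p@ss/word", "cluster0.example.mongodb.net", "marketing"
)
print(uri)
```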

Step 2 — Instant Data Exploration

Exploring MongoDB Atlas collections with Knowi

Once the MongoDB instance has been connected, the contents of accessible collections can instantly be returned and explored using the data explorer UI on the left-hand side of the Knowi query screen. This enables drilling into the contents of individual documents inside collections, regardless of how nested the data is — as shown with the example of Visitor Team Statistics, which is nested in 5+ layers of data.
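Outside a UI, drilling into a deeply nested document amounts to following a path of keys. The Python sketch below shows that idea with a hypothetical game document. The nesting and field names are invented and only loosely modelled on the visitor team statistics example.

```python
def get_path(doc, path, default=None):
    """Follow a dotted path through nested dictionaries, returning default if absent."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return default
    return current


# Hypothetical document nested five levels deep.
game = {"visitor": {"team": {"stats": {"batting": {"hits": 9}}}}}

hits = get_path(game, "visitor.team.stats.batting.hits")
era = get_path(game, "visitor.team.stats.pitching.era", default="n/a")
print(hits, era)
```

Returning a default instead of raising keeps exploration tolerant of documents that lack a given branch, which is common in schemaless collections.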

Data exploration is important because it enables users to evaluate whether data transformation is necessary to understand the contents of disparate databases. Inside Knowi, the Cloud9QL Query box can be used to complete necessary transformations and aggregations as introduced in part 1.

Step 3 — Blend MongoDB with other sources

Once MongoDB collections have been connected, explored, and confirmed as usable inside Knowi, the join function can be used to blend MongoDB with any other NoSQL, SQL, or API-centric data source to create a unified, virtualized dataset from multiple sources. To test out blending MongoDB data in Knowi yourself, check out this walk-through, which shows how to join MongoDB with a relational MySQL database.

Knowi can connect and join any combination of 35+ structured and unstructured databases, including leaders in the NoSQL space like Couchbase and Cassandra (DataStax). Once data sources have been connected to Knowi, building a joined data set becomes intuitive.

For this example, we’ll blend data from MongoDB Atlas and MySQL:

Joining marketing data from MongoDB Atlas with customer location data from MySQL

By specifying “customer” as a common join-key between marketing data in MongoDB Atlas and customer-location data stored in MySQL, data across silos can be blended without prior reformatting or flattening. Joining these data sets across Mongo and MySQL creates a unified view of the data in minutes, without the need for an ETL workload to process different data structures.
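Conceptually, blending on a common join-key resembles an inner join between two record sets. The Python sketch below illustrates that idea with in-memory lists standing in for the MongoDB documents and MySQL rows. It is not Knowi's implementation, and the sample records and field names other than `customer` are invented.

```python
def blend(mongo_docs, mysql_rows, key="customer"):
    """Inner-join two lists of records on a shared key using a hash lookup."""
    # Index the relational rows by join-key so each lookup is constant time.
    by_key = {}
    for row in mysql_rows:
        by_key.setdefault(row[key], []).append(row)

    blended = []
    for doc in mongo_docs:
        # Documents with no matching row are dropped (inner-join semantics).
        for row in by_key.get(doc[key], []):
            blended.append({**doc, **row})
    return blended


# Hypothetical marketing documents (MongoDB side).
marketing = [
    {"customer": "acme", "campaign": "spring", "clicks": 14},
    {"customer": "globex", "campaign": "fall", "clicks": 3},
]
# Hypothetical customer-location rows (MySQL side).
locations = [
    {"customer": "acme", "city": "Oakland"},
    {"customer": "initech", "city": "Austin"},
]

unified = blend(marketing, locations)
print(unified)
```

Only `acme` appears on both sides, so it is the only customer in the blended result. A left join would instead keep `globex` with an empty location.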

Conclusion

When an organization’s data is running through NoSQL databases like MongoDB, it is no longer necessary to install ODBC drivers or build ETL processes to join that data with other sources of information. This enables faster generation of insight across disparate data using natural language processing.

With Knowi, queries can be executed across data silos without extensive engineering resources. Combined with an end-to-end analytics product that includes visualizations, machine-learning-based AI, and external data aggregation capabilities for MongoDB and other sources of mission-critical data, Knowi can help consolidate the aggregation process of MongoDB-based data with other components of the enterprise data portfolio.

More information about Knowi’s NLP-driven visualization on MongoDB can be found here, and will be the focus of MongoDB Aggregations - Part 3.

Test out a free, 3-week Knowi trial sandbox on our website.
