Data Wrangling with MongoDB (Lesson 5 Analyzing data)

Yang Wang
Jul 25, 2017 · 1 min read

Study notes and a mind map for Udacity's free course, Data Wrangling with MongoDB.

In this post, we explore the Twitter snapshot data (JSON, 17 MB .zip) with the MongoDB aggregation framework to do some initial analysis.

Get the Twitter data into MongoDB

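import codecs
import json

from pymongo import MongoClient


def insert_data(data, db):
    # Insert each parsed tweet as one document in the "twitter" collection.
    for tweet in data:
        db.twitter.insert_one(tweet)


def get_db(db_name):
    # Connect to a local MongoDB server and select the database by name.
    client = MongoClient('localhost:27017')
    db = client[db_name]
    data = []
    # Assumption: twitter.json holds one JSON document per line.
    with codecs.open('twitter.json', 'r', 'utf-8') as f:
        for line in f:
            if line.strip():
                data.append(json.loads(line))
    insert_data(data, db)
    return db


if __name__ == '__main__':
    db = get_db('twitter')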

Mind-map for aggregation notes

  • Aggregation pipeline
  • Using Group
  • Filter operation
  • Unwind
  • Group Accumulation
  • Indexes and Geospatial Indexes
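
To make the first five items concrete, here is a minimal sketch of how such pipelines might look in PyMongo. It is not the course's own solution. The field names (source, user.followers_count, user.screen_name, entities.hashtags) are assumptions based on the standard tweet layout, and the function names are made up for illustration. Each function only builds a pipeline (a list of stage documents), so it can be inspected without a running server.

```python
def top_sources_pipeline(limit=5):
    """$group + $sort + $limit: count tweets per client application.

    Assumes each tweet has a top-level "source" field.
    """
    return [
        # Group: one output document per distinct source, with a running count.
        {"$group": {"_id": "$source", "count": {"$sum": 1}}},
        # Sort the groups by count, largest first.
        {"$sort": {"count": -1}},
        {"$limit": limit},
    ]


def hashtags_per_user_pipeline(min_followers=1000, limit=5):
    """$match + $unwind + group accumulators ($addToSet, $sum).

    Assumes tweets carry "user.followers_count", "user.screen_name"
    and an "entities.hashtags" array whose items have a "text" field.
    """
    return [
        # Filter: keep only tweets from users with enough followers.
        {"$match": {"user.followers_count": {"$gte": min_followers}}},
        # Unwind: emit one document per hashtag in the array.
        {"$unwind": "$entities.hashtags"},
        # Accumulate: distinct hashtags and total hashtag uses per user.
        {"$group": {
            "_id": "$user.screen_name",
            "tags": {"$addToSet": "$entities.hashtags.text"},
            "tag_uses": {"$sum": 1},
        }},
        {"$sort": {"tag_uses": -1}},
        {"$limit": limit},
    ]


def run(collection, pipeline):
    """Run a pipeline on a PyMongo collection and return the results as a list."""
    return list(collection.aggregate(pipeline))
```

With the db handle returned by get_db above, run(db.twitter, top_sources_pipeline()) would return the five most common sources. The order of stages matters: putting $match first reduces the number of documents that the later $unwind and $group stages have to process.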

So far, we have gone through the main topics of this course. Next, we will bring these pieces together in the case study on OpenStreetMap data.
