Data Wrangling with MongoDB (Lesson 5 Analyzing data)
Jul 25, 2017 · 1 min read
Study notes and a mind-map for the free Udacity course Data Wrangling with MongoDB

In this post, we are going to explore a Twitter snapshot dataset (JSON, 17 MB zipped) using the MongoDB aggregation framework for some initial analysis.
Get the Twitter data into MongoDB
import json
import codecs
from pymongo import MongoClient

def get_db(db_name):
    # Connect to a local MongoDB instance and return the named database.
    client = MongoClient('localhost:27017')
    return client[db_name]

def insert_data(data, db):
    # Insert each tweet document into the "twitter" collection.
    for document in data:
        db.twitter.insert_one(document)

if __name__ == '__main__':
    db = get_db('twitter')
    data = []
    # twitter.json holds one JSON-encoded tweet per line.
    with codecs.open('twitter.json', 'r', 'utf-8') as f:
        for line in f:
            data.append(json.loads(line))
    insert_data(data, db)
Mind-map for aggregation notes
- Aggregation pipeline
- Using Group
- Filter operation
- Unwind
- Group Accumulation
- Indexes and geospatial indexes
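The filter, unwind, and group stages from the mind-map can be chained into a single pipeline. As a minimal sketch (assuming the tweets were imported into the `twitter` collection as above, and that they use Twitter's standard `entities.hashtags` field), the following pipeline would count the five most common hashtags:

```python
# Sketch of an aggregation pipeline combining the topics above:
# $match (filter), $unwind, $group with an accumulator, then sort and limit.
pipeline = [
    {"$match": {"entities.hashtags": {"$ne": []}}},  # keep tweets with hashtags
    {"$unwind": "$entities.hashtags"},               # one document per hashtag
    {"$group": {"_id": "$entities.hashtags.text",    # group by hashtag text
                "count": {"$sum": 1}}},              # accumulate a count per group
    {"$sort": {"count": -1}},                        # most frequent first
    {"$limit": 5},                                   # top five
]

def top_hashtags(db):
    # Run the pipeline against the collection built by the import script.
    return list(db.twitter.aggregate(pipeline))
```

`$unwind` is what makes the per-hashtag count possible: each tweet may carry several hashtags in one array, so it is expanded into one document per hashtag before grouping.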

So far, we have gone through the main topics of this course. Next, we will bring all the pieces together in a case study of OpenStreetMap data.
