Open data and Meetup.com dataset
Visiting websites, I often find myself amazed by the data they contain and would love to answer questions like these:
- What’s the longest article on Medium?
- What are the most and least expensive rentals on Airbnb?
- What’s the most commented laptop on Amazon?
- Which songs have been listened to more than X times but fewer than Y times? (small artists but with a strong community)
- Which meetups around me belong to a group that’s very popular but rarely holds events?
- Which Kickstarter campaigns missed their goal by less than $100 while trying to raise more than $1,000?
I bet everybody asks themselves these kinds of questions too but dismisses them as “impossible”.
Why is it hard? Most companies don’t expose a very advanced search tool for their data. What if they exposed their database? What if you could download a database of all the publicly accessible data on each website?
Here is a first extraction of the meetup.com dataset: 100k groups and 200k events. I hope to make it easier to ask questions in the future (via SQL, for example). It should be as efficient as googling “meetup dogs miami” and getting a list/map of events.
meetup.csv (500 MB) (full credits to scrapinghub.com for running the spider for me)
Here is an example of what you can do with half the data:
- What are the popular groups in Berlin? C++, OpenTechSchool, Expats, Bitcoin
- And in Tokyo? International Singles, Kuchikomi Meetup (Word of Mouth Events), English Language, Pub Crawl & Activities
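To give an idea of how the “popular groups in a city” question could be asked via SQL, here is a minimal sketch in Python. It is not the code I used. The column names (group_name, city, members), the table name, and the sample rows are placeholders made up for illustration, and the real meetup.csv may be laid out differently.

```python
import csv
import io
import sqlite3

# Placeholder rows for illustration only; the column names are assumptions,
# not the actual header of meetup.csv.
SAMPLE_CSV = """group_name,city,members
Group A,Berlin,1200
Group B,Berlin,300
Group C,Tokyo,900
Group D,Berlin,2500
"""


def load_groups(csv_file):
    """Load group rows from a CSV file object into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE meetup_groups (group_name TEXT, city TEXT, members INTEGER)"
    )
    reader = csv.DictReader(csv_file)
    conn.executemany(
        "INSERT INTO meetup_groups VALUES (?, ?, ?)",
        ((r["group_name"], r["city"], int(r["members"])) for r in reader),
    )
    return conn


def popular_groups(conn, city, limit=4):
    """Return the names of the largest groups in a city, biggest first."""
    rows = conn.execute(
        "SELECT group_name FROM meetup_groups "
        "WHERE city = ? ORDER BY members DESC LIMIT ?",
        (city, limit),
    )
    return [name for (name,) in rows]


conn = load_groups(io.StringIO(SAMPLE_CSV))
print(popular_groups(conn, "Berlin"))
```

With the real file, you would pass open("meetup.csv", newline="") instead of the in-memory sample and adjust the column names to match the header. Here “popular” is read as “most members”, which is only one possible definition.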
The source code of the spider: https://github.com/mdamien/scrapy-meetups