Open data and dataset

Visiting websites, I often find myself amazed by the data they contain, and I would love to answer questions like these:

  • What’s the longest article on Medium?
  • What are the most and least expensive rentals on Airbnb?
  • Which laptop has the most comments on Amazon?
  • Which songs have been played more than X times but fewer than Y times? (small artists with a strong community)
  • Which meetups near me are run by very popular groups that rarely hold events?
  • Which Kickstarter campaigns missed their goal by less than $100 while trying to raise more than $1,000?
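With the raw data in hand, most of these questions become simple filters. Here is a minimal sketch of the Kickstarter question in Python, assuming each campaign is a record with a goal and a pledged amount in dollars. The `Campaign` fields and the `near_misses` helper are hypothetical and illustrative only; Kickstarter does not publish its data in this shape, and the sample campaigns are made up.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    # Hypothetical fields, not an actual Kickstarter export format.
    name: str
    goal: float     # amount the campaign tried to raise, in USD
    pledged: float  # amount actually pledged, in USD


def near_misses(campaigns, max_shortfall=100.0, min_goal=1000.0):
    """Return campaigns with a goal above min_goal that fell short by less than max_shortfall."""
    return [
        c for c in campaigns
        # 0 < shortfall excludes campaigns that reached or beat their goal
        if c.goal > min_goal and 0 < c.goal - c.pledged < max_shortfall
    ]


# Made-up sample data for illustration.
sample = [
    Campaign("A", goal=5000, pledged=4950),  # missed by $50 -> match
    Campaign("B", goal=5000, pledged=5200),  # reached its goal
    Campaign("C", goal=800, pledged=750),    # goal too small
    Campaign("D", goal=2000, pledged=1500),  # missed by $500
]
print([c.name for c in near_misses(sample)])  # ['A']
```

The hard part was never the query itself; it is getting the data into a form you can query.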

I bet everybody asks themselves these kinds of questions too, but dismisses them as “impossible”.

Why is it hard? Most companies don’t expose an advanced search tool for their data. What if they exposed their database? What if you could download a database of all the publicly accessible data on every website?

Here is a first extraction of the dataset: 100k groups and 200k events. I hope to make it easier to ask questions in the future (via SQL, for example). It should be as easy as googling “meetup dogs miami” and getting a list or map of events.

meetup.csv (500 MB) (full credit to for running the spider for me)
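One way to start asking SQL questions today is to load the CSV into SQLite with Python’s standard library. The sketch below assumes a hypothetical column layout (`group_name`, `city`, `members`, `events_last_year`); the real meetup.csv may use different columns, so adjust the table definition accordingly. It answers the “popular group that rarely meets” question on a tiny made-up sample.

```python
import csv
import io
import sqlite3

# Made-up sample rows with a hypothetical column layout.
SAMPLE_CSV = """group_name,city,members,events_last_year
Miami Dog Lovers,Miami,4200,2
Miami Python,Miami,3100,24
Tiny Book Club,Miami,40,1
"""


def load_csv(conn, fileobj):
    """Create a `groups` table and stream CSV rows into it."""
    conn.execute(
        "CREATE TABLE groups (group_name TEXT, city TEXT, "
        "members INTEGER, events_last_year INTEGER)"
    )
    # DictReader yields strings; SQLite's INTEGER affinity converts numeric text.
    conn.executemany(
        "INSERT INTO groups VALUES (:group_name, :city, :members, :events_last_year)",
        csv.DictReader(fileobj),
    )


def popular_but_rare(conn, city, min_members, max_events):
    """Groups in `city` with many members that held few events last year."""
    query = """
        SELECT group_name, members, events_last_year
        FROM groups
        WHERE city = ? AND members >= ? AND events_last_year <= ?
        ORDER BY members DESC
    """
    return conn.execute(query, (city, min_members, max_events)).fetchall()


conn = sqlite3.connect(":memory:")
load_csv(conn, io.StringIO(SAMPLE_CSV))
print(popular_but_rare(conn, "Miami", 1000, 3))  # [('Miami Dog Lovers', 4200, 2)]
```

For the real file, passing `open("meetup.csv", newline="")` instead of the `StringIO` should work, since `csv.DictReader` streams rows rather than reading all 500 MB at once.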

Here is an example of what you can do with half the data:

The source code: