Answering the call
- build your software engineering skills
- introduce new ways of thinking about problems using data science
- give you fresh ideas for your portfolio
These open-format challenges provide guidance and boundaries to narrow the scope of your work, giving you a project small enough to finish in a few hours, but big enough to challenge your skills and maybe let you show off just a bit.
So, on to the first DSC challenge… Exploring the Meetup API in the city of my choice!
How I found the best Meetups in San Francisco
I dove right in and tried the first challenge for myself. I wanted to learn something useful, something I wouldn’t get from just surfing through the Meetup app. Like almost every other user of Meetup, I knew what the website and app had to offer. A basic search, maybe some recommendations, but nothing of any real substance.
In fact, using Meetup on Android or iOS resembles using a dating app. You might get to see a few pictures, try to get a sense of whether your prospect is fun and interesting, roll the dice, and hope that it’s a worthwhile use of your time. Sometimes it’s good, but many times, it’s quite the opposite.
What I didn’t know is that Meetup provides a really rich API for finding out way more than what you normally see from those two interfaces. I just had to dig into the API documentation and find a way to start!
Meeting the Meetup API and setting a few goals
I decided I could do better, but the clock was ticking. I Googled the Meetup API, found a Python client to work with, and thought about my biggest pain points when I use Meetup. I wanted to find just the right Meetup group, but I didn’t want to browse through every single tech Meetup in San Francisco first. I wanted it to be worth it. I wanted to find:
- a high-tech meetup in or near San Francisco
- something that’s trending up; a meetup that is winning people over and is becoming more popular and successful
- a group that meets up frequently, at least once a month, possibly more
- a really big group with lots of members; I need to maximize my chances to find other data scientists to network with
Aha, my goals were set! Now let’s grab some tools and get to work.
Starting my Jupyter Notebook
It all starts with a blank notebook.
When it comes to data science, half of the fun is telling the story. Source code alone makes for painfully dull reading. Instead of writing source code, I wanted to journal my exploration with a Jupyter Notebook.
In case you haven’t used them, Jupyter notebooks are brilliant, interactive canvases. They allow you to paint a picture of your data science journey, complete with cells that combine to form a mosaic of prose, source code, mathematical formulas, visualizations, and even embedded video clips!
In fact, if you want to see the science behind this story, just take a look at my Jupyter Notebook for a behind-the-scenes look at how I systematically searched for and found the very best San Francisco Meetup!
I skimmed the Meetup API documentation and learned three things: groups are placed into 33 different categories; members, events, and groups are assigned to cities; and the API returns data in pages of 200 records at a time.
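Paging is the part that trips people up, so here is a minimal sketch of how you might collect every record when results arrive 200 at a time. It is not the client I used: `fetch_all`, `fetch_page`, and the fake in-memory data are hypothetical stand-ins, and a real API call would go where `fake_fetch_page` is.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 200  # the API returns 200 records per page


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every record by requesting pages until a short page arrives."""
    page_index = 0
    while True:
        page = fetch_page(page_index, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        page_index += 1


# Hypothetical stand-in for a real API call: serves slices of a local list.
fake_groups = [{"id": i} for i in range(450)]


def fake_fetch_page(page_index: int, page_size: int) -> List[dict]:
    start = page_index * page_size
    return fake_groups[start:start + page_size]


all_groups = list(fetch_all(fake_fetch_page))
print(len(all_groups))
```

Stopping on the first short page keeps the loop simple, at the cost of one extra request when the total happens to be an exact multiple of the page size.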
At first, I just wanted to know how many tech groups met in or near San Francisco. Meetup proceeded to tell me that there are 4 different locations called San Francisco in the US. Odd. So I asked the API for just the cities called San Francisco that are also in California. After narrowing it to just a single city, the API finally told me that there were 2,200 tech groups that meet up in San Francisco, CA.
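Narrowing several same-named cities down to one is just a filter on two fields. The sketch below assumes each city record carries `city` and `state` keys, which is my guess at the shape rather than the documented response format, and the placeholder states `XX`, `YY`, and `ZZ` are made up for illustration.

```python
def cities_matching(cities, name, state):
    """Return the city records whose name and state both match."""
    return [
        c for c in cities
        if c["city"].lower() == name.lower()
        and c["state"].upper() == state.upper()
    ]


# Made-up records: four places share a name, only one is in California.
candidate_cities = [
    {"city": "San Francisco", "state": "CA"},
    {"city": "San Francisco", "state": "XX"},
    {"city": "San Francisco", "state": "YY"},
    {"city": "San Francisco", "state": "ZZ"},
]

matches = cities_matching(candidate_cities, "San Francisco", "CA")
print(matches)
```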
That’s a good start: I’ve gone from 225,000 groups to just 2,200. But I still don’t know which of those groups are the best. Maybe I could visualize the number of members each group has in a histogram.
Okay, so there are thousands of tech groups with smaller memberships that I could choose to exclude, but which ones were the biggest? As I stated in my goals, I needed a “really big group with lots of members”.
Box and Whisker Plots
I used a box and whisker plot to visualize the distribution of tech meetups by membership size and discovered the following: there are mega-groups for tech in San Francisco!
As you can see, the mega-groups, shown as little black circles on the plot, can be quite a bit larger than the regular-sized groups, which are shown at the bottom of the graph.
In fact, I noticed that 75% of the groups have only 780 or fewer members. It’s safe to say that any group that’s at least ten times bigger than that deserves special attention! Our top 10 mega-groups each have over 11,000 members.
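If you want to reproduce that kind of number without drawing the plot, here is a rough sketch using only the standard library. It assumes the common box-plot convention that anything more than 1.5 times the interquartile range above the third quartile counts as an outlier; the member counts are invented for illustration and are not the real San Francisco data.

```python
import statistics


def upper_outliers(member_counts):
    """Return (third quartile, upper fence, sorted outliers above the fence)."""
    q1, _median, q3 = statistics.quantiles(member_counts, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)  # assumed 1.5 x IQR whisker convention
    outliers = sorted(c for c in member_counts if c > fence)
    return q3, fence, outliers


# Invented member counts: mostly small groups plus two mega-groups.
member_counts = [50, 120, 200, 310, 450, 600, 780, 900, 12000, 36000]

q3, fence, mega_groups = upper_outliers(member_counts)
print(q3, fence, mega_groups)
```

The third quartile plays the role of the "75% of groups have this many members or fewer" figure, and the outliers are the little circles floating above the whisker.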
The top 10 mega-groups
I wanted the biggest groups, so here are the giants. Even the smallest among them is an incredible outlier!
Searching through the 10 biggest tech groups
So now we’ve gone from:
- 225,000 groups everywhere to
- 2,200 tech groups in San Francisco to the
- 10 biggest tech mega-groups in San Francisco
But we’re not done. I still have two more goals. I want a group that:
- meets frequently (at least once a month)
- is trending up, or at least maintaining its popularity
Next, I collected all of the events thrown by our mega-groups in the past nine months and used a Seaborn LMPlot (Linear regression Model Plot) to visualize the top ten mega-groups.
Even though 10 groups are still too many to properly visualize, I was able to see that a few groups were getting very few yes-RSVPs. I also noticed that others weren’t meeting at least once a month. I decided to eliminate 6 more groups.
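I made that cut by eye from the plot, but the same screening can be sketched in plain Python. Everything here is illustrative: the `(group, day, yes_rsvps)` tuples, the group names, and the `min_mean_rsvps` threshold are assumptions of mine, and the slope is an ordinary least-squares fit written out by hand rather than whatever Seaborn does internally.

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs (0.0 if xs never change)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def summarize(events, months):
    """events: iterable of (group, day_index, yes_rsvps) tuples."""
    by_group = defaultdict(list)
    for group, day, rsvps in events:
        by_group[group].append((day, rsvps))
    summary = {}
    for group, rows in by_group.items():
        days = [d for d, _ in rows]
        rsvps = [r for _, r in rows]
        summary[group] = {
            "events_per_month": len(rows) / months,
            "mean_rsvps": sum(rsvps) / len(rsvps),
            "slope": ols_slope(days, rsvps),  # RSVPs gained per day
        }
    return summary


def keep_active(summary, min_per_month=1.0, min_mean_rsvps=20):
    """Keep groups that meet often enough and draw enough yes-RSVPs."""
    return sorted(
        g for g, s in summary.items()
        if s["events_per_month"] >= min_per_month
        and s["mean_rsvps"] >= min_mean_rsvps
    )


# Invented events over a nine-month window (day 0 is the start of the window).
events = (
    [("A", 30 * i, 40 + 2 * i) for i in range(9)]   # monthly and growing
    + [("B", 90 * i, 60) for i in range(3)]         # meets too rarely
    + [("C", 30 * i, 5) for i in range(9)]          # very few yes-RSVPs
)

summary = summarize(events, months=9)
print(keep_active(summary))
```

A positive slope means yes-RSVPs are trending up over the window, which is the same thing the trend lines in the plot show.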
The Top 4 Tech Groups in San Francisco
Now we’re down to the final four groups, and although each one meets the criteria for being an awesome group, only one deserves the title of Best Tech Group in San Francisco.
Once again I used Seaborn’s LMPlot to visualize the final four mega-groups. In this plot, we see individual events as colored dots, trends as lines, and confidence intervals of 68% (one standard deviation) as translucent bars surrounding each trend line. Let’s use our goals from the top of this article to judge the final four groups and rank the results.
Bronze Medal Winners
* Met up 31 times in the past 9 months (that’s almost weekly)
* Smallest variance in the number of yes-RSVPs per meetup, perhaps indicating a loyal following
* Smallest number of average yes-RSVPs per meetup (still over 37)
* According to the trend line, it’s definitely gaining popularity with an average of 41 yes-RSVPs
* Meets up the least frequently of the final four groups, averaging about once a month
Silver Medal Winner
* Holding on to a nearly flat trend of about 130 yes-RSVPs per meeting
* Met 31 times in the past 6 months (practically every week)
* The largest Tech group in San Francisco at 36,057 members
* Its trendline is gently decreasing, so its high popularity may have peaked
Gold Medal Winner
* The largest mean number of yes-RSVPs per meeting at 218
* The slope for the trendline indicates that it’s growing in popularity
* The wide confidence interval around its trend line suggests that the yes-RSVPs vary quite a bit more than they do for the other groups. You should check each event in advance so that you’re not surprised.
As a data scientist, it was fun to use data science to find the best Meetup groups. Also, please take a look at my Jupyter Notebook and let me know if you like it, hate it, or disagree in the comments below. Don’t forget to clap and share if you liked the article!
So how do I do the Data Science Challenges thingy?
Oh, you’re still reading? Well, if you are familiar with Python, GitHub, Jupyter, and statistics, my friend Johannes Giorgis has published an excellent introduction that will help you get started.