How We Scraped Data off a Slack Group

Bolaji
3 min read · Jun 15, 2017

I’m very excited to share my experience so far in Data Science. It’s been nothing short of amazing, and big ups to Nora Studholme, who is the inspiration behind this project. She’s such an amazing person, and I enjoyed reading her most recent article, “It’s the Championship”.

Image Credit: skilledup.com

This is not a tutorial per se; it’s just me sharing my experience.

I was done with Simulations a month ago and I got bored. I wanted to venture into something new while leveling up on some new technologies. Then came Nora, with an idea for a Data Science project that would ultimately help her build out a data science curriculum (one of her current projects). “Interesting!” That’s what I thought. It turned out to be a Data Scraping project.

What is Data Scraping?

Data scraping is a technique in which a computer program extracts data from human-readable output coming from another program. — Wikipedia

I signed up for this and did some research on the tools used for Data Scraping. I started out with a blog post. Interestingly enough, it was a post about how a data scientist created a “slackbot” to scrape data off Slack to find an apartment in San Francisco (you can read about it here).

I worked with a team of 9 devs (developers) with little or no prior experience in Data Science. Greatly influenced by Allen Adekunle, one of the guests we had at our bi-monthly Data Science meetup, we set out to mine raw data that we would refine to get the information we were after.
Some of us were fairly new to Python (the language) and had to brush up. Then we had a meeting to discuss the tools and the information we hoped to extract.

We were able to come up with several tools such as:

We studied the documentation and how to use these tools to get information such as:

  • The most popular reaction used in the Slack group
  • The message with the most reactions
  • Most active stack (e.g. Java, JavaScript, Python, Ruby)
  • The time when users are most active
  • Who the most active users are
  • The user with the most reactions
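To give a feel for what answering a few of these questions looks like, here is a minimal sketch. It is not our actual notebook code. It assumes each message is a dict in the shape Slack returns for message history (a `user` ID, a `text`, a `ts` timestamp and an optional `reactions` list whose entries carry a `name` and a `count`). The sample data and the function names are made up for illustration.

```python
from collections import Counter

# Hypothetical sample data in the shape Slack uses for messages.
messages = [
    {"user": "U1", "text": "Python meetup on Friday", "ts": "1497500000.000100",
     "reactions": [{"name": "thumbsup", "count": 3}, {"name": "tada", "count": 1}]},
    {"user": "U2", "text": "Count me in", "ts": "1497503600.000200",
     "reactions": [{"name": "thumbsup", "count": 2}]},
    {"user": "U1", "text": "Great!", "ts": "1497507200.000300"},
]


def most_popular_reaction(messages):
    """Return (reaction_name, total_count) for the most used reaction, or None."""
    counts = Counter()
    for message in messages:
        for reaction in message.get("reactions", []):
            counts[reaction["name"]] += reaction["count"]
    return counts.most_common(1)[0] if counts else None


def message_with_most_reactions(messages):
    """Return the message whose reactions add up to the highest count, or None."""
    def total(message):
        return sum(r["count"] for r in message.get("reactions", []))
    return max(messages, key=total, default=None)


def most_active_users(messages, n=3):
    """Return the n user IDs that posted the most messages, with their counts."""
    posters = Counter(m["user"] for m in messages if "user" in m)
    return posters.most_common(n)


print(most_popular_reaction(messages))
print(message_with_most_reactions(messages)["text"])
print(most_active_users(messages, n=1))
```

`Counter` does most of the work here: every question in the list is really a "count things, then take the top few" question.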

We used Slack Client to access Slack’s API, which is how we got the channel list, user list, and message history. We worked in Jupyter notebooks, a very good tool that many data scientists use. With Jupyter notebooks, you write and run code in interactive cells instead of creating a new file each time.
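For the curious, pulling a channel's full message history could look roughly like the sketch below. Treat it as an illustration under assumptions rather than our exact code: `fetch_channel_history` is a name I made up, it takes any `api_call(method, **params)` function (the legacy `slackclient` package exposed one as `SlackClient(token).api_call`), and it uses the legacy `channels.history` method with its `count`, `latest` and `has_more` fields. Slack has since moved to the `conversations.*` methods, so the method name and paging details would need adjusting today.

```python
def fetch_channel_history(api_call, channel_id, page_size=100):
    """Collect all messages in a channel, newest first, one page at a time.

    `api_call` is any callable with the shape api_call(method, **params)
    that returns the decoded JSON response as a dict.
    """
    messages = []
    latest = None  # upper time bound for the next page; None means "now"
    while True:
        params = {"channel": channel_id, "count": page_size}
        if latest is not None:
            params["latest"] = latest
        response = api_call("channels.history", **params)
        if not response.get("ok"):
            raise RuntimeError(response.get("error", "unknown error"))
        batch = response.get("messages", [])
        messages.extend(batch)
        if not batch or not response.get("has_more"):
            break
        # Pages come back newest first, so the last item is the oldest one seen.
        latest = batch[-1]["ts"]
    return messages


# Assumed usage with the legacy slackclient package (not run here):
#   from slackclient import SlackClient
#   client = SlackClient("xoxp-your-token")
#   history = fetch_channel_history(client.api_call, "C0123456789")
```

Passing `api_call` in as a parameter keeps the paging logic separate from the client library, which also makes it easy to try out in a notebook cell with fake responses.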

Below are screenshots of the visualizations that answer some of the questions listed above.

wordcloud
most frequent words used in the Slack group
date & time when users were most active
popular reactions
messages with the most reactions
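The numbers behind charts like the word-frequency and activity-time ones come down to simple counting. Here is a rough sketch of that step, assuming the same message dicts with `text` and `ts` fields. The tiny `STOP_WORDS` set and both function names are placeholders of mine, and hours are reported in UTC rather than in users' local time.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# A deliberately tiny, hypothetical stop-word list; a real one would be longer.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "is", "in", "it", "for"}


def word_frequencies(messages):
    """Count how often each non-stop word appears across all message texts."""
    counts = Counter()
    for message in messages:
        words = re.findall(r"[a-z']+", message.get("text", "").lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


def activity_by_hour(messages):
    """Count messages per hour of day (UTC), based on Slack's `ts` field."""
    hours = Counter()
    for message in messages:
        posted_at = datetime.fromtimestamp(float(message["ts"]), tz=timezone.utc)
        hours[posted_at.hour] += 1
    return hours


sample = [
    {"text": "Python is the best, python rocks", "ts": "0.000100"},
    {"text": "Ruby for the win", "ts": "3600.000200"},
    {"text": "Python again", "ts": "3700.000300"},
]
print(word_frequencies(sample).most_common(3))
print(sorted(activity_by_hour(sample).items()))
```

The resulting counters can be handed to whatever plotting or word cloud tool you prefer.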

This is what we’ve been able to come up with so far for our first sprint. We plan on answering more questions in subsequent sprints.

A lot of credit goes to the developers who worked on this project with me: Inumidun Amao, Babalola Rotimi, Yaasky, Stanley Ndagi, Aladeusi Olawale A.
