The Social Media Macroscope (SMM): A New Open Environment for Research using Social Media Data

Joseph T. Yun
Mar 18, 2019

Don’t bury the lede:

  • The SMM is a science gateway to enable research using social media data without the need for programming skills.
  • All algorithms in the SMM are transparent in their methodology (non-black-box), and the SMM is FREE for academic and educational use.
  • All tools in the SMM are (and will be) open-source.
  • We believe our environment will be extremely valuable for commercial use as well, and commercial users will be able to access it at a far lower cost than what is currently on the market.
  • You can start using the SMM for academic and educational use immediately. If you are a commercial entity, you can test out the fully functioning system right now at www.socialmediamacroscope.org; if you plan to use it commercially after testing it out, please contact us at otm@illinois.edu.

When I was going through my doctoral program (Informatics with a focus on Computational Advertising), I noticed a very common situation in academia: not enough was being done to bridge the gap between researchers who created computational algorithms and researchers who could apply those algorithms to social research. My background was a bachelor’s in computer science and a master’s in advertising (with a primary focus and interest in social psychology), so I found myself in a lot of conversations that went something like these two scenarios:

Speaking with a Social Researcher

Social Researcher: I have this research question I am trying to answer, and I want to use social media to answer it, but I think it will take many years to create an algorithm to do such and such…

Me: Oh, did you know that so-and-so researcher on our very campus has already created an algorithm for that?

Social Researcher: Really! That’s great. How do I use that algorithm for my research?

Me: Well, all you have to do is read this CS paper, go to this GitHub repository, take the code that isn’t fully finished yet, branch it, edit it, etc. etc. etc.

We stare at each other, conversation generally fades off, and we bid each other a respectful and kind farewell.

Speaking with a Computer Science Researcher

CS Researcher: I have this awesome algorithm that detects such and such on social media data! It has achieved X accuracy and the F1 score is Y and we have won Z competition!

Me: Awesome! Out of curiosity, how do you know when your algorithm is good enough to be used to answer applied social science research questions? I’m positive there are many researchers out there who could really use this kind of algorithm, if we could just test whether your model is at a point where it is directly useful for applied social science research!

CS Researcher: This sounds exciting! How do we test this?

Me: Well, if you could potentially take your code, harden it a bit, document it, and put it up on the web with a GUI that a non-programmer could use, I think we could really test it!

We stare at each other, conversation generally fades off, and we bid each other a respectful and kind farewell.

Okay folks, there it is: we had a predicament, and I wanted to solve it. I found a stellar team and was mentored and sponsored by fantastic folks such as Mark Henderson, CIO of the University of Illinois at Urbana-Champaign, and John Towns, PI of the NSF XSEDE (Extreme Science and Engineering Discovery Environment) project.

Thus, I would like to introduce you all to the Social Media Macroscope, A Science Gateway for Research Using Social Media Data (www.socialmediamacroscope.org).

I could go into extensive detail about the site, but in an effort to keep this post short and simple, I will just say that this is my attempt to connect the two types of researchers described above (plus a third type, explained below). Full details of the SMM can be found in this pre-print.

It may be helpful to call out three types of researchers, illustrated in the next three graphics, to show the value of the SMM for each type.

For the researcher who is primarily focused on applying computational methods to answer an applied social research question, the SMM provides an easy-to-use interface (no programming required) with algorithms that are fully documented and non-black-box (tied to CS academic research papers).
For the researcher who is primarily focused on building computational algorithms, the SMM provides a landing place beyond a GitHub repository, where social researchers can test the validity of the algorithm directly against their social research questions. The SMM is also building connections to, and stores of, social media data (in line with each social media platform’s Terms of Service) to give computational researchers more data to build their algorithms against.
For researchers within industry (such as marketing/advertising professionals), the plethora of social media analytics platforms built on black-box algorithms offers little confidence in the accuracy of their outputs. Additionally, those platforms are usually quite expensive to use. The SMM houses fully transparent algorithms, and our initial goal for commercial use is cost recovery.

I want to call out the commercial use point specifically, since I have not discussed it in detail thus far in this story. There are many social media analytics platforms in the marketplace (e.g., Sysomos, Radian6, Brandwatch, Crimson Hexagon), but they normally house algorithms that are trade secrets and black-box. A growing problem in data science is that when things are black-box, you cannot have confidence in what they are actually outputting. What if you are using a sentiment analysis engine built on Twitter data but are trying to apply it to Instagram posts? It doesn’t take a data scientist to sniff out that this is probably a really bad idea. Although all the tools in the SMM are open-source, we host a running instance of it via HUBzero and specific AWS cloud services; if you want to use the SMM commercially, please contact us at otm@illinois.edu. You can also contact me (Joe Yun) with any questions.

My last point is a clarifying one. You may be asking, “What exactly is the SMM? I kind of understand the motivation, but what is it in reality?” My answer is simply, “It’s kind of like an app store for open-source social media analytics tools, where you can actually run the apps within the store.” My team, the SRTI Lab, has built two “apps” into the SMM: SMILE (Social Media Intelligence and Learning Environment) and BAE (Brand Analytics Environment). We welcome more apps and algorithms into our environment!

SMILE (Social Media Intelligence and Learning Environment) is kind of like your general-purpose social media analytics platform (akin to Radian6 and others), except ours is fully transparent in its methodologies.
BAE (Brand Analytics Environment) was our attempt to create a study (and a tool) in which we began examining whether machine-learned personality detection via social media data actually affects behavior. We found some evidence that it does, and our study can be found at T&F (Taylor & Francis) or as a free pre-print on ResearchGate.

Well folks, that’s a bit of my story and the full story of the Social Media Macroscope. Please check it out and happy researching!!!

Joseph T. Yun


I am a Research Assistant Professor at the University of Illinois at Urbana-Champaign who focuses on computational business, data science, and analytics.
