Sex in Word Clouds: visualizing conferences in real time

No engineers were harmed in the making of this post.

Every year, WSO2 (which I work for) hosts at least one major conference called WSO2Con. Often ‘Con’, as it’s referred to internally, is three large events: one in the US, one in Europe, and AsiaCon in Sri Lanka. By and large, it’s attended by engineers who use our products.

At AsiaCon this year, there was a very small difference — a screen mounted near the lunch area. Occasionally, clumps of people would form around the screen and watch words and lists and graphs sketch themselves across the white space.

Dr Beshan Kulapala, one of the minds behind the Sri Lankan supercar project, started speaking: “VEGA” and “Codegen” and “Kulapala” started growing larger and larger on the screen. Mifan Careem said “Programming is like sex … because one mistake and you have to support it for the rest of your life” … and sex appeared on the big screen (a bit of embarrassment there). Every so often, a speech would end, and the screen would switch, showing the number of times the speaker had mentioned a certain topic.

This was the Twitter Thing. For lack of a better word, let’s call it the WSO2 Conference Monitor.

What we did

Not so long ago, in a forgotten corner, a bunch of us were quietly hammering away under the direction of Srinath Perera. Our task was to build a dashboard analyzing the web activity around the US Election, and, in the process, build up a case study around two of WSO2's data analysis products. It was an internship project.

A month before Con, we had a fairly decent prototype to take up to Sanjiva Weerawarana. It looked haphazard (“Totally random UI placement, Yudha!”) but it worked: it fetched tweets, analyzed what people were talking about (wordclouds), mapped Twitter communities, even judged the sentiment of media articles on the web. It confirmed, with hard data, something we knew by instinct: Donald Trump was owning Twitter, but everyone hated him.

Sanjiva threw a curveball: deploy it for Con.

Fine. Firstly, we threw out everything we couldn’t use. Out went the sentiment analysis. Out went the bar charts for politicians. In came a new interface: bubbles, lists, bright colors.

We were using WSO2 ESB and the Twitter API connector to pull all the conversations happening on Twitter around the election. We tuned this for WSO2Con: any tweet that so much as mentioned WSO2 was pulled, along with who sent it, timestamps, retweets, favourites, the lot. This data was then thrown at Data Analytics Server (DAS).

DAS is where most of the magic happens. Much of the old logic carried over: firstly, DAS would read the last ten tweets. Then it would generate a wordcloud using Vega.js, updating it every 30 seconds. The wordcloud becomes this fascinating collage of what people are talking about at the moment. Case in point:
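In code terms, the counting step that feeds a cloud like this can be sketched in a few lines of Python. This is only an illustration, not the actual DAS logic: the function name, the tiny stopword list and the tokenizing rule are all assumptions made for the example, and the drawing itself (Vega.js in our case) is left out.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real deployment would need a much fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "at", "of", "to", "in", "for", "on", "rt"}


def word_frequencies(tweets, window=10):
    """Count words across the most recent `window` tweets (oldest first in the list)."""
    recent = tweets[-window:] if window > 0 else []
    counts = Counter()
    for text in recent:
        # Lowercase, then keep runs of word characters, with an optional # or @ prefix.
        for word in re.findall(r"[#@]?\w+", text.lower()):
            if word not in STOPWORDS and len(word) > 1:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    sample = ["VEGA is the supercar", "Codegen at VEGA", "vega vega"]
    # The most frequent words are the ones a word cloud would draw largest.
    print(word_frequencies(sample).most_common(3))
```

Re-running something like this on a timer over the latest tweets is what makes words swell and shrink on screen as the conversation moves.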

DAS also generated a community graph; using the D3.js library, we mapped out the tweeps from all that data we gathered. Each node (circle) in the graph is a Twitter account. The size of a node corresponds to the number of tweets it’s pushed out; the number of connections it has with surrounding nodes indicates retweets. This gives us a graph where the content producers are large and retweeters appear very well connected, like spiders in a web.
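The data behind a graph like that is simple to sketch. The Python below is an illustration under stated assumptions rather than the real pipeline: the `user` and `retweet_of` field names are hypothetical, and it assumes that only original tweets grow a node while retweets become connections.

```python
from collections import Counter


def build_community_graph(tweets):
    """Build node sizes and retweet edges from a list of tweet dicts.

    Each tweet is assumed (hypothetically) to look like
    {"user": "alice"} or {"user": "bob", "retweet_of": "alice"}.
    """
    node_sizes = Counter()  # account -> number of original tweets sent
    edges = Counter()       # (retweeter, original author) -> number of retweets
    for tweet in tweets:
        user = tweet["user"]
        original = tweet.get("retweet_of")
        if original and original != user:
            edges[(user, original)] += 1
            # Adding 0 makes sure both accounts exist as nodes without growing them.
            node_sizes[user] += 0
            node_sizes[original] += 0
        else:
            node_sizes[user] += 1
    return node_sizes, edges
```

The two counters map directly onto what a force-directed layout needs: one radius per node and one link per pair of accounts.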

The more we looked at it, the more we realized that most of what we had built for the election could be adapted.

Previously, we had a function where incoming tweets were sorted by the politicians they were referring to, forming a rough ‘attention map’ showing just who commanded the most interest from the Twitter community.

This wasn’t that far off from topical analysis. Think about it. Incoming tweets can be sorted into different categories based on the keywords that they use. Politicians, subjects, what’s the difference? After a little bit of remapping, we replaced the politicians with the topics that people talked about:

An interesting note was that the majority of people were tweeting not about the subjects covered, but about the conference itself. People would say things like “#wso2con2016 is awesome!” and that would end up being sent to the wso2Con category.
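Sorting by keyword really is as simple as it sounds. Here is a minimal Python sketch of the idea; the topic names and keyword lists are made up for illustration, and the prefix-matching rule is an assumption chosen so that a hashtag like #wso2con2016 still lands in the conference bucket.

```python
import re

# Hypothetical topic -> keyword mapping; swap the keys for politicians and it
# becomes the election version.
TOPIC_KEYWORDS = {
    "wso2con": ["wso2con"],
    "analytics": ["analytics", "bigdata"],
    "iot": ["iot"],
}


def categorize(text):
    """Return every topic whose keywords prefix-match a word in the tweet."""
    tokens = re.findall(r"\w+", text.lower())
    matched = []
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(token.startswith(kw) for token in tokens for kw in keywords):
            matched.append(topic)
    return matched


if __name__ == "__main__":
    print(categorize("#wso2con2016 is awesome!"))
    print(categorize("Great IoT talk at #WSO2Con"))
```

Prefix matching is crude (a short keyword will also catch longer, unrelated words), so a real deployment would want a more careful keyword list.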

But we also wanted to see what tweets were really influencing opinion at a given time, so we had an instance of Complex Event Processor running. Complex Event Processing, or CEP, is a field which deals with taking tons of real-time events and carrying out operations on them. Our use was fairly lightweight by the usual standards: we were continuously ranking tweets, Hacker News-style, sorting them by popularity (retweets + favourites) and then by freshness, so that a tweet dropped in rank as it got older.

The result? A ranking system where only the newest and the most popular tweets stood out. Of course, this works better for an election than for a conference, but this screen, once it was running, was displayed first, because it gave a clear idea of tweets being collected and processed.
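For a feel of how such a ranking behaves, here is a small Python sketch. It borrows the commonly cited Hacker News-style decay, score = points / (age + 2)^gravity, which is an assumption for illustration and not the actual CEP query; the `age_hours` field and the gravity value of 1.8 are likewise hypothetical.

```python
def rank_score(retweets, favourites, age_hours, gravity=1.8):
    """Popularity divided by an age penalty, so older tweets sink over time."""
    popularity = retweets + favourites
    return popularity / (age_hours + 2) ** gravity


def rank_tweets(tweets):
    """Sort tweet dicts (hypothetical fields) from highest to lowest score."""
    return sorted(
        tweets,
        key=lambda t: rank_score(t["retweets"], t["favourites"], t["age_hours"]),
        reverse=True,
    )


if __name__ == "__main__":
    tweets = [
        {"id": "old-hit", "retweets": 50, "favourites": 50, "age_hours": 48},
        {"id": "fresh", "retweets": 5, "favourites": 5, "age_hours": 0},
    ]
    for tweet in rank_tweets(tweets):
        print(tweet["id"])
```

With these numbers the fresh tweet outranks the two-day-old hit despite having a tenth of the engagement, which is the behaviour you want on a live screen.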

And while the actual minutiae behind the whole thing are a longer discussion, it turned out to be a miniature success: people talked about it, around it, tweeted at it, and one guy, in an effort to make it to the top of the chart, sent a record 467 tweets in the two days that it was live.

We gave him a GoPro for that.

That’s what she said…

Somewhere in the middle of the original conversation, Sanjiva pulled out his Android. “Look guys,” he said, brandishing it. “If this phone can recognize voice and react to it, why can’t we? Can’t we find a way to analyze what we say at Con as well?”

This was one thing completely outside of our initial capacity, so into the fray was called the IoT team. Hmm, said Sumedha Rubasinghe, who was heading the team. There’s a way to do this.

So while we hammered away at converting our scrappy prototype to something that we could display, the IoT team explored Google’s Android speech recognition libraries. An Android device would listen to the speeches, delivering mentions of certain keywords to DAS. DAS would collect these mentions and sort them by topic. When the speech ended, it would trigger a new slide being briefly displayed, showing the speaker, their bio, and what they were just talking about.

Here’s a UI mockup

In practice, each of these slides switched in and out on a single large screen, running in a full screen browser window. Because it was designed for large screens and for people viewing it at a distance, each slide contained one, at most two, pieces of analysis, and only if they were closely related. The speech stats (we called it a beta, because of all the variables) would show up only at the end of a speech, and the entire set would have cycled in by the time you’d finished reading this.

Postmortem

Of course, not everything is as easy or as simple as a blogpost makes it sound. The conference experience was excellent; the moment we tried to put everything together, it all fell apart. And then it fell apart again. And again. This showed us all the holes in our implementation, so by the time we took it to Con, we’d already faced the worst that could happen. When this goes live for the election, it’ll be ready. Both DAS and CEP are built to handle massive scale; it was simply a matter of getting our bugs ironed out.

If there was one takeaway from the whole experience, it’s that word clouds fascinate people above all else. We’d bet on the tweet ranks being something that would draw the eye, or the topical representation. We spent a great deal of effort tuning the community graphs and the bubble charts. But no; more people were drawn to the word clouds than to anything else.

Going forward, the plan is to clean up all the overnight hacks we did and turn the Twitter Thing into an app that can be deployed at any of our conferences: no fuss, just a couple of clicks and a hashtag. And, if we can do that, to make it open-source as well. After all, there are plenty of conferences out there, right?


Data scientist, public policy and tech, @LIRNEasia. Nebula Award nominated author. Numbercaste (2017) / the Inhuman Race (2018). @yudhanjaya on Twitter.

Yudhanjaya Wijeratne
