Visualizing Toxicity in Twitter Conversations

Published in Cortico, Aug 17, 2018

In late June, Deb Roy approached me to ask if I would be interested in doing some visualization work for a presentation he’d be giving at Twitter’s all-hands event with Bridgit Mendler. The talk was to be about toxicity in Twitter conversations, and would showcase the conversation analysis work being done in the lab, particularly around classifying replies as toxic or not.

At this point, any time Deb asks if I want to visua — I stop him right there and say yes. Yes please. When can I get started? (Check out some of his previous visualizations in his fantastic TED talk.)

The project started with an initial design discussion in which we all agreed it would be cool to somehow visualize Twitter conversations as natural-looking trees, where replies form branches and the more toxic the reply, the more withered the branch would look. At this point, I had no idea how I’d even approach rendering a withered tree, but it sounded like a fun experiment so I said I’d look into it and do my best.

One of the benefits of working at Cortico in the MIT Media Lab is that a ton of interesting people are always coming through. As luck would have it, a few days after our meeting I was showing seasoned visual FX artist Eugénie von Tunzelmann around the lab. She was intrigued by the idea of the conversation trees and thought I might enjoy using Houdini to try and model it, suggesting its procedural nature would appeal to my engineering background. A day after our meeting she was even so kind as to send over a sample file for an approach that might work.

Eugénie’s sample render of a partially withered tree

It looked pretty sweet and I was convinced it would be fun to learn Houdini for this project. My naive past self was exceptionally optimistic about this despite the deadline for the project being just a few weeks away. Thankfully MIT offers access to Lynda.com courses and there were a couple on Houdini that I completed to get a crude, basic understanding of the application.

With Eugénie’s sample file, Lynda.com’s courses, and the online Houdini help pages at the ready, I began my journey modeling the toxicity of conversations on Twitter. I broke it down into three steps to ease my anxiety:

Figure out how to lay out the tree
Figure out how to render a tree in Houdini based on real data
Create a video showing multiple conversation trees growing

Lay out the Conversation Tree

At Cortico and the Lab for Social Machines (our group in the Media Lab), it’s pretty common to spend time thinking about, looking at, or creating network graphs. Every time we’ve rendered graphs in 3D, however, we’ve used a force-directed layout. Given that this data had a bit more structure (it’s a tree), I thought I’d see whether there were any other interesting 3D layouts to try.

The first thing I found was a paper from 1995 by an old professor of mine, Tamara Munzner! Sweet. But it was about hyperbolic spaces, which was a bit beyond what I was interested in. However, there was a great figure demonstrating a 3D cone layout that looked very promising. Thanks Tamara!

Tree Cone Layout (figure taken from Tamara’s paper)

Next I had to figure out how to apply the layout to our data. I found a very extensive Python graphing library called Tulip that had algorithms for laying out graphs in dozens of ways, including the cone layout. Jackpot. With a little finesse I was able to take our conversation tree data and output JSON files that included the nodes, links, 3D positions, and toxicity scores. With these pre-computed files, all I’d need to do was get Houdini to render objects in their positions as specified in the data.
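To make the shape of those pre-computed files concrete, here is a minimal sketch in plain Python. It does not use Tulip or reproduce its cone tree algorithm. Instead it swaps in a much simpler rule (children spaced evenly on a circle one level away from their parent), and the function names, JSON field names, tweet IDs, and toxicity scores are all made up for illustration.

```python
import json
import math


def cone_layout(children, root, radius=1.0, level_height=1.0, shrink=0.5):
    """Place each node's children evenly on a circle one level away from it.

    Returns {node_id: (x, y, z)}. The root sits at the origin, every level is
    `level_height` further along y, and circle radii shrink by `shrink` per level.
    This is a simplified stand-in, not Tulip's cone tree algorithm.
    """
    positions = {root: (0.0, 0.0, 0.0)}
    stack = [(root, radius)]
    while stack:
        node, r = stack.pop()
        kids = children.get(node, [])
        px, py, pz = positions[node]
        for i, kid in enumerate(kids):
            angle = 2.0 * math.pi * i / len(kids)
            # A single child sits directly in line with its parent.
            offset = r if len(kids) > 1 else 0.0
            positions[kid] = (
                px + offset * math.cos(angle),
                py + level_height,
                pz + offset * math.sin(angle),
            )
            stack.append((kid, r * shrink))
    return positions


def to_json(children, toxicity, root):
    """Bundle nodes, links, positions, and toxicity scores into one JSON string."""
    positions = cone_layout(children, root)
    nodes = [
        {"id": n, "position": list(positions[n]), "toxicity": toxicity[n]}
        for n in positions
    ]
    links = [
        {"source": parent, "target": kid}
        for parent, kids in children.items()
        for kid in kids
    ]
    return json.dumps({"nodes": nodes, "links": links}, indent=2)


# Hypothetical conversation: tweet "t0" has two replies, one of which has a reply.
children = {"t0": ["t1", "t2"], "t1": ["t3"]}
toxicity = {"t0": 0.05, "t1": 0.80, "t2": 0.10, "t3": 0.95}
print(to_json(children, toxicity, "t0"))
```

Whatever layout algorithm produces the positions, keeping them in a plain file like this means the rendering side only has to read coordinates and scores.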

My first approach was to bang my head on the wall several times, but it turned out that a more effective means of moving forward was reading documentation and trying things in Houdini. Houdini has great Python integration, so I was able to write a bit of code that generated geometry (points and lines) based on the data. Thanks to Eugénie’s example, I was able to figure out how to use the toxicity parameter in the data to color the nodes. I really owe her a beer… or five hundred.

I didn’t want to settle on the cone layout without first trying a few others, so I generated several different JSON files with the layouts and began to explore how they’d look. Here green corresponded to “healthy” and red to “toxic”. Each tweet was represented as a sphere, and a line was drawn between tweets to indicate that one had replied to the other.
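The green-to-red encoding can be sketched as a linear blend between two colors. This is only an illustration: it assumes toxicity scores fall between 0 and 1, and the two RGB values and the function names are placeholders rather than the ones used in the actual Houdini scene.

```python
HEALTHY = (0.0, 0.8, 0.0)  # assumed "healthy" green
TOXIC = (0.9, 0.0, 0.0)    # assumed "toxic" red


def lerp(a, b, t):
    """Linear interpolation from a (t = 0) to b (t = 1)."""
    return a + (b - a) * t


def toxicity_to_rgb(score):
    """Blend from HEALTHY to TOXIC, clamping the score to the range [0, 1]."""
    t = min(max(score, 0.0), 1.0)
    return tuple(lerp(h, x, t) for h, x in zip(HEALTHY, TOXIC))


for score in (0.0, 0.5, 1.0):
    print(score, toxicity_to_rgb(score))
```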

Various layouts attempted for rendering the conversation trees

I brought these screenshots back to the team to get their input, and we decided that we should move forward with the cone layout turned upside-down so it looked like it was growing out of the ground.

Visually model toxicity

With a layout settled on, it was time to figure out how to actually make it look cool. The main goal was to represent toxicity as a withered, dead part of the plant. Knowing next to nothing about Houdini, I went blindly by the names of the operators. The first two I tried were “mountain” and “point jitter”.

Applying mountain to the edges and point jitter to the nodes

Unsurprisingly, something wasn’t quite right, so I began considering alternate possibilities: maybe we could try something less organic. What if the “trees” were made of metal and glowed? Sounded cool to me, but mostly I think I was just drunk on the idea of being able to easily add materials to my geometry.

Seemed a bit too much like Christmas tree ornaments for my liking. (What was I even thinking, a metal tree?) At this point, I went back to the basics and focused on showing a withered plant. Step 1 was to invert the cone, color the edges by toxicity using a more natural color, and try and use a small amount of jitter on the toxic edges to make them look a bit more degenerate.

Trust me, that trunk is structurally sound.

I felt like a real genius when I decided to add weird fruit to represent the tweets, but on closer inspection, I was still just myself.

An epiphany struck when I realized I was going to want to animate these plants growing into the scene. I had no idea how to do this, but I stumbled onto a very useful tutorial that was similar enough. My current geometry, which just used individual edges, wasn’t going to work with the approach shown there: it required longer lines, not disparate edges. In the Python code, I modified my geometry to consist of lines running from the root tweet to each of the leaves and was very pleased with the results, which had a more organic look.
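Turning individual edges into root-to-leaf lines is a small tree traversal. The sketch below shows one way to do it in plain Python, assuming the conversation is stored as a dictionary mapping each tweet to its replies. The function name and the tweet IDs are hypothetical, and the real version ran inside Houdini and created geometry rather than lists.

```python
def root_to_leaf_paths(children, root):
    """Return one list of node IDs per leaf, each running from the root to that leaf."""
    paths = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            # No replies: this path ends at a leaf, so it becomes one long line.
            paths.append(path)
        for kid in kids:
            stack.append(path + [kid])
    return paths


# Hypothetical conversation: tweet "t0" has two replies, one of which has a reply.
children = {"t0": ["t1", "t2"], "t1": ["t3"]}
for path in root_to_leaf_paths(children, "t0"):
    print(" -> ".join(path))
```

Note that lines overlap near the root, since every path starts at the original tweet. That overlap is part of what lets a single curve be grown smoothly from the ground up.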

After I switched to using longer lines, the model looked a bit more natural

My personal favorite — the party tree! Toxicity encoded as a rainbow.

As I learned more about how the Houdini operators worked, I was able to make something a bit more respectable. I was feeling pretty good: it was starting to look a bit like a withered plant if you squinted. And I was even able to animate it growing!

Things finally started to come together!

Not wanting to be strictly productive, I got a bit cheeky and thought I’d give a nod to my Canadian roots by adding in maple leaves to represent the tweets themselves. (Hey, it’s better than weirdo fruit, right?) Unfortunately I realized the manipulations I applied to the edges to make them look more organic had caused them to drift away from their original node positions, so some leaves floated orphaned in midair. Oops! I’m sure there should be some way to solve this, but I couldn’t figure it out for the life of me. Instead I adjusted the way I jittered the edges, tapering the jitter so the start and end points stayed put and the nodes remained connected.
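The tapering idea can be sketched outside Houdini: scale the random offset by an envelope that is zero at both ends of the line and largest in the middle. The parabola used here, the amplitude, and the function name are assumptions for illustration, not the operators or settings used in the actual scene.

```python
import random


def tapered_jitter(points, amplitude=0.1, seed=0):
    """Randomly displace the points of a polyline while pinning its endpoints.

    The envelope 4 * t * (1 - t) is exactly 0 at t = 0 and t = 1 and peaks at 1
    in the middle, so the first and last points never move.
    """
    rng = random.Random(seed)
    last = len(points) - 1
    jittered = []
    for i, (x, y, z) in enumerate(points):
        t = i / last if last > 0 else 0.0
        weight = amplitude * 4.0 * t * (1.0 - t)
        jittered.append((
            x + weight * rng.uniform(-1.0, 1.0),
            y + weight * rng.uniform(-1.0, 1.0),
            z + weight * rng.uniform(-1.0, 1.0),
        ))
    return jittered


# A straight vertical edge sampled at five points.
edge = [(0.0, 0.25 * i, 0.0) for i in range(5)]
for point in tapered_jitter(edge):
    print(point)
```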

I tried using maple leaves to represent the nodes, but oops! They no longer connected.

To my complete surprise, the rest of the team was less enthusiastic about the leaves, so I returned to just using simple spheres as the nodes. Around this time I decided I’d add in a little wavy trunk because it just felt right. The longer the reply chains were, the bigger the trunk was.

It was time to try and duplicate this approach for the other Twitter conversations we wanted to visualize. Given the procedural nature of Houdini, this turned out to be pretty straightforward (praise be, digital assets, praise be). However, upon reviewing the group of conversations together, we decided that the “natural” withered look wasn’t visually distinguishable enough from a distance.

Three different Twitter conversations with varying toxicity and depth

I tried several different color variations, but settled on a bright orange as the signifier of toxicity. Turns out halfway between green and orange is vomit, so that worked out pretty well. The colors came out a bit subdued in the renders, but I was hoping I’d be able to brighten them up in the compositing phase (something I had just learned existed).

With less natural colors, the differences between the trees are more obvious… right? Ok, I know it’s pretty dark.

You’ll notice the trees also changed shape in the above screenshot. We decided to resample the data used in the trees so they were more representative of the relative size of the actual Twitter conversations. Sampling was necessary because the sheer volume of single replies to the original tweets made it harder to compare the trees in the visualization.

Lights, Camera, Action!

We were running out of time and the modeling was good enough, so it was time to try and create the video that followed the narrative of the presentation. The team had selected tweets and replies that we wanted to focus on and it was my job to navigate the camera around the scene in coordination with the growth of the plants to match what they wanted to talk about.

I had never really animated a camera moving around or thought much about directing a movie beyond taking casual photographs, so this turned out to be pretty challenging. Kudos to directors everywhere. There were mystical comments on the internet about rigging up null nodes to the camera and animating them to make it work, but it was mostly gibberish to me. I persevered and did what I could, but boy, iterating on these things was really slow.

It turns out that rendering with a CPU-based ray tracer takes its sweet time, and I was doing just that. The final result had 528 frames in it, resulting in a 17-second video. (Yes, I did think to myself: all this work just for 17 seconds, have I lost my mind?) I had no idea it would take over 24 hours to render all the frames! This meant I settled a bit more than I would’ve liked since iterating took so long and I didn’t want my poor laptop to melt.
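For a sense of scale, here is the back-of-the-envelope arithmetic behind those numbers. The only inputs are the figures quoted above. Since the render took "over" 24 hours, the per-frame time is a lower bound, and the frame rate is implied by the numbers rather than a setting taken from the project.

```python
FRAMES = 528
VIDEO_SECONDS = 17
RENDER_HOURS = 24  # "over 24 hours", so treat this as a lower bound

# Frame rate implied by the frame count and video length.
implied_fps = FRAMES / VIDEO_SECONDS

# Minimum average render time per frame, in seconds.
seconds_per_frame = RENDER_HOURS * 3600 / FRAMES

print(f"{implied_fps:.1f} frames per second of video")
print(f"at least {seconds_per_frame:.0f} seconds of rendering per frame")
```

That works out to roughly 31 frames per second of video and a little under three minutes of rendering for every single frame, which explains why each camera tweak was so painful to preview.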

I brightened up the colors and superimposed the tweets in After Effects, and voilà! The video was complete.

The End Result

Here’s the final video, brightened up and with tweets superimposed!

The 17s raw video is available here.

Crowd Feedback

When the event finally took place, Twitter was tweeting about it with #OneTeam. I was able to find some tweets and see photographs people were taking from the audience. At least a few people seemed to enjoy the vis, so I was pleased!

Wrapping Up

In the end, it was a lot of work, but a ton of fun learning all the new technologies I needed to make this unique visualization. The team was pleased with the result and it seemed to support our message conveying the variety of conversations taking place on Twitter in terms of depth and toxicity. Perhaps next time I’ll look into trying a GPU renderer like Unreal, Unity, or Redshift to speed up the process!

Work at Cortico

If you like working on exciting, creative projects, enjoy things like machine learning, natural language processing and data visualization, and want to help make the public sphere a bit healthier, come work with us at Cortico! We’re hiring for several positions and would love to hear from you.

https://www.cortico.ai/careers

Thanks for reading! If you have any questions or comments, feel free to reach out to me here or on Twitter @pbesh.