Self-learning OSINT: #5GCoronavirus. Lessons learned from analyzing a social network.

Part 1

IntelTakes
May 3, 2020

Seeing many beautiful visualizations of social networks (and inspired by a recent investigation by Benjamin Strick), I decided it was high time to level up in the mighty art of Twitter-Fu and Gephi. During the COVID-19 pandemic, finding a topic was not hard. After browsing the internet, I decided to look into the world of 5G conspiracies.

Galaxies of Twitter’s 5G conspiracy theorists.

It’s difficult to say what I was expecting. My main goal was to start learning social network analysis with Gephi, as it is one of the essential skills in extensive open-source investigations. I would be lying, though, if I didn’t admit that I was secretly hoping to uncover large Chinese/Russian/Martian bot networks amplifying conspiracy theories to confuse Western democratic societies. Unfortunately (or rather, fortunately), that didn’t happen. Instead, Gephi took me on a wild trip into the uncharted waters of the Twitterverse.

Get your data.

Since there is no analysis without a graph and no graph without data, I had to get my dataset. To collect tweets, I used the Twitter Capture and Analysis Toolset (TCAT) from the Digital Methods Initiative. I recommend following their wiki for the installation process (one command). Any Linux-based VPS or VM with internet access should work without a problem.

To start, you have to define the hashtags that TCAT will collect. Luckily, conspiracy theorists tend to add multiple hashtags to every tweet. After five minutes of searching, I ended up with the following shortlist:

#5gcoronavirus

#5gkills

#5gmindcontrol

#5gtowers

#burn5gtowers
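Because these hashtags tend to travel together, one quick way to grow a shortlist like this is to count which other hashtags show up next to a seed tag in a few sample tweets. This is a minimal sketch of my own, not something TCAT does for you. The regex is a simplified approximation of Twitter’s hashtag rules, matching is case-insensitive, and the sample tweets are invented for illustration.

```python
import re
from collections import Counter

# Simplified hashtag pattern: '#' followed by word characters.
# Twitter's real hashtag rules are more involved; this is an approximation.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of lowercased hashtags found in a tweet."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def cooccurring(tweets, seed):
    """Count hashtags that appear in the same tweet as the seed hashtag."""
    seed = seed.lower().lstrip("#")
    counts = Counter()
    for text in tweets:
        tags = hashtags(text)
        if seed in tags:
            counts.update(tags - {seed})
    return counts


# Hypothetical sample tweets, for illustration only.
sample = [
    "Stop the rollout! #5GCoronavirus #5Gkills #burn5Gtowers",
    "#5gcoronavirus is real #5GKills",
    "Nice weather today #sunday",
]
print(cooccurring(sample, "#5GCoronavirus").most_common())
# [('5gkills', 2), ('burn5gtowers', 1)]
```

Tags that keep appearing next to your seed are natural candidates for the tracking list.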

The TCAT dev team put a lot of effort into explaining how the software works, and every box is thoroughly described. However, if you are unsure about anything, I suggest reading the much more detailed Dutch OSINT Guy article on TCAT and Gephi.

Main TCAT admin panel: this is where you build your query.

Using the hashtags listed above, I harvested data for nearly two weeks and collected more than 5,000 unique tweets related to various 5G/COVID conspiracy hashtags.

Finally, it’s time to export, which happens in the analysis panel. The link to the panel is in the top right corner of the query-building page, or you can go directly to the following URL: <IP of your VPS or VM>/analysis/

Adjust the start date properly before exporting.

While preparing the data for export, keep in mind that the default start date is set to the end date of your dataset. You have to adjust it manually every time you choose a new dataset.

As befits professional software, you have a gazillion export options. However, if you are just starting your journey through social network visualization, “Social graph by mentions” is the only option you should be interested in.

After launching the export, you will be asked to choose the number of top users from the set. My suggestion: if you haven’t gathered hundreds of thousands of tweets and you have more than 4 GB of RAM on your PC, go for 0 (all). You can always filter the results later in Gephi.
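To make the “Social graph by mentions” export less of a black box, here is roughly what such a graph contains: one node per account and a directed edge from each tweet’s author to every account it mentions, weighted by how often that happened. This is my own minimal sketch, not TCAT’s implementation (TCAT produces its own graph file for Gephi). The `top_n` parameter only mimics the “number of top users” prompt, with 0 meaning all, and ranking users by total mention activity is an assumption that may not match TCAT’s criterion. The output uses the Source,Target,Weight edge-table layout that Gephi’s spreadsheet import understands.

```python
import csv
import io
import re
from collections import Counter

# Simplified @mention pattern (handles: letters, digits, underscore, max 15 chars).
MENTION_RE = re.compile(r"@(\w{1,15})")


def mention_edges(tweets):
    """Count directed author -> mentioned-account edges.

    tweets: iterable of (author, text) pairs; handles are lowercased.
    """
    edges = Counter()
    for author, text in tweets:
        src = author.lower()
        for target in MENTION_RE.findall(text):
            tgt = target.lower()
            if tgt != src:  # skip self-mentions
                edges[(src, tgt)] += 1
    return edges


def keep_top_users(edges, top_n=0):
    """Keep only edges between the top_n most active users (0 = keep all).

    "Active" means total edge weight touching the user (an assumption).
    """
    if top_n == 0:
        return edges
    activity = Counter()
    for (src, tgt), w in edges.items():
        activity[src] += w
        activity[tgt] += w
    top = {user for user, _ in activity.most_common(top_n)}
    return Counter({e: w for e, w in edges.items() if e[0] in top and e[1] in top})


def to_gephi_csv(edges, fh):
    """Write a Source,Target,Weight edge table for Gephi's spreadsheet import."""
    writer = csv.writer(fh)
    writer.writerow(["Source", "Target", "Weight"])
    for (src, tgt), w in sorted(edges.items()):
        writer.writerow([src, tgt, w])


# Hypothetical input, for illustration only.
tweets = [
    ("alice", "@bob @carol 5G did this"),
    ("bob", "@alice agreed"),
    ("dave", "@bob @bob listen"),
]
buf = io.StringIO()
to_gephi_csv(keep_top_users(mention_edges(tweets)), buf)
print(buf.getvalue())
```

Keeping everything (0) and filtering later in Gephi is usually safer than cutting users at export time, because the cut cannot be undone without re-exporting.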

The art of the graph.

Time to get artsy! Playing with data in Gephi feels like a clay modeling class (I’ve never taken one, but I imagine that’s what it feels like). When you open your exported file, you start with a square swarm of black dots that tells you nothing about your subject.

From this point, many roads can be taken. Below are the steps that worked best for me. There might be many other paths, but I haven’t had time to discover them yet.

Computing modularity. Modularity measures how strongly a graph divides into communities (also called modules or clusters): a high score means many connections inside groups and relatively few between them. In everyday language, it lets you identify linked networks in your graph and assign a specific color to each of them. With my dataset, I didn’t notice any significant difference when using edge weights or changing the resolution (doubling the resolution gave me only about a 5% drop in the number of identified communities).

First step: computing modularity and coloring nodes.
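For reference, the score behind this step is the standard Newman-Girvan modularity: for each community, the share of edges that fall inside it, minus the share you would expect if edges were wired at random with the same degrees. The resolution factor below scales that second term in the form used by networkx’s `modularity` function; I’m not claiming it maps one-to-one onto Gephi’s “Resolution” setting. Gephi searches for a good partition with the Louvain method; this sketch only scores a partition you already have, it does not search for one.

```python
from collections import defaultdict


def modularity(edges, community, resolution=1.0):
    """Modularity Q of a partition of an undirected (optionally weighted) graph.

    edges: iterable of (u, v) or (u, v, weight) tuples, each edge listed once.
    community: dict mapping node -> community id.
    Q = sum over communities c of  L_c / m - resolution * (d_c / (2m))**2,
    where L_c is the edge weight inside c, d_c the total degree of c,
    and m the total edge weight of the graph.
    """
    inside = defaultdict(float)  # L_c
    degree = defaultdict(float)  # d_c
    m = 0.0
    for e in edges:
        u, v = e[0], e[1]
        w = e[2] if len(e) > 2 else 1.0
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - resolution * (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
lumped = {n: "A" for n in range(6)}
print(round(modularity(edges, split), 4))   # 0.3571 (= 5/14)
print(round(modularity(edges, lumped), 4))  # 0.0
```

Splitting the two triangles scores well above lumping everything together, which is exactly the kind of structure the coloring step makes visible.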

Layout. For social network analysis, two layouts are used most frequently: OpenOrd and Force Atlas 2. To be honest, Force Atlas 2 produced a much, much easier image to analyze.

Left: OpenOrd | Right: Force Atlas 2

I found that applying OpenOrd (default settings) and then Force Atlas 2 (with tweaked scaling and gravity, and “Prevent Overlap” enabled) to the graph gave the best results.

OpenOrd runs until its calculations are finished, while Force Atlas 2 computes continuously, letting you play with the settings live. None of these tweaks and options will redefine your graph or uncover hidden connections. However, they will let you navigate it more easily and spot smaller networks sitting in the shadow of big hubs.

Changing gravity and scaling
Changing the function from linear to logarithmic (or, in human language, making more space near hubs)
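To get a feel for what scaling, gravity, and the linear/logarithmic switch actually do, here is a heavily simplified version of the forces described in the published ForceAtlas2 paper (Jacomy et al., 2014). Every pair of nodes repels in proportion to the product of their (degree + 1) values, divided by distance and multiplied by the scaling factor. Edges attract in proportion to distance, or to log(1 + distance) in LinLog mode. Gravity pulls each node toward the center in proportion to its degree + 1. This is a sketch, not Gephi’s implementation: there is no adaptive speed, Barnes-Hut approximation, edge weighting, or overlap prevention, and the parameter names (`scaling`, `gravity`, `linlog`, `speed`) are my own.

```python
import math


def fa2_forces(pos, edges, scaling=2.0, gravity=1.0, linlog=False):
    """Net ForceAtlas2-style force on each node (simplified sketch).

    pos: dict node -> (x, y); edges: list of (u, v) pairs (unweighted).
    Returns dict node -> [fx, fy].
    """
    deg = {n: 0 for n in pos}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)

    # Repulsion between every pair: scaling * (deg_a + 1) * (deg_b + 1) / distance.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = scaling * (deg[a] + 1) * (deg[b] + 1) / dist
            force[a][0] += dx / dist * f
            force[a][1] += dy / dist * f
            force[b][0] -= dx / dist * f
            force[b][1] -= dy / dist * f

    # Attraction along edges: distance (linear) or log(1 + distance) (LinLog).
    for u, v in edges:
        dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
        dist = math.hypot(dx, dy) or 1e-9
        f = math.log1p(dist) if linlog else dist
        force[u][0] -= dx / dist * f
        force[u][1] -= dy / dist * f
        force[v][0] += dx / dist * f
        force[v][1] += dy / dist * f

    # Gravity toward the origin: gravity * (deg + 1).
    for n, (x, y) in pos.items():
        r = math.hypot(x, y)
        if r > 0:
            force[n][0] -= x / r * gravity * (deg[n] + 1)
            force[n][1] -= y / r * gravity * (deg[n] + 1)
    return force


def step(pos, edges, speed=0.01, **params):
    """Move every node by a fixed fraction of its net force (no adaptive speed)."""
    force = fa2_forces(pos, edges, **params)
    return {n: (x + speed * force[n][0], y + speed * force[n][1])
            for n, (x, y) in pos.items()}


# Two connected nodes at distance 2: with these settings the forces cancel.
pos = {"a": (1.0, 0.0), "b": (-1.0, 0.0)}
print(fa2_forces(pos, [("a", "b")]))
```

Switching to `linlog=True` shrinks the long-range pull on that edge from 2 to log(3), about 1.1, so the same pair is pushed apart; raising `gravity` pulls everything back toward the center.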

Keep in mind that there is no Ctrl+Z in Gephi: once you run an algorithm, there is no way back. This brings us to golden rule no. 1:

Use workspaces! Seriously, if you think you have enough workspaces, make a few more, just in case. Workspaces let you keep a copy of your work at a particular stage, so you can then experiment with different layouts or move nodes around.

Saving raw pre-layout graph to a new workspace.

Make hubs bigger. The last step is to make the hubs pop by linking node size to the number of mentions. This makes it easier to identify hubs and important nodes.

Linking node size to the number of mentions
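Under the hood, this kind of size ranking boils down to a linear interpolation: the least-mentioned node gets the minimum size, the most-mentioned node gets the maximum, and everything else falls proportionally in between. The sketch below assumes the mention graph is stored as weighted (author, mentioned_account, count) edges and counts mentions received as weighted in-degree. The min/max sizes are illustrative, and Gephi’s defaults (and its optional spline interpolation) may differ.

```python
from collections import Counter


def mentions_received(edges):
    """Weighted in-degree: how many times each account was mentioned.

    edges: iterable of (author, mentioned, count) tuples.
    """
    received = Counter()
    for author, mentioned, count in edges:
        received[mentioned] += count
        received[author] += 0  # make sure authors nobody mentions still appear
    return received


def node_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map each node's value onto [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:  # all nodes equal: avoid dividing by zero
        return {n: min_size for n in values}
    return {n: min_size + (v - lo) / (hi - lo) * (max_size - min_size)
            for n, v in values.items()}


# Hypothetical edges, for illustration only.
edges = [("alice", "bob", 3), ("carol", "bob", 2), ("bob", "alice", 1)]
sizes = node_sizes(mentions_received(edges))
print(sizes)
```

Because the mapping is linear, one giant hub can squash everyone else toward the minimum size; that is when a non-linear ranking or a filter on the hub becomes useful.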

Now the fun part. You have a nice graph, so it’s finally time to spend hours digging through it hub by hub, node by node: zooming, moving, and constantly jumping between Gephi and the browser until 5 a.m.

The next part of my lessons learned will focus on what to look for and how to interpret different parts of your graph. All of it is based on my findings from the #5GCoronavirus dataset, including Kawai Qanon, Donald Trump’s account in the middle of my graph, true patriotic believers, and other groups testing the limits of my tolerance.

I will also share my Python script that automates checking any selected part of the graph with the Botometer API.

Disclaimer: I’m not a professional social network analyst or OSINT guru.
The above article aims to help those interested in learning by showing my findings and comments. If you see any mistakes or know a more efficient way, please comment or reach out on Twitter.
