Introductory guide to Open Source media network analysis for beginners

“The network structure is so dense that any misinformation spreads almost instantaneously within one group, and so segregated that it does not reach the other.” Filippo Menczer

This guide will show you some basic, beginner-friendly techniques for identifying covert disinformation sites and seeing their network connections.

The methods I recommend rely primarily on social network metadata (data about data), rather than information content or themes. We’re much less likely to fall into confirmation bias, for or against, if we rely primarily on objective metadata about the networks behind information or disinformation sources than if we try to analyse their contents first.

Before the methods for checking sources, I’ve tried to briefly explain what ‘disinformation’ means, that it’s not as simple as the term ‘fake news’ implies, and to introduce some of the psychology needed to understand how we’re being manipulated. Learning (1) how to check sources and view the network as a network, so you really know who’s telling you what, and (2) how we’re being manipulated psychologically, are probably the most efficient resistance strategies available to us individually. Ultimately, I think we need to change the structure of our media environment so that it more closely matches the environment our cognitive mechanisms evolved for, but that would take another article to explain, would probably take 10–15 years to carry out, and requires collective action, which is too slow for now.

I have surprised myself before, both by finding connections between sites which I didn’t expect to be connected and by finding no confirmation in the data for connections I did expect. That the methodology doesn’t always confirm our expectations is a good sign that it is robust.

What exactly is ‘disinformation’?

Disinformation is not simply “fake news,” although ‘fake news’ is now commonly being used by the media as if it was an adequate synonym.

‘Fake news’ implies misinformation — simply false information, but disinformation is a strategic mixture of truths, half-truths and lies used to manipulate political behaviour towards desired aims. Some total blatant bullshit can also be used strategically to shift norms of political discourse, but almost all disinformation uses some factual truths as starting materials.

Disinformation is a technique of information-psychological warfare, part of global hybrid warfare, i.e. the integrated combination of information-psychological means of warfare and kinetic (physical and explosive) means of warfare. Secondarily, some of it involves private profit motives, but it is mainly used for political aims.

Disinformation is built up in layers:

  1. The first ingredient is always some truth, used as an anchoring point. It may be just an apparently similar event, long ago, with no real causal connection or logical relevance to the new events being reported, but it is always something that will be generally perceived as absolutely obviously true.
  2. That truth is then reframed using moral-emotional framing terms, to activate the right set of moral-political emotions in the target audience and lead them towards supporting the desired conclusion.
  3. The true events are then fitted into a big geopolitical narrative, typically an extremely reductive theory which claims to explain everything in one simple totalizing story.
  4. Some simply false ‘facts’, or some very misleadingly exaggerated or miscontextualized real facts, may be added.
  5. It closes with the conclusion aimed for.

Disinformation strategy relies on some facts about human cognitive psychology which we usually prefer to not notice —

We very rarely do the slow, individually independent, conscious kind of thinking to make our decisions. Almost all our decisions are made using social cognitive heuristics, which are quick and efficient approximating mechanisms. The anchoring and adjustment heuristic and the majority threshold heuristic are the ones most often manipulated in propaganda strategies.

Anchoring and adjustment heuristic means that we often make decisions on complex issues, when we can’t gather enough information to really independently and logically decide for ourselves, by deciding 1) is the anchoring point of the claim or argument probably true?, 2) does the adjustment or distance between my old perception (the anchor) and the new perception which the argument is claiming to be true seem reasonable or credible? Anchoring and adjustment heuristic is very efficient and adaptive — without it we probably wouldn’t survive, but it can easily be manipulated.

‘Bandwagon’, or what I prefer to call the threshold heuristic, is also basically highly adaptive, which is why I don’t like the name ‘bandwagon’: it implies a negative judgement before even describing what it means. Imagine a situation in the human ancestral environment, or the environment of evolutionary adaptedness, living in hunter-gatherer semi-nomadic groups of 30–150 people, frequently in inter-tribal feuds with other groups, usually over women and paternity rights or farming or hunting territories. A pair of men return to the village saying they’ve seen the other tribe approaching, then another, and then another three come and say the same. At some point, the balance of risks and costs means that it’s adaptive to err on the side of caution, believe them without really knowing whether it’s true, and prepare for an attack. Manipulations of the bandwagon or threshold heuristic very often evoke fear first, because that makes people much more susceptible to this manipulation tactic.

Then recollect a situation when you’ve seen comment wars underneath a media source’s post on social media. In that environment, naive people who aren’t more familiar with the topic than what they’ve just read or watched in the news article or video posted usually don’t know much about who is bloc commenting and where they come from socially and politically. In the ancestral environment, we would have known a lot of intimate details about who was telling us what, and our filtering mechanisms would have been able to work out most of the time who was credible, and make those efficient survival decisions in conflict situations.

I’m trying to not make this section too long now, but the two paragraphs above also indicate briefly the evolutionary mismatches in the structure of the information environment, i) on social media platforms as they’re designed now, ii) the structure of the underlying social graph algorithms which select, rank and recommend what we see, e.g. EdgeRank, iii) the structure of the information environment which our social cognitive heuristics are still mostly adapted to, because our social environment has changed much faster than genetic evolution of our social psychological traits can catch up.

A heuristic in a mismatching environment becomes a systematic bias.

Disinformation is a complicated mixture of truths and lies, so dissecting and unpicking how it was woven together takes far more time than producing more of it. Volume, repetition and recirculation are how they are winning. Debunking it bit by bit is far too slow and costly to even keep up.

Even though there is almost always some truth used in the making of disinformation, the contextual and emotional reframing makes it appear to mean something very different. This is one of three reasons why a piecemeal approach to debunking disinfo cannot work efficiently enough.


Framing means the terminology used to characterise and categorise events or people which evokes certain moral emotions, without us necessarily being conscious of the process.

It’s probably impossible to introspectively observe directly how framing language evokes a moral-emotional cognitive response in us. We can only ‘see’ the output not the process itself directly. The cognitive ‘modules’ influenced by framing language (or more broadly and accurately, all forms of symbolism) probably evolved long before language, and occur prior to conscious thought, but experimental psychologists have inferred some indirect observations and general principles from how people respond to framing stimuli.

What people actually remember most and react to is the moral-emotional framing, not nearly so much its factual or apparently ‘factual’ contents.

That’s why even if you debunk the fake ‘factual’ content and someone consciously agreed the day before, they’ll often unconsciously revert to repeating the same disinformation the next day because it felt more convincing than reality. Disinfo feels hyper-real. This is reason two of three why piecemeal debunking cannot work adequately or fast enough.

To change that, you have to reframe it so that they feel significantly and memorably different about the topic and then next time they’ll recall the factual details which made them feel differently. I think one of the best ways to do that is to point out the frame itself and how it is manipulative. That might even be the primary adaptive function of the sense of betrayal.

Checking photos and videos

Finding that the main fact claims in a report correspond with some real events somewhere, sometime, by someone, does not necessarily mean that it is not disinformation. What was reported may have happened in a different place and time, even if there are photos or video used. Video is such a powerful communication medium that if the factual contents in a video are reframed within a false context to create a false meaning, it has an enormous and persistent effect on convincing people that bullshit is true.

For example, they quite often recycle photos of an old case of child organ harvesting and human trafficking but misrepresented as if it was recent and in a different place and context. The who did what to whom, where and when, will be deliberately mixed up or falsely equivocated or ambiguified, or they re-use photos from autopsies on car crash victims misrepresented as evidence of organ trafficking, in order to smear the nationality, race or religious identity which the traffickers were falsely attributed with. Paedophilia is another highly emotive topic they often use in disinfo smear campaigns — it’s often used as a thinly veiled homophobic slur or precursor.

If you’re investigating a post with a photo you suspect is being misrepresented in another context to make it appear to mean something else, you can do a Google reverse image search: in Chrome, just right-click on the photo and select “Search google for this image”. It’s possible in other browsers but takes a few more clicks.

Better than Google reverse image search but a tiny bit more effort, is —

For videos on YouTube, there is this reverse search tool —

Unfortunately this tool only works for videos on YouTube, so far, but sometimes you can find the same video on YouTube and then at least show its network on there.

Sometimes they get complacent about how easy it is to fool the people who want to be fooled because it’s more comfortable and seems to absolve them of any human responsibility, and then they do something like this:

They re-used photos from a school they bombed in Idlib as if the photos showed a school in West Aleppo they claimed was bombed by rebels, but they forgot to change the AFP (Agence France-Presse) pop-up tag on the photos they nicked from the internet, so the description of the original context was still there. Sloppy.

Also google the name of the supposed author. Are they in a position to possibly have done the kind of investigative research journalism which they claim to rely on in their article? If not, where do the fact claims come from?

Four Simple Steps to Identify Covert Disinfo Sites:

  1. Simply google “site:www.[nameofsite].com russia”, or substitute “russia” with any other search term, and just see what you get. Often, people get lured in by an innocuous or emotionally appealing clickbait post and don’t know what else a site or Fb page is sharing, and wouldn’t have chosen to interact with or promote that site or page through the network long-term if they’d known. I often also do “site:www.[nameofsite].com syria”.
  2. Google for the name of the site or the title of the post or the author’s name on one of the major Russian trolls’ aggregator sites, e.g. “ “Alternative News Network””. You can also just scroll down that aggregator and see the names of sites and the variety of audiences they target, from overt neonazi sites to ‘Natural Health’ hippy Leftist sites, and loads of conspiracy theory hobbyist sites (why). The range of sites with such shared third sources and shared text is not plausibly explainable by chance or by natural affinities. You might feel like washing your eyes out with soap afterwards, and almost all of us have been manipulated this way (including me), but it’s better to know.
  3. If you get weird-looking stuff, copy-paste a chunk of text into Google and see where else it’s shared. This is a crude but very easy form of plagiarism test. I’ve now found another free plagiarism checker site which works even better than Google: You’ll find many more connected sites by plagiarism analysis than by referrals network analysis; it’s much more sensitive, but has a higher false positive rate.
  4. Referrals network test —
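Steps 1 and 3 above can be sketched in a few lines of Python, if you'd rather generate the searches than type them. This is only a rough illustration under my own assumptions: the function names are made up for this example, `www.example.com` is a placeholder domain, and the word-overlap score is a crude stand-in for what a real plagiarism checker does, not a description of any particular tool.

```python
import re
from urllib.parse import quote_plus


def site_query(domain: str, term: str) -> str:
    """Step 1: restrict a search to one site and add a topic term."""
    # Drop any scheme and trailing slash so the site: operator gets a bare host.
    host = re.sub(r"^https?://", "", domain.strip()).rstrip("/")
    return f"site:{host} {term}"


def exact_phrase_query(text: str, max_words: int = 12) -> str:
    """Step 3: quote a short chunk so the engine looks for that exact wording."""
    words = text.split()[:max_words]
    return '"' + " ".join(words) + '"'


def search_url(query: str) -> str:
    """Turn a query string into a Google search URL to open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def shingles(text: str, n: int = 5) -> set:
    """All runs of n consecutive lowercased words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a: str, b: str, n: int = 5) -> float:
    """Jaccard overlap of word shingles: 0.0 means nothing shared, 1.0 identical."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Placeholder domain; substitute the site you are checking.
    print(search_url(site_query("http://www.example.com/", "russia")))
```

If you save the text of two articles you suspect share a source, `overlap(text_a, text_b)` gives a rough score for how much wording they have in common. A high score is a lead to follow up by hand rather than proof, since wire copy and quotations are legitimately shared too.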

Similarweb is a Search Engine Optimisation (SEO) tool, designed for website marketers, but you can also use bits of it for OSINT. You don’t need to register and the free stuff is enough (although a Pro account would be fun to try, please), just type or copy-paste the site name in and click. The most relevant section for us is Referrals, meaning sites which refer into the site you’ve searched on and sites which are referred to by that site.

Alexa is like Similarweb, better in some ways but without a free trial version-

Initially an easy starting point for comparison to notice covert disinfo sites in the Referrals analysis results of the site you’ve searched on is:


SPLC used Alexa search engine to gather the data represented in the graph above, which is like Similarweb but without a free trial version.

This is just an example, showing the network of referrers to and from —

The real scale of the Russian disinformation grey sites network is vast. Guessing by number found / search effort, I reckon there are probably tens of thousands of covertly associated sites in their network globally.

This list of Kremlin associated sites was built using basically the same methods as explained in this guide—

(I disagree with the removal of sites from the list just because the site owners or editors deny their association. They may indeed not have been consciously intentionally connected, some of them, but consequences matter more than intentions in this context, and they were certainly closely associated.)

You’ll get much quicker at interpreting referrals network analysis results with practice when you’ve learned the names of more covert disinfo sites.

If you search on Similarweb for a site you suspect and find it doesn’t have any obviously dodgy results in the referrals section, click through to a few of the site names you don’t recognise and look at their referrals networks. Often it’s just one or two transitive steps away until you hit an obvious one like Sputnik or Russia Today. Bear in mind that sites don’t have to have referrals, the free version of Similarweb will only show you the top 5, and I think they might be adjusting to this method we’ve been using to reveal their covert connections by deleting their hyperlinks. So not having other covert disinfo sites in the referrals network of a site you’re investigating doesn’t necessarily mean it’s unconnected.

Referrals network analysis is an insensitive but reliable method — you’ll get a lot of false negatives but almost certainly no false positives.
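If you note down the referrals you see as you click through, the "one or two transitive steps" check can be done mechanically. The sketch below is a plain breadth-first search over a hand-collected dictionary of referral links; it does not talk to Similarweb or any other service, and the function name and the `.example` site names are placeholders I've made up for illustration.

```python
from collections import deque


def path_to_known_site(start, referrals, known_sites, max_hops=3):
    """Shortest chain of referral links from start to any site in known_sites.

    referrals maps a site to the sites listed in its referrals section.
    Returns the chain as a list of site names, or None if nothing in
    known_sites is reachable within max_hops.
    """
    # Referrals run both ways (into and out of a site), so treat links as undirected.
    neighbours = {}
    for site, linked in referrals.items():
        for other in linked:
            neighbours.setdefault(site, set()).add(other)
            neighbours.setdefault(other, set()).add(site)

    if start in known_sites:
        return [start]

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # this chain already uses the allowed number of hops
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt in seen:
                continue
            if nxt in known_sites:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    # Hypothetical notes taken while clicking through referrals sections.
    notes = {
        "suspect.example": ["blog-a.example", "blog-b.example"],
        "blog-b.example": ["overt-outlet.example"],
    }
    print(path_to_known_site("suspect.example", notes, {"overt-outlet.example"}))
```

The result depends entirely on which sites you put in `known_sites` and on how complete your notes are, so a `None` result carries the same caveat as above: it doesn't mean the site is unconnected.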

A couple of really basic and important points I keep having to re-emphasise:

  1. All the information we receive on the internet actually comes to us selected and prioritised algorithmically by our previous interactions on the network. It’s presented to us as if the posts are individual and separate, and as if you can choose one without promoting all the rest of that source’s posts for a long time after, but that’s not actually how Facebook works (see Social Graph API overview). Facebook and Google are not WYSIWYG (what you see is what you get) designs.

When you interact with or share a particular post, what you are effectively doing is promoting its source and associated sources through the network long-term. The content of the particular post, even if it’s genuinely innocuous, either harmless funny clickbait or a bit of genuine reporting, perhaps even without any strategic framing applied, is trivial compared to the effect of promoting its source. If you don’t know what else a source does and is associated with, don’t make an uninformed choice to promote it.

That’s a problem with the design of Facebook and, in a different way, Google. On Google, it’s not just our interactions but others’ too that influence search rankings, including those of bots (automated fake accounts), which are used to manipulate Google’s search results ranking (see below).

The internet will continue to be a confusing information-psychological warzone until the networked-ness of information is made visible so that people can easily and instantly see where stuff’s coming from and who/ what it’s associated with and what effects their interacting with it may have.

  2. Most State- or regime-backed disinformation sites are covertly connected. The overt ones like Russia Today are a tiny part of their overall network. There are probably tens of thousands of covertly connected disinfo sites.

You absolutely don’t have to believe me on this and I’d prefer you don’t believe me but just learn the basic techniques to see their connections for yourself. The simple techniques explained above won’t catch all of the covert disinfo sites, but most of them are not really well disguised at all.

Often when I try to point out that covert disinfo sites are covertly connected I get shouted at for being patronising. So I would really prefer it if people learnt how to do it for themselves and stopped relying on me to do it, then blaming me for pointing out when they’ve chosen to promote a source which they probably wouldn’t have promoted if they’d known more about it and its network.

  3. We really do have to judge sites by association, for several reasons:

Reframing is more effective emotionally and politically than debunking. People don’t get motivated by factual debunking even if on a superficial ‘rational’ cognitive level they accept it; the moral-emotional framing sticks longer than the ‘surface level’ information content (because human cognition is actually embodied, not rational in the Cartesian sense; see Lakoff and Johnson, 1999). People tend to carry on fitting new information content into a misrepresentative frame used to mislead and manipulate their political decision making even after they’ve been shown that it is misrepresentative. Human rationality is not as individual as it’s been traditionally theorised to be: we mostly make decisions based on social cues, which for humans are mainly linguistic moral-emotional framing. Moral emotions occur in our whole body, so are more memorable than particular information content.

Generating bullshit is so much quicker and easier than debunking it. If we try to debunk each article, each repetition of that content, bit by bit, page by page, site by site, there is absolutely no chance we can match the efficiency of the barrage of bullshit or ‘firehose of falsehoods’ strategy. This is reason three of three why the piecemeal debunking strategy cannot work.

The only strategy which can match or outrun them in efficiency is to judge sites by association and look at the networks behind them — ultimately, we need fundamental changes in major internet media so that the networked-ness of information is instantly easily visible to everyone, but in the meantime, these basic simple techniques will make the network visible.

If the social networked-ness of information sources was represented graphically in an intuitively instantly understandable way for human beings, then we could and probably would naturally use that spare attention and time to independently scrutinise the content, framing and themes more thoroughly. There would be more public attention capacity freed up so people could spend it on more in-depth reading and interpretation of news.

Currently, the social graph APIs of major social media companies are commercially private and what’s represented to us misleadingly is an atomised linear sequence of posts. That must change — I have no prejudice against private profit, but the public externalities of Google and Facebook’s current business model and algorithms are just too high to tolerate. They can change for the better and still be profitable too, perhaps even more so.

Yes there’s a risk of false positives and occasionally being unfair to individuals and to new media sites, but the public risks of not preferring to err on the cautiously suspicious side are far higher — e.g. when disinformation leads to mass murder, whereas very little real harm is done by mistakenly identifying a new or alternative media site as a covert disinfo site — if they object and show why, then the charge is dropped.

Social bees and ants do not wait to find out whether larvae which smell of oleic acid are really infected with parasitic Varroa destructor mites before they remove them from the nest, they remove them before the mite eggs hatch and infect others.

‘What about’ if an innocent bee larva not infected with Varroa mite eggs just happens to have chosen an eau de cologne with oleic acid in? — too late buddy bee, you’re out already.

Most “alternative” media sites now are demonstrably Kremlin-allied covert disinformation sites — you don’t have to believe or rely on me for this, use the techniques above to see for yourself their connections and the network.

A few strategic narrative themes which occur so frequently in Kremlin disinfo sites you can use them as indicators:

“MSM!” — functionally equivalent to Goebbels’ tactic of “Lügenpresse!”, it works to isolate their followers from the rest of the public and make it easier to manipulate people through groupshift and deindividuation processes.

“But WW3 with Russia!”: Putin is obsessed with hyping up the perception of external threats to Russia, because standing up to them is one of the only things his own people see him as good at. To international audiences, it maintains public support for foreign policies of limitless appeasement regarding Russia’s increasingly aggressive global hybrid warfare strategies now.

Smearing the rescuers — the Syrian Civil Defence ‘White Helmets’ and any medics working in opposition territory in Syria make Russian actions look bad, so they must be discredited. This is such a major obsession for Kremlin-allied disinfo sites that you can use it to identify them. MSF Sea also get incredible amounts of harassment on Twitter, much of it led by Kremlin bots.

Mirror Accusation tactic: a defensive propaganda technique which works by pre-emptively accusing the other side of what you just did, so that when they accuse you of it, it will seem less credible, and they may not even try. For example, the Assad regime really did enable and cooperate with Al-Qaeda in Iraq and really did set up and still supports Daesh (ISIS) in Syria, so to defend against those true accusations they use the big bold lie propaganda tactic to falsely accuse all their international enemies of doing what they actually did. If you see “US created ISIS” bullshit (why), that site is either directly connected to them or a useful idiot for them. For tyrants globally now, the most convenient international propaganda line is to call all their political opponents “terrorists” or “terrorist supporters”.

For more advanced OSINT tools and training, see:

Articles showing how much you can investigate and see for yourself using only open source data and public tools:

On Google search ranking manipulation using SEO techniques plus bot-nets:

Longer reflective piece about the ethical implications and consequences —