What skills do I need to get started as a digital investigator?

Dec 1, 2022

Given the rise of open source research and interest in Trust and Safety, I’ve been wanting to get some quick thoughts down about the qualities and skills I’ve found useful as a digital investigator. My perspective is shaped by my experiences working as an independent OSINT analyst; as an investigator in two UN Commissions of Inquiry, which are bodies set up to investigate some of the gravest human rights and humanitarian law violations around the world; and as a senior investigator (and later a manager) in a team at Twitter tasked with fighting platform manipulation and spam.

Obviously, these observations are tailored to specific kinds of work, and they’re not the only ones out there. But here are the qualities that I’ve consistently seen in some of my best colleagues and that I’ve tried to cultivate throughout my own career:

Analytical thinking

Analytical thinking helps investigators cut through uncertainty and cognitive bias to strengthen confidence in their conclusions. Most investigations require you to collect data, analyze it, evaluate it, and generate an assessment or some insight. Analysis and evaluation are essential parts of this process, and analytical thinking means testing your assumptions and ensuring your findings stand up to scrutiny.

A good analyst observes a suspicious pattern and asks questions like, “Why is this the case?” or “Is this a pattern coming from some error in the data or my own analysis?” and “What else could make this pattern emerge?” Such questions generate competing hypotheses, scrutinize the quality of the data, and search for counterfactuals.

An investigative mindset

This is probably the most important quality. But what does having an “investigative mindset” really mean?

In my view, it boils down to being relentlessly curious. Among other things, it means asking a lot of “how” questions. When you see abusive or harmful behavior, you might extract technical signals left by the adversary, and then you ask questions like: How often do we see these signals being used (by both the adversary and non-malicious users)? How has the use of these signals changed over time? How did the adversary thwart our defenses? How badly did our defenses fare? How would we have found this otherwise? How can we use this knowledge to develop robust detection?
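To make the first of those “how” questions concrete, here is a minimal sketch in Python of counting how often a signal shows up among abusive versus non-malicious accounts. The data, the field names (`signals`, `is_abusive`), the signal string, and the `signal_usage` helper are all hypothetical and exist only for illustration; real labels and signals would come from your own platform’s data.

```python
# Toy, made-up accounts. "signals" holds technical fingerprints seen on the
# account; "is_abusive" is a label from a prior review (both hypothetical).
TOY_ACCOUNTS = [
    {"id": 1, "signals": {"ua:headless-x"}, "is_abusive": True},
    {"id": 2, "signals": {"ua:headless-x"}, "is_abusive": True},
    {"id": 3, "signals": {"ua:headless-x"}, "is_abusive": False},
    {"id": 4, "signals": set(), "is_abusive": False},
]


def signal_usage(accounts, signal):
    """Count abusive and benign accounts carrying `signal`.

    Also returns the share of accounts with the signal that are abusive,
    which is 0.0 when no account carries the signal.
    """
    flagged = [a for a in accounts if signal in a["signals"]]
    abusive = sum(1 for a in flagged if a["is_abusive"])
    benign = len(flagged) - abusive
    share_abusive = abusive / len(flagged) if flagged else 0.0
    return {"abusive": abusive, "benign": benign, "share_abusive": share_abusive}


if __name__ == "__main__":
    print(signal_usage(TOY_ACCOUNTS, "ua:headless-x"))
```

A signal that also appears on many non-malicious accounts is a weak basis for detection on its own, which is why the benign count matters as much as the abusive one.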

Knowledge of a scripting language

If you’re looking to work in Trust and Safety at most platforms, you’ll need to know a scripting or query language, specifically one used for accessing large datasets. Ideally, you should be familiar with SQL and one other language (Python, R, etc.). Concretely, you should know how to use basic functions and joins in SQL, since you’ll often need to combine multiple, disparate tables for analysis.

In my experience, this is where analytical thinking, an investigative mindset, and a bit of technical knowledge can become invaluable. You create a hypothesis about abusive behavior, the signals you expect to see, and then write a new query (or more often, a great many queries) to see if your hypothesis holds up. This is also an area where I’ve seen novel query development lead to new methods for detection and disruption.
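As a rough illustration of that loop, the sketch below uses Python’s built-in `sqlite3` module to test one hypothetical hypothesis: that recently created abusive accounts log in from a shared IP address. The table names (`accounts`, `logins`), columns, sample rows, and thresholds are all assumptions invented for this toy example, not a description of any real platform’s schema.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, created_at TEXT);
        CREATE TABLE logins (account_id INTEGER, ip TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [(1, "2022-11-01"), (2, "2022-11-01"), (3, "2022-11-02"),
         (4, "2022-06-15"), (5, "2022-11-03")],
    )
    conn.executemany(
        "INSERT INTO logins VALUES (?, ?)",
        [(1, "203.0.113.7"), (2, "203.0.113.7"), (3, "203.0.113.7"),
         (3, "203.0.113.7"), (4, "198.51.100.2"), (5, "198.51.100.2")],
    )
    conn.commit()
    return conn


def shared_ip_clusters(conn, since, min_accounts=3):
    """Return (ip, n_accounts) for IPs used by at least `min_accounts`
    distinct accounts created on or after `since` (ISO date string)."""
    query = """
        SELECT l.ip, COUNT(DISTINCT a.account_id) AS n_accounts
        FROM accounts AS a
        JOIN logins AS l ON l.account_id = a.account_id
        WHERE a.created_at >= ?
        GROUP BY l.ip
        HAVING COUNT(DISTINCT a.account_id) >= ?
        ORDER BY n_accounts DESC, l.ip
    """
    return conn.execute(query, (since, min_accounts)).fetchall()


if __name__ == "__main__":
    conn = build_toy_db()
    print(shared_ip_clusters(conn, since="2022-11-01"))
```

The join combines account metadata with login activity, and `COUNT(DISTINCT ...)` keeps repeated logins by one account from inflating a cluster. In practice you would rerun variations of this query with different thresholds and time windows to see whether the hypothesis holds up.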

Another reason I suggest learning a scripting language is its many applications. With a little Python or R knowledge, you can retrieve large datasets, conduct exploratory data analysis, and create data visualizations and reports to disseminate your findings to peers and cross-functional stakeholders. This kind of investigative workflow promotes consistency and reproducibility, so that other analysts can review (and iterate on) your queries and analysis and understand how you reached your conclusions.

Resourcefulness

Data that you rely on for investigations can often be delayed or disrupted for any number of reasons. A good analyst needs to know how to overcome these challenges, especially when responding to high-risk incidents or events.

In practice, this means knowing where to find alternative data sources or how to use other signals as proxies for whatever you’re interested in understanding. Sometimes, secondary data sources can’t provide the same level of confidence as primary data sources, and that’s okay. An analyst should still be able to draw conclusions from those proxies and communicate their findings with a clear message about limitations.

Similarly, a resourceful analyst should be able to draw on existing products and generate new, additional insight. This might be as simple as modifying or adapting an existing query for a new case or synthesizing external intelligence to help decide on where to look next.

Wrapping up

Again, these are based on my own work experience, and they’ll no doubt vary depending on your specific expertise (for example, if you work with child sexual exploitation (CSE) content, a quality like “resilience” is essential). In another post, I plan on adding more concrete real-world (and toy) examples to illustrate what I mean by some of these qualities, as well as talking about some basic patterns that are useful for adversary detection.