Uncommon OSINT: Obsidian, Semantic Meaning and NLP
At its heart, Obsidian is a note taking app, but in its soul, it is a chimera.
by Claudia Tietze: founder & managing director of Farallon, LLC
Introduction
Open Source Intelligence (OSINT) is defined by its sources — legal, open, public and commercial. It is vibrant, adaptable and has no set rules as to what tools a practitioner “should use” to uncover intelligence, organize it and make sense of it. We have a wide variety of choices for built-for-purpose tools with a heavy focus on collection. Yet there are many overlooked tools designed for completely different purposes that can be insanely useful for OSINT investigations, organization and analysis.
OSINT practitioners can be seen as a type of knowledge worker — melding logic, analytics and art. We answer questions and solve puzzles, then communicate in a way people can understand so they can make more informed decisions or integrate our findings with other knowledge.
I’m a multi-disciplinarian, cross-matching knowledge from one field and applying it to another. I believe that the tools we use are an extension of how we think, augment our minds and influence our thinking processes.
Knowledge is power. Data is currency. How we leverage the two is changing. Those are the shifts I explore by challenging others to think differently about the ways in which our technology can help us be better intelligence analysts and communicators.
This is the first article in the Uncommon OSINT series that highlights this multi-disciplinary approach with technology in a way that anyone can understand and use. Focusing on free or low cost tools, each article will demonstrate a use case, using specific examples and highlighting how the tool can be used for OSINT investigations and analysis. At the end of each article, I’ll give you all the information you need to get started with your own use case.
The Uncommon OSINT series of articles was inspired by a conversation with Amber at Tactical Tech’s Influence Industry Project. You can find the project here: https://influenceindustry.org
Focus: Using Obsidian to draw out semantic meaning from data by adding context with inferred relationships to create new capabilities with Obsidian’s native features for use in Intelligence Analysis.
Why Obsidian?
We are on the cusp of changes in how we interact with data and technology — how we think, make connections and learn. These changes orient our technical solutions more naturally toward how our brains work. We are used to — and comfortable with — the tools we’ve grown accustomed to over decades. It takes a little while to explore new ways of doing things to fully uncover their capabilities. In performing arts there is a saying: It takes ten years to become an overnight success. The same is true for the evolution and adoption of technology. When our tools are ‘good enough’, we use them habitually instead of looking for innovative approaches that can surface fresh ways of uncovering insight.
This shift led me to explore Obsidian — a free-flowing note taking app structured toward ideas and concepts instead of chronology and folders. Last year I challenged myself to turn Obsidian into an OSINT tool that allows me to discover contextual connections that are hard to uncover without expensive sophisticated technological assistance.
I took a perfectly beautiful note taking tool, and then I broke it… and I couldn’t be happier with the results!
Since then, I have used Obsidian to anticipate unknown locations of Ukrainian children transported to Russia, map out corporate structures, track espionage, identify money laundering, analyze theoretical intelligence, and explore the mystery of the Baltic jammer. Obsidian is an OSINT multiplier.
USE CASE TOOLS
Obsidian: Platform agnostic and free for individual use, with optional paid services that add convenience to some actions like sync or publishing. The app itself is closed source, but your notes are stored as plain Markdown files and community plugins are open source. Low cost commercial use. Adaptable and scalable with an enthusiastic community and community plugins. There is excellent documentation and community support for Obsidian to help you learn. If you get stuck or want to bounce an idea off someone, feel free to reach out.
What makes Obsidian special? At its heart, Obsidian is a note taking app, but in its soul, it is a chimera. What you can imagine, you can create. Unshackled by traditional hierarchy structures, Obsidian focuses more on the significance and nature of relationships between one or more notes throughout all the bits of knowledge you’ve collected. Obsidian lives in a constant state of controlled entropy.
Why I chose it: The scant few write ups that exist about using Obsidian for OSINT are too limited. They miss the bigger powerhouse features that can elevate investigation and analysis.
Key Concepts and Definitions: While you need little to no technical expertise to put this use case to work, I’m going to explain some definitions and concepts so we can speak a common language. If you aren’t interested, you can skip down to the Onward! section.
- Digital Garden: Obsidian was developed as a digital garden, using linked ideas to reveal connections by reducing the friction of networked thinking. It was developed to keep short notes over time that evolve with us — as we learn, grow and change. Digital gardeners are hands-on; tending our gardens over time. Just like a real garden, a digital garden is planted, cultivated, curated, nurtured and interacted with season after season. And just like gardening, it’s messy, full of surprise and wonderment; and no matter how much we plan, a garden thrives in its own beautiful chaos, even as we hope that our patience and diligent care will pay off.
- Knowledge graph: The connections in Obsidian can be used to create a knowledge graph built on structured representations of information that capture relationships between entities with an emphasis on connected data and semantic meaning. This approach is shown to enhance decision-making and strategic planning by providing a comprehensive view of information and relationships. And it’s the approach we’ll be taking in our use case.
- Semantics: The study of the meaning of words and phrases in language. The primary goal of semantics is to give context to data. By doing so, data transforms into actionable knowledge. Semantic data empowers us to move beyond mere numbers and embrace the deeper context that drives informed decisions and innovation. We are going to add semantic data for our use case with Wikidata.
- Ontology: A standard definition for knowledge that defines the relationships between terms and concepts. Ontologies create a shared understanding of domain-specific knowledge. Semantics are a large part of ontologies because they provide the relationship information.
- Triples: Structured statements about information which are used in semantics. Using subject-predicate-object construction, relationships between and among entities are explained and defined by the properties that describe them. Triples are fundamental to encoding information in a semantic-based knowledge graph.
- Semantically Inferred Relationships: Using transitive reasoning, semantics helps a machine infer connections: if A is related to B and B is related to C, then A is indirectly related to C.
- Vault: A vault in Obsidian is a collection of notes and data. You can have multiple vaults. In OSINT terms, you can create a vault for each case.
- Note: Each entry in your vault is a note. A note is like a page in a notepad that may or may not embed other file types in it. Each note has a title and a body and can have front matter (YAML), which adds more information about that note’s contents and nature.
- YAML/front matter: For our purposes, YAML, or front matter, serves as a way of describing our note. Think of it like a driver’s license. Very quickly we can relay who we are, that we are who we say we are, and certain biographical information about us. For our use case, YAML is where we will add information that explains our note’s place in the world by incorporating Wikidata. Our use case YAML could be considered metadata on steroids.
- Backlinks: Incoming links from other notes.
- Outgoing links: Links going to other notes.
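To make the ideas of triples and inferred relationships concrete, here is a minimal Python sketch. The entity names, the "part of" predicate and the helper infer_transitive are illustrative assumptions, not data or code from the vault. It only shows how subject-predicate-object statements can be chained with the "if A relates to B and B relates to C" rule.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). These facts are made up for illustration.
triples = [
    ("spy", "part of", "intelligence agency"),
    ("intelligence agency", "part of", "government"),
    ("spy", "field of work", "espionage"),
]

def infer_transitive(triples, predicate):
    """Return triples implied by transitivity of `predicate` that are not already stated."""
    direct = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            direct[s].add(o)
    inferred = set()
    for start in list(direct):
        # Walk everything reachable from `start` through the chosen predicate.
        stack = list(direct[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(direct.get(node, ()))
        # Keep only the connections that were not stated directly.
        for node in seen - direct[start] - {start}:
            inferred.add((start, predicate, node))
    return inferred

inferred = infer_transitive(triples, "part of")
```

With these sample facts, the only new statement is that a spy is indirectly part of a government. The reasoning is only valid for predicates that really are transitive, so choosing which relationships to chain is an analyst's judgment call.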
Skill level: Medium.
If you are a beginner, getting used to Markdown and YAML might take a little while. Once you get it, it will be a snap! Everyone’s preferred organizational and formatting style will be different.
Security level: Customizable from highly secure to loosey-goosey
- Cloud or desktop
- Encryption options available
- Community plugins are open source with the code reviewable on GitHub for security concerns.
- Some plugins call third-party services where your data may be exposed. In some cases, there are alternative options that live on your machine and don’t expose your data.
- Restricted Mode turns on/off plugins for troubleshooting conflicts or keeping your vaults secure and sequestered. For our use case, once we import the semantic information, you can run some of the analysis in Restricted Mode using native Obsidian features. If you decide the plugins are too exposed for a secure case, you should consider populating the Wikidata terms first, locking down your vault and then adding case file data.
USE CASE: Intelligence Playground Vault
Applying semantics is a method I use to augment existing data. This use case focuses solely on how you can augment your data with semantic understanding and demonstrates a few enhancements and visualizations based on that enrichment.
Our vault is called Intelligence Playground because I began by using intelligence terms and enhanced them with connections made from Wikidata that Obsidian surfaced. The goal is to show you how maximizing the use of YAML front matter with Wikidata can help make connections and reveal inferred relationships.
Onward!
Now that we have a common language, let’s take a look at our use case and how to set up our vault.
Laying the foundation — our data
- Add Wikidata entities to populate YAML
- Add Wikipedia articles to the body of some notes for text examples
- In cases where the Wikidata description was not specific to intelligence, add additional YAML by hand. (This is usually solved with topic-specific ontologies, and Wikidata is a generalist.)
Add Wikidata
Wikidata includes over 100 million data items in semantic web structured format, so it’s a great starting place. On the website, Wikidata breaks down information into various categories in multiple languages. This is what an entry in Wikidata looks like on the webpage:
To add Wikidata to the Intelligence Playground vault, we’ll use the Wikidata Importer plugin by Sam Rose.
We start by entering the term we want to import into the interface for the plugin. I chose “spy”. Then select the result that most fits what you want to include in your vault.
In the screenshot above you can see that “spy” has several options, including specific events, locations and pop culture references. It will sometimes include papers or news articles. One thing to keep in mind: Obsidian does not allow certain characters in titles, so data with a colon will fail to import. If you feel you need a particular source, you’ll need to find another method of importing it.
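If you decide to create such a note by hand, one workaround is to clean the title first. The sketch below is an assumption-laden helper, not a feature of the Wikidata Importer plugin. The set of rejected characters in FORBIDDEN is an assumption based on common file name restrictions, and the example title is hypothetical.

```python
import re

# Assumption: characters commonly rejected in note file names; adjust for your platform.
FORBIDDEN = r'[\\/:*?"<>|]'

def safe_title(label, replacement="-"):
    """Replace characters that may not be allowed in a note title, then tidy whitespace."""
    cleaned = re.sub(FORBIDDEN, replacement, label)
    return re.sub(r"\s+", " ", cleaned).strip()

title = safe_title("Spy: The Inside Story")  # hypothetical item label
```

Keeping the original label as an alias in the note's YAML is one way to make sure the untouched name is still searchable.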
Once you have imported your term, it will populate entries into the YAML — adding layers of intelligence.
In the above screenshot you can see some indicators, but also international translations and even the spy emoji 🕵.
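For readers who have not seen front matter before, here is a rough sketch of what a note with YAML at the top might look like and how a script could read it. The property names and values in NOTE are hypothetical stand-ins, not the plugin's exact output, and the parser handles only simple "key: value" pairs and "- item" lists. A real project would more likely use a full YAML library.

```python
NOTE = """---
description: person who secretly collects information
instance of:
  - "[[occupation]]"
  - "[[profession]]"
aliases:
  - secret agent
  - intelligence agent
---
Body text of the note goes here.
"""

def split_front_matter(text):
    """Split a note into (front_matter_lines, body). Returns ([], text) if there is none."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    return [], text

def parse_simple_yaml(lines):
    """Parse only `key: value` pairs and `- item` lists; this is not a full YAML parser."""
    data, current = {}, None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current is not None:
            # List item belonging to the most recent key that had no inline value.
            data[current].append(stripped[2:].strip().strip('"'))
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current = key.strip()
            value = value.strip().strip('"')
            data[current] = value if value else []
            if value:
                current = None
    return data

front, body = split_front_matter(NOTE)
props = parse_simple_yaml(front)
```

Here props ends up as a dictionary with one text value and two lists. Values wrapped in [[double brackets]] are what let a property act as a link to another note.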
Adding Wikipedia articles
To add Wikipedia articles, we will use the Wikipedia plugin by Jonathan Miller. This will import the first section of a Wikipedia article into your note.
If this were not a limited use case, we might have other information in the note, such as statistics, records, filings, articles, highlights from reports or whatever you collected that you want to connect in your vault. We make those connections with bi-directional links — both incoming and outgoing.
These links let us see how our notes are connected to each other and reveal the information the other notes have about our topic. You’ll notice two sections — Linked mentions and Unlinked mentions. Unlinked mentions are just things we have not “officially” linked yet but exist in our vault. These require hand-review and approval. They will show the term in headings and URLs. You don’t want to link to a concept in a URL as it will break the link by adding Markdown [[double brackets]] around the term. Whether you link to headings is a matter of preference.
In this case, all the linked mentions are very short with no descriptive text. This is unusual and is only happening due to our limited scope. The linked mentions are connected to the YAML we imported from Wikidata. In other words, these links already exist in our vault and have been confirmed.
Usually your mentions — linked or unlinked — will contain text from the body of a note.
If you look at the unlinked mentions, you’ll notice a text passage, and the word “spy” highlighted. This gives us more context when we link the two notes using [[double brackets]] by hand, or by selecting the “link” button available in the unlinked mentions pane. Once the body text is linked, it will show up in our note and also as a connection with text context in the other note. The links and context flow both ways between notes.
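The idea behind unlinked mentions can be approximated in a few lines of Python. This is a simplified sketch, not Obsidian's implementation: the function unlinked_mentions and the sample text are assumptions. It ignores matches inside existing [[double brackets]] and inside URLs, which mirrors the advice above about not linking terms that appear in a URL.

```python
import re

def unlinked_mentions(text, term):
    """Return (line_number, line) pairs that mention `term` outside wikilinks and URLs."""
    # Mask existing wikilinks and URLs so matches inside them are ignored.
    masked = re.compile(r"\[\[.*?\]\]|https?://\S+")
    word = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        visible = masked.sub(" ", line)
        if word.search(visible):
            hits.append((number, line))
    return hits

sample = """A [[spy]] gathers secrets.
The spy met a handler in Vienna.
See https://example.org/spy-museum for more.
"""
mentions = unlinked_mentions(sample, "spy")
```

Only the second line is reported: the first is already linked and the third mentions the term only inside a URL. Each hit still deserves a human review before it becomes a link.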
The Fun Stuff
Now that we have our vault set up and populated with terms about intelligence from Wikidata and a little bit of Wikipedia, what can we do with this stuff?
Analysis — Explore and communicate our data
- Network graphs
- Natural Language Processing
- Clustering and topic modeling algorithms
- Charts and graphs
Since Obsidian is all about making connections, network graph capability is native to the platform. We’ve enhanced this with a few extra plugins.
Graph Link Types by natefrisch01 adds link type information between two nodes on your graph such as “related to”, “product of”, etc. This plugin also adds color coded relationship lines and a legend.
Juggl by Emile van Krieken grew from Neo4j, the same backbone ICIJ uses to visualize their Offshore Leaks project. This adds interactive stylability to your network graph.
Now let’s explore!
You can explore the graph of a single note one level deep. You can keep your note open beside the graph or in a top/bottom orientation.
Alternatively, you can close all other panes and only view the network graph. Here we’ve selected only the most direct incoming and outgoing links.
We can include neighbor links as well. If you think of “spy” as you, and the direct connections as your closest inner circle of friends, neighbors are part of your friends’ close circle of people but only in a limited context (say, a party). They are connected to you through your besties whether you are close to them or not. However, you are aware of them and their relationship to your closest friends. In essence, neighbors expand your circle of connections but only within a limited context. They only interact with your besties in this specific setting.
Notice some of these connections have a description of the relationship on the line between two nodes. This is what Nate’s Graph Link Types plugin does for us. It also gives us the color-coding of the relationships, and you can see the legend of the meaning on the upper left.
If we want to expand even deeper, we can change the depth from one level to two levels with or without neighbors. I chose without neighbors for more simplicity. This removes the limited context above (a party). The way to think of this is your BFF giving you their address book. You aren’t going to call these people and don’t interact directly, but you have your BFF in common, and your network expands. If we had added neighbors, that would give us additional context and proximity to you (the spy).
The graph is interactive, so when we run our cursor over certain nodes, the connections light up for us. Here, we can see that a spy is connected to an intelligence agency, and through that connection, a spy is also connected to intelligence agent, covert operations and espionage.
We can also see the direct connections from each of those nodes, expanding our understanding.
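The depth setting can be thought of as a limit on how many hops we follow from the note in focus. The sketch below is a plain breadth-first search under stated assumptions: the links dictionary is hypothetical sample data (the "tradecraft" note is invented to show a third hop), links are followed in both directions, and the neighbor-links option is not modeled.

```python
from collections import deque

links = {  # hypothetical outgoing links between notes
    "spy": ["intelligence agency"],
    "intelligence agency": ["intelligence agent", "covert operations", "espionage"],
    "espionage": ["tradecraft"],
}

def local_graph(links, start, depth):
    """Return {note: distance} for notes within `depth` hops of `start`."""
    # Treat every link as two-way, since a local graph shows incoming and outgoing links.
    undirected = {}
    for src, targets in links.items():
        for dst in targets:
            undirected.setdefault(src, set()).add(dst)
            undirected.setdefault(dst, set()).add(src)
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == depth:
            continue  # do not expand past the requested depth
        for nxt in undirected.get(node, ()):
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    return distances

one_hop = local_graph(links, "spy", 1)
two_hops = local_graph(links, "spy", 2)
```

At depth one the spy reaches only the intelligence agency. At depth two the agency's own connections appear, while "tradecraft" stays out of view because it is three hops away.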
When we want more control over our graph, that’s where Emile’s Juggl plugin comes in.
Within Juggl, you can assign icons, shapes and colors, assorted views and other customizations. Juggl enhances the capabilities of your graph view. When you select a node, the note in focus will change to the corresponding note, and the graph re-orients itself to that note.
If I want to explore espionage more fully, I can do this in several ways, including breaking away just a single connection.
What if I don’t want to go back to my note or keep it open to learn more about a node? Then I can call up any text that exists in a note by putting my cursor over those that indicate there is more information.
Drawing Meaning from Algorithms
When you launch your vault, it is indexed and analyzed by SkepticMystic’s NLP plugin (NLP = Natural Language Processing). This extracts entities and parts of speech, and breaks down the text in your vault into bite-sized pieces using tokenization to help the machine understand it better.
After the NLP plugin does its thing, the Graph Analysis plugin by SkepticMystic and Emile allows us to explore a note with multiple different topic modeling and clustering algorithms. This helps us see different ways the topic of our note is connected to other concepts within our vault, or the intelligence in our whole vault, using a variety of algorithmic techniques.
In this example, we are viewing the results of the HITS algorithm. HITS stands for Hyperlink-Induced Topic Search. It was developed for ranking webpages based on whether a page is a hub for other information and how authoritative it is.
In this case, it’s showing us the notes in our vault that act as hubs (they link out to many well-connected notes) and as authorities (many hubs link to them). You can see the authority and hub scores on the far right after the terms. These results might be skewed in our use case because we did not add Wikipedia to every note available. It’s possible additional text in the body of a note affects the results.
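For the curious, hub and authority scores can be sketched in plain Python. This is a textbook-style version of HITS written for illustration, not the Graph Analysis plugin's code, and the links dictionary is hypothetical. In this sketch the scores depend only on which notes link to which, so body text matters only to the extent that it contains links.

```python
import math

def hits(links, iterations=50):
    """Return (hub, authority) score dictionaries for a directed link graph."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the notes linking in.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the notes linked to.
        hub = {n: sum(auth[dst] for dst in links.get(n, ())) for n in nodes}
        # Normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in auth.items()}
        hub = {n: v / h_norm for n, v in hub.items()}
    return hub, auth

links = {  # hypothetical outgoing links
    "spy": ["espionage", "intelligence agency"],
    "intelligence agent": ["espionage", "intelligence agency"],
    "covert operations": ["espionage"],
}
hub, auth = hits(links)
top_authority = max(auth, key=auth.get)
```

In this toy vault "espionage" becomes the top authority because every other note points to it, and the notes that link to both targets score higher as hubs than the note that links to only one.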
The Jaccard algorithm shows us how much two notes have in common. You can see the quality score on the far right column after the terms.
In a real case, the Jaccard algorithm can be used to find companies in corporate structures that have not yet been identified as significant but have a high degree of similarity. Depending on the contents of your vault, it can be used as a clue to look deeper at those two corporations for common ownership or shared activities.
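The Jaccard measure itself is simple: the number of things two sets share divided by the number of things in either set. The sketch below applies it to the links found in two notes. The company attributes are invented placeholders, and how the plugin chooses what to compare is not assumed here.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical example: links found in two company notes.
company_a = {"Director X", "Registered Agent Y", "123 Harbor Street"}
company_b = {"Director X", "Registered Agent Y", "Bank Z"}
score = jaccard(company_a, company_b)
```

The two sample companies share two of four distinct connections, giving a score of 0.5. A high score is a prompt to look closer, not proof of common ownership.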
The Magic of the Canvas
Canvases in Obsidian are a workplace where you can create your own material like diagrams, cards or charts and graphs on an infinite canvas. You can brainstorm, make your own connections by hand, group cards, add color or images, create visuals for communicating ideas, and move things around in a free-flowing manner.
For our use case two plugins will create our starter canvas — Link Exploder by Ben Hughes and Semantic Canvas from Aaron Gillespie. Once we have the start from the plugins, we can add all sorts of information from our vault that we want to visually link.
OSINT TIP: This is great for generating part of a report in an actual case, gaining clarity or gaming out hypotheses.
Ben’s Link Exploder plugin shows the incoming and outgoing links to/from a note and the corresponding notes on a canvas with cards. From our spy note, we’ll open the command palette and choose Link Exploder.
OSINT TIP: If you create a note called IP Addresses and other notes containing all IP addresses found in your case, linking those together using YAML or the body of the notes, you could create an instant canvas showing all IP addresses and their connections to each other. You could also do this with YAML by assigning “kind” or “type” or “IP Address” to notes.
Our starter canvas is automatically populated. Since it’s the index of our vault, our starter canvas included the file _Index_of_Intelligence_Playground. It isn’t significant, so we’ll remove it from the canvas by selecting the trashcan in the menu that pops up when we interact with the card.
Now we’ll select some colors to make our canvas stand out and help us visually organize it. I assigned yellow for hubs, blue/green for descriptors and purple for things that potentially have action or risk associated with them. I moved things around so I could get a better picture.
From the menu, either bottom center or right click from a specific card, we can add a card, a note from the vault, media from the vault, a website or create groups of cards. I added some pictures and a card with thoughts on it and dragged links to connect to other cards.
When we zoom in, we can see the information found in notes that have text in the body and examine the ideas in more detail with our Link Exploder spy canvas.
You’ll notice some of the cards are blank and don’t have a note description. That’s because there is no text in the body of those notes yet.
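Canvas files are stored as JSON, so a starter canvas can also be sketched by a script. The example below is an illustration of the general idea, not how Link Exploder works internally. The field names follow the JSON Canvas format as I understand it, the layout numbers are arbitrary, the note names are hypothetical, and the color value "3" is assumed to be the preset for yellow.

```python
import json

def build_canvas(center, linked_notes, card_w=300, card_h=120, gap=60):
    """Build a canvas dictionary with one central note card and one card per linked note."""
    nodes = [{
        "id": "n0", "type": "file", "file": f"{center}.md",
        "x": 0, "y": 0, "width": card_w, "height": card_h,
        "color": "3",  # assumption: preset "3" renders as yellow (our hub color)
    }]
    edges = []
    for i, name in enumerate(linked_notes, start=1):
        # Stack the linked notes in a column to the right of the central card.
        nodes.append({
            "id": f"n{i}", "type": "file", "file": f"{name}.md",
            "x": card_w + gap * 4, "y": (i - 1) * (card_h + gap),
            "width": card_w, "height": card_h,
        })
        edges.append({"id": f"e{i}", "fromNode": "n0", "toNode": f"n{i}"})
    return {"nodes": nodes, "edges": edges}

canvas = build_canvas("spy", ["espionage", "intelligence agency"])
canvas_json = json.dumps(canvas, indent=2)
```

Saving canvas_json to a file with a .canvas extension inside a test vault is one way to experiment. Try it on a copy of your vault first.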
Aaron’s Semantic Canvas plugin isn’t interested in relational links in the same way. It focuses on the semantic structure of a concept.
And if you zoom in, you’ll notice some notes don’t have information. Those notes have a menu inside of them that we can interact with.
This shows us semantic connections from the YAML, whether or not the note has been created. If we want to create a note, we can choose create new note, swap the file or remove it. This can be a great way to build out concepts using the canvas. Aaron’s plugin also lets us append a note from our canvas, so any work we do here, we can add to the knowledge in the note itself.
If the Wikidata YAML has a link to an image, the Semantic Canvas plugin can add the image automatically. Below is an example. I did not add these images. They were pulled from the image hot links in the YAML provided by Wikidata.
Charts and Graphs
It’s often helpful to visualize a complex investigation in many different ways. I added another plugin to create some charts to help us do that. Charts View by Caronchen lets us play with plots and graphs either from our vault data or imported from a file. We can embed those charts into our notes.
OSINT TIP: Charts and graphs are helpful for report writing.
I chose two chart options — word cloud and foam tree.
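A word cloud is driven by word counts: the more often a term appears, the larger it is drawn. The sketch below shows one simple way to produce those counts. The stop word list, the sample sentence and the function word_weights are assumptions for illustration, and Charts View may prepare its data differently.

```python
import re
from collections import Counter

# Assumption: a tiny stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "by", "or"}

def word_weights(text, top=5):
    """Count words (ignoring stop words) and return the `top` most frequent."""
    tokens = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top)

sample = "Espionage is the work of a spy. A spy gathers intelligence for an intelligence agency."
weights = word_weights(sample, top=2)
```

In the sample sentence "spy" and "intelligence" each appear twice, so they would be the largest words in the cloud.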
But How Can I Use This For Actual OSINT Stuff?
Links between things can be anything. Technology is agnostic. Tech doesn’t care if you are exploring the Intelligence Playground, collecting recipes or looking at money laundering. It’s all about data points and connections between them, no matter what that data is.
The YAML categories from Wikidata include identifiers significant for an OSINT investigation. This covers everything from events and cyber to people and companies to countries and politics. These critical investigative identifiers are enhanced by any reports, documents, articles, text or images added to your case file for analysis.
Other Things to Keep in Mind for OSINT
- You can create your own YAML to suit your use case and preferences.
- Adding event dates and geodata can be used to create timelines (yes, Obsidian does this) and maps (yup, that too).
EXAMPLE: For a different use case tracking the recent slew of Russian espionage cases, I added my own YAML. I wanted to identify relationships outside of direct spy cells — such as a real life husband and wife Russian spy couple who work in different countries apart from one another and are not connected through their cover identities.
- Aliases are important. They help link all representations of an idea, literal or subjective, to other notes found in your vault. An alias that includes alternate spellings and acronyms helps us identify all variations we come across.
- Subjectively linking concepts using aliases can be helpful in some cases for simplifying many moving parts. For example, railway can have all terms associated with railways, trains, etc. Or all types of transportation can be aliased under one note.
- Foreign language cases also benefit from Wikidata since it is available in over 300 languages. Using your aliases for other language variations of names is powerful. Using this methodology, it is effective to run multi-lingual cases.
- Canvases can be a fast way to do a flow chart for corporate structure, brainstorming or to map out a ring of people. You can append those and embed them into notes or use images of them in reports.
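The alias idea from the list above can be pictured as a lookup table that maps every spelling, acronym or translation back to one note. This is a conceptual sketch only: the notes dictionary is hypothetical sample data, and build_alias_index and resolve are illustrative helpers, not Obsidian functions.

```python
def build_alias_index(notes):
    """Map every title and alias (case-insensitively) to its canonical note title."""
    index = {}
    for title, aliases in notes.items():
        for name in [title, *aliases]:
            index[name.casefold()] = title
    return index

notes = {  # hypothetical notes with their aliases
    "Railway": ["railroad", "train", "Eisenbahn", "chemin de fer"],
    "Federal Security Service": ["FSB", "ФСБ"],
}
index = build_alias_index(notes)

def resolve(term):
    """Return the note a term points to, or None if no alias matches."""
    return index.get(term.casefold())
```

For example, resolve("fsb") and resolve("ФСБ") both point to the same note, which is what makes multi-lingual cases manageable. Very broad aliases such as "train" trade precision for simplicity, so use them deliberately.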
Conclusion
As OSINT practitioners, we have a wide variety of tools at our disposal — so many that sometimes we forget to look around us and explore other tools built for other purposes.
The capabilities I showed you — the ones you can draw out of Obsidian — I’ve encountered in expensive custom big data builds and commercial products. Your cost for these added capabilities as an individual is your time, curiosity and creativity. If you are an enterprise, please make sure to buy a commercial license from Obsidian. These cost $50 per user per year. No fuss, no muss licensing. It helps support the continued creation of this amazing tool.
What You Need To Get Started
- Platform: Obsidian downloadable from Obsidian or GitHub
- Theme: The theme tested with this plugin combination is the LYT theme by Nick Milo. I can’t guarantee other themes will behave the same way. Themes can interfere with how your vault is displayed and cause conflicts with plugins. If you run into issues, revert to default.
Be sure to back up your vault. When a conflict arises, it’s possible to lose data.
Plugins: Some plugins are dependencies to make others work, and some are to keep your vault organized and neat.
Administrative Plugins
- Zoottelkeeper by Akos Balasko: It maintains index files in all of the folders in your vault; if you create/delete/move a note, the index files will be updated automatically. It can be used to show folders in Graph View.
- Linter by Victor Tao: Format and style your notes. Linter can be used to format YAML tags, aliases, arrays and metadata; footnotes; headings; spacing; mathblocks; regular Markdown contents like lists, italics and bold…
Dependencies
- Dataview by Michael Brenan: Advanced queries over your vault for the data-obsessed.
- BRAT by TfTHacker: Easily install beta versions of a plugin for testing.
- NLP by SkepticMystic: In BRAT you will import SkepticMystic’s NLP plugin.
Enrichment
- Wikidata Importer by Sam Rose: Import data from Wikidata into your vault.
- Wikipedia by Jonathan Miller: Get the first section of Wikipedia for a note title or search term.
Analysis
- Charts View by Caronchen: Visualize data from your notes with plots and graphs. (You can import files as well.)
- Juggl by Emile van Krieken: Add a completely interactive, stylable and expandable graphview.
- Graph Analysis by SkepticMystic and Emile: Find hidden connections in your vault using cool graph algorithms.
- Link Exploder by Ben Hughes: Link Exploder creates a canvas from a note, embedding its incoming (i.e., backlinks) and outgoing links onto the canvas (as well as their linked notes).
- Graph Link Types by natefrisch01: Link types for graph view.
- Semantic Canvas by Aaron Gillespie: Create semantic knowledge graphs using canvases to modify note properties graphically.
About the Author
Claudia Tietze is the founder and managing director of the boutique OSINT shop Farallon, LLC. Described as part spy, part librarian and part technologist, Claudia is a digital sniper. She leverages her experience in open source intelligence tradecraft and analysis to help individuals and organizations make better decisions. Working as a Creative Technologist and consultant on intelligence and geopolitical technology projects, she helps teams focus on enhanced understanding and human-centric design. Tietze is an engaging public speaker on topics such as OSINT, lateral thinking and creativity. She mentors and trains others in OSINT investigative techniques and tools.
REACH OUT
Have any feedback for this article? Have ideas about cool tools? Do you use a tool for a different purpose but think it might be great for Open Source Intelligence or Analysis? Do you work or have a hobby in a different field, but think there is useful overlap? Feel free to reach out.