Microbial Knowledge Graph with BioCypher and SemSpect

Construct and analyze a Biolink-compatible graph

Sixing Huang
Geek Culture


This article has been updated to reflect the newest changes in BioCypher 0.5.35.

Photo by DeepMind on Unsplash

A knowledge graph (KG) is a type of database that stores and organizes knowledge in a graph-like structure, where nodes represent entities and edges represent relationships between them. It is designed to capture complex relationships and dependencies among different pieces of information, enabling more effective search, retrieval, and analysis of data. Knowledge graphs are commonly used in biology and healthcare.
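To make the node-and-edge idea concrete, here is a minimal sketch in plain Python. It is not how BioCypher or Neo4j store data. The identifiers, labels, relation names, and the `neighbors` helper are all made up for illustration.

```python
# Toy knowledge graph: nodes are entities, edges are (source, relation, target) triples.
# All identifiers, labels, and relation names below are illustrative assumptions.
nodes = {
    "taxon:1": {"label": "Taxon", "name": "Escherichia coli"},
    "medium:1": {"label": "Medium", "name": "LB medium"},
    "compound:1": {"label": "Compound", "name": "glucose"},
}

edges = [
    ("taxon:1", "GROWS_ON", "medium:1"),
    ("taxon:1", "CONSUMES", "compound:1"),
]


def neighbors(node_id, relation=None):
    """Return targets of outgoing edges from node_id, optionally filtered by relation."""
    return [
        target
        for source, rel, target in edges
        if source == node_id and (relation is None or rel == relation)
    ]


if __name__ == "__main__":
    # Walk every outgoing relationship of the taxon node and print the target names.
    for target in neighbors("taxon:1"):
        print(nodes[target]["name"])
```

Calling `neighbors("taxon:1", "CONSUMES")` follows only one kind of relationship. That kind of relationship-aware lookup is what a graph database provides at scale.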

OpenAI’s GPT-3 and ChatGPT have made it much easier to construct and query knowledge graphs. Now every biologist can build their own. It is no surprise that biological knowledge graphs are popping up like mushrooms on Medium, LinkedIn, and GitHub (1, 2, 3, 4, 5, 6, 7, and 8). But there are two problems.

First, many of these private knowledge graphs use non-standardized vocabulary. For example, a taxon can be labeled Taxon in one knowledge graph and NCBITaxon in another. As a result, it is difficult to merge knowledge graphs from different creators, and sometimes even from the same creator across projects. So bioinformaticians usually create their graphs from scratch and often…


A Neo4j Ninja and German bioinformatician at Gemini Data. I like to try things: cloud, ML, satellite imagery, Japanese, plants, and traveling the world.