Building a CricGraph

Sai Praveen
Everything is Connected
May 29, 2023

Welcome to the third and final part of the “Everything is Connected” series. Wondering what a CricGraph is? It is a knowledge graph built on data about cricket players. Knowledge graphs have become an essential tool in data analytics for storing and organizing complex datasets. Many tools can be used to create them; in the first and second parts of this series we used Wikidata for Wikipedia articles and GraphGPT for news articles and everyday language. However, these tools may not offer enough flexibility or support for personalized databases. Therefore, in this final part of the series, we will create a knowledge graph from scratch using custom databases and graph algorithms. This skill is valuable for anyone, whether you’re a data scientist, a software engineer, or just exploring new technology.

Ready to break free from the limitations of pre-built tools for creating knowledge graphs? Discover the power of building a custom knowledge graph from scratch using custom databases.

If you want a more fundamental introduction to knowledge graphs, such as what they are or how to build one using Wikidata for Wikipedia articles and GraphGPT for news articles or everyday language, here are the links to the first and second parts of the series.

Procedure for Making a Knowledge Graph

Building a knowledge graph from scratch involves several steps. This blog uses the ESPN cricket players dataset to illustrate them.

Step 1 — Collecting Dataset:

The first step in building a knowledge graph from scratch is to find the relevant dataset.

Datasets can be acquired through web scraping, crowdsourcing, publicly available online datasets, APIs, etc. The dataset used to build this knowledge graph was taken from here.

Step 2 — Preprocessing Dataset:

Once the dataset has been found, it needs to be preprocessed; for this, we use pandas and NumPy. This step includes removing unwanted columns, dealing with NaN values, and normalizing the data so that it can be processed faster.
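The preprocessing step can be sketched with pandas and NumPy as shown here. This is a minimal illustration, not the code used for CricGraph: the column names (NAME, COUNTRY, BATTING_STYLE, IMAGE_URL) and the sample rows are hypothetical, and “normalizing” is interpreted as cleaning text so that the same entity is always spelled the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows; the column names are illustrative assumptions,
# not the real schema of the ESPN cricket players dataset.
raw = pd.DataFrame({
    "NAME": ["  Sachin Tendulkar", "Ellyse Perry ", "Unknown Player"],
    "COUNTRY": ["India", "australia", np.nan],
    "BATTING_STYLE": ["Right-hand bat", np.nan, "Left-hand bat"],
    "IMAGE_URL": ["a.png", "b.png", "c.png"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Remove columns that will not become entities or relationships.
    df = df.drop(columns=["IMAGE_URL"])
    # 2. Deal with NaN values: drop rows without a country, fill the rest.
    df = df.dropna(subset=["COUNTRY"]).fillna({"BATTING_STYLE": "Unknown"})
    # 3. Normalize text so the same entity is always spelled the same way.
    df["NAME"] = df["NAME"].str.strip()
    df["COUNTRY"] = df["COUNTRY"].str.strip().str.title()
    return df.reset_index(drop=True)


clean = preprocess(raw)
print(clean)
```

Title-casing is applied only to country names here, because player names such as “de Villiers” would be altered by it.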

To give an idea of what the dataset looks like, a snapshot is shown below.

Step 3 — Making Ontology using Protege:

An ontology formally represents the concepts, entities, and relationships of a specific domain. For a cricket knowledge graph, the ontology may include concepts such as players, teams, and countries.

This ontology can be created with software such as Protege, and the dataset can then be mapped to that ontology (schema).
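Protege is a graphical editor, but the same kind of schema can also be written down as plain text. The sketch below, using only the Python standard library, writes a tiny OWL ontology in Turtle syntax with the classes Cricketer, Country, and Team, plus two object properties whose domain and range link them. The namespace http://example.org/cricgraph#, the class names, and the property names (representsCountry, playsFor) are hypothetical choices for illustration; a file like this should open in Protege, where it can be extended and visualized.

```python
# Hypothetical namespace, class names, and property names for a cricket ontology.
BASE = "http://example.org/cricgraph#"
CLASSES = ["Cricketer", "Country", "Team"]
# object property -> (domain class, range class)
OBJECT_PROPERTIES = {
    "representsCountry": ("Cricketer", "Country"),
    "playsFor": ("Cricketer", "Team"),
}


def ontology_to_turtle(classes, properties, base=BASE):
    """Serialize OWL classes and object properties as Turtle text."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
        f"<{base.rstrip('#')}> a owl:Ontology .",
        "",
    ]
    for cls in classes:
        lines.append(f":{cls} a owl:Class .")
    for prop, (domain, range_cls) in properties.items():
        # Domain and range state which classes a relationship connects.
        lines.append(f":{prop} a owl:ObjectProperty ;")
        lines.append(f"    rdfs:domain :{domain} ;")
        lines.append(f"    rdfs:range :{range_cls} .")
    return "\n".join(lines) + "\n"


turtle = ontology_to_turtle(CLASSES, OBJECT_PROPERTIES)
with open("cricgraph.ttl", "w", encoding="utf-8") as f:
    f.write(turtle)
print(turtle)
```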

Image source — https://thesemanticway.wordpress.com/2008/11/11/owl-ontology-example/

The image above is an example of an ontology.
You can use tools like Protege to build your own ontologies and visualize them. Here, Protege is used to build an ontology for the ESPN cricket dataset.

Step 4 — Uploading Ontology on a Graph Database:

After creating the ontology, the next step is to upload it to a graph database; we will use ArangoDB. The steps are:

  • Extract the important entities and edges from the dataset/data frame.
  • Save them as separate .json files on your device (save each data frame in the “records” orientation):

```python
# Cricketer and Country are the entity data frames extracted in the previous step
Cricketer.to_json('Cricketer.json', orient='records')
Country.to_json('Country.json', orient='records')
```

  • Now that there are separate .json files for the entities and edges, upload them as separate collections in ArangoDB: document collections for the entities and edge collections for the relationships.
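The extraction and export steps can be sketched end to end as follows, assuming a cleaned data frame with hypothetical NAME and COUNTRY columns. Each entity type becomes its own vertex file whose records carry an ArangoDB `_key`, and each relationship becomes an edge file whose records use ArangoDB’s `_from`/`_to` fields in the form `collection/_key`. The edge collection name representsCountry and the `make_key` helper are hypothetical.

```python
import json

import pandas as pd

# Hypothetical cleaned data frame; the column names are assumptions.
players = pd.DataFrame({
    "NAME": ["Sachin Tendulkar", "Virat Kohli", "Ellyse Perry"],
    "COUNTRY": ["India", "India", "Australia"],
})


def make_key(text: str) -> str:
    # Hypothetical helper: ArangoDB _key values allow only a limited ASCII
    # character set (no spaces or '/'), so replace everything else with '_'.
    return "".join(ch if ch.isascii() and ch.isalnum() else "_" for ch in text)


# Vertex data: one record per unique entity, identified by _key.
cricketers = pd.DataFrame({
    "_key": players["NAME"].map(make_key),
    "name": players["NAME"],
})
countries = pd.DataFrame({"name": players["COUNTRY"].drop_duplicates()})
countries["_key"] = countries["name"].map(make_key)

# Edge data: _from/_to point at vertices as "<collection>/<_key>".
represents = pd.DataFrame({
    "_from": "Cricketer/" + cricketers["_key"],
    "_to": "Country/" + players["COUNTRY"].map(make_key),
})

# Save each collection in the "records" orientation, as in the article.
for name, frame in [("Cricketer", cricketers), ("Country", countries),
                    ("representsCountry", represents)]:
    frame.to_json(f"{name}.json", orient="records")

# Read the edge file back to check its shape.
with open("representsCountry.json", encoding="utf-8") as f:
    edges = json.load(f)
print(edges[0])  # {'_from': 'Cricketer/Sachin_Tendulkar', '_to': 'Country/India'}
```

Files in this shape can then be imported into ArangoDB, for example with the arangoimport tool, making sure representsCountry is created as an edge collection rather than a document collection.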

Step 5 — Graph Generation:

Once the data is uploaded to ArangoDB, we can visualize it by clicking the GRAPHS option in the grey sidebar on the left. You can name your graph and select which relations to show in it, and then navigate through the graph to extract information.
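Clicking through the graph view amounts to following edges from one vertex to its neighbors. The sketch below reproduces that idea outside ArangoDB with a breadth-first search over a hypothetical list of edge records in the `_from`/`_to` format; inside ArangoDB itself, the same question would normally be answered with an AQL graph traversal. Edges are treated as undirected here so that a country leads back to its players.

```python
from collections import defaultdict, deque

# Hypothetical edge records in ArangoDB's _from/_to format.
EDGES = [
    {"_from": "Cricketer/Sachin_Tendulkar", "_to": "Country/India"},
    {"_from": "Cricketer/Virat_Kohli", "_to": "Country/India"},
    {"_from": "Cricketer/Ellyse_Perry", "_to": "Country/Australia"},
]


def build_adjacency(edge_docs):
    """Index edges in both directions so any vertex can be browsed."""
    adj = defaultdict(set)
    for edge in edge_docs:
        adj[edge["_from"]].add(edge["_to"])
        adj[edge["_to"]].add(edge["_from"])
    return adj


def neighbors(adj, start, max_depth=1):
    """Breadth-first search: vertices within max_depth hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt in sorted(adj.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


adj = build_adjacency(EDGES)
print(neighbors(adj, "Country/India"))
# ['Cricketer/Sachin_Tendulkar', 'Cricketer/Virat_Kohli']
# Two hops: Sachin Tendulkar -> India -> other players from India
print(neighbors(adj, "Cricketer/Sachin_Tendulkar", max_depth=2))
# ['Country/India', 'Cricketer/Virat_Kohli']
```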

The knowledge graph formed by following the above steps looks like this:

Conclusion

To summarize, you can create custom knowledge graphs with Protege and a graph database to organize and analyze data in a structured and meaningful way. By following the steps in this blog, you can find datasets from different domains, clean and preprocess them, create an ontology, and upload the data to a graph database to build a knowledge graph that can be used for various applications. In this data-driven age, data is more readily available than ever before. With knowledge graphs, this huge amount of data can be understood and put to good use, saving time and lives.

Authors — Reva Bharara, Aryan Rathore
