What I Learned in a Week Using TigerGraph

Naman Chawhan
Pinboard Consulting
5 min read · Mar 29, 2022

Starting my position as a technical consultant at Pinboard Consulting, I was assigned the task of migrating data from an MS-SQL database to TigerGraph. Having little to no experience with graph databases, I had to put in a lot of hours to understand how graph databases work and how their schemas are designed. This blog summarises some of the key points I’ve picked up.

Day 1 and 2

After the briefing about the task, I quickly got to work and did what every developer does: Google things. I googled TigerGraph and looked for resources to help me understand it. Since TigerGraph is relatively new, I couldn’t find much material on it, so the next best thing was the documentation. I skimmed through it, read the ‘Getting Started’ section, and decided to install TigerGraph Server locally. The installation instructions were straightforward and the initial bit worked fine; I found the locally hosted user interface pretty interesting and easy to use. Then I restarted my local instance and decided to look into GSQL, the query language for TigerGraph, but the instance started throwing errors on the same commands that had run perfectly before. I uninstalled and reinstalled: again everything worked smoothly and the GSQL shell ran fine, and again, after a restart, it errored when I tried to access the GSQL shell. After scratching my head over this for a while, I decided to explore GSQL itself instead.

The next day was mostly spent learning GSQL and understanding loading jobs. The thing I liked most about GSQL is that it is similar to SQL, and since I already knew SQL it was easier to pick up. One thing I did not understand at first was the ‘reverse edge’. I know why it exists, but I failed to understand why it is implemented the way it is; I suspect it’s something the devs didn’t plan for during the earlier phases of development. Coming back to my broken GSQL shell, I came across the services section of the documentation and learned that TigerGraph’s services need to be started again after a machine restart (a command like gadmin start all does it). I feel this is something that should have been mentioned in the installation instructions; having a critical piece of information in another section of the documentation breaks the continuity of learning.
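To make the reverse-edge idea concrete, here is a minimal sketch of declaring one in a GSQL schema, pushed to the server with pyTigerGraph (a library that comes up again on day 5). Every vertex, edge, and graph name here is made up for illustration, and the connection assumes a default local install:

```python
# A minimal sketch, assuming a default local TigerGraph install.
# All vertex/edge/graph names here are hypothetical.
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="http://localhost",
    username="tigergraph",
    password="tigergraph",
)

schema_ddl = """
CREATE VERTEX Person (PRIMARY_ID id STRING, name STRING)
CREATE VERTEX Company (PRIMARY_ID id STRING, name STRING)
// WITH REVERSE_EDGE tells TigerGraph to maintain a mirror-image edge
// automatically, so the graph can also be traversed Company -> Person.
CREATE DIRECTED EDGE works_for (FROM Person, TO Company)
    WITH REVERSE_EDGE="employs"
CREATE GRAPH MyGraph (Person, Company, works_for)
"""
print(conn.gsql(schema_ddl))
```

With this in place, every works_for edge automatically gets a matching employs edge, which is what lets you traverse the relationship in both directions.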

Day 3

Day 3 was spent trying to figure out TigerGraph’s connectivity options. Primarily I was looking for something with data-migration capabilities. TigerGraph already supports migration from MySQL and PostgreSQL but not from Microsoft SQL Server, which felt slightly unfair towards Microsoft. Bad Tiger! To compensate, I had a plethora of options to choose from, as TigerGraph offers multiple routes for connectivity. Kafka and Spark were new technologies for me, while I had some experience with APIs. I could also have loaded the data from text files and been done in two or three days, but that didn’t look scalable. Since I was already connected to MS-SQL through a JDBC driver with Spark, it made sense to do the same on TigerGraph’s side as well; a JDBC driver for TigerGraph wasn’t difficult to find, as it’s available on their GitHub page.
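For reference, the MS-SQL side of that setup looks roughly like the sketch below. The server, database, table, and credentials are placeholders, and it assumes Microsoft’s JDBC driver jar is available to Spark:

```python
# A minimal sketch of reading an MS-SQL table into a Spark DataFrame over
# JDBC. The URL, table, and credentials are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("mssql-to-tigergraph")
    # assumes the mssql-jdbc jar is on Spark's classpath
    .config("spark.jars", "mssql-jdbc.jar")
    .getOrCreate()
)

people_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://localhost:1433;databaseName=SourceDB")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("dbtable", "dbo.Person")
    .option("user", "sa")
    .option("password", "<password>")
    .load()
)

people_df.show(5)
```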

Day 4

I’d say this was the day I learned the most about graphs and graph schemas. I was used to designing schemas in tabular form in SQL, and the way you design them in a graph database is very different: everything isn’t a table but rather a node or an edge, each with its direction and attributes. After working through a couple of schemas, I finally landed on something that held the data efficiently. The concept of avoiding a super-node, a node with a huge number of edges connected to it, was also interesting to me.
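As a toy illustration of that shift in thinking, here is how a single relational row might fan out into graph elements; the table, column, and type names are all invented:

```python
# A toy sketch: in a relational schema, Order(order_id, customer_id, total)
# links to Customer through the customer_id foreign key. In a graph schema,
# the foreign key disappears: the row becomes a vertex and the relationship
# becomes an explicit edge. All names are hypothetical.
order_row = {"order_id": "O-1001", "customer_id": "C-42", "total": 99.50}

# Vertex: (type, primary id, attributes) -- `total` lives on the vertex.
order_vertex = ("Order", order_row["order_id"], {"total": order_row["total"]})

# Edge: the foreign key becomes a directed `places` edge Customer -> Order.
places_edge = ("Customer", order_row["customer_id"],
               "places",
               "Order", order_row["order_id"])

print(order_vertex)
print(places_edge)
```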

Day 5

This day I was mostly working on the code. The basic plan was to use Spark and the respective JDBC connectors to read from one end and write on the other. I could have used pyTigerGraph, a Python library built specifically for TigerGraph, but I wanted to keep dependencies to a minimum. The most notorious library to install was py4j: after multiple attempts and a long time fiddling with environment path variables, I finally got it to work, and a tear of happiness rolled down my cheek. The coding bit was straightforward: read every row from every table and insert the data into its respective vertex or edge, attributes and all. The code took quite some time to run, but it finally completed, and after checking a couple of counts to make sure the load reflected the data correctly, I knew I could shut my laptop for the week.
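The write side went through TigerGraph’s JDBC driver. A sketch of what that looks like is below; the option names follow my reading of the tg-jdbc-driver README, so treat them as assumptions and check the driver’s docs for the exact spelling. The DataFrame, graph name, and credentials are placeholders:

```python
# A hedged sketch of writing a Spark DataFrame into TigerGraph through the
# tg-jdbc-driver (github.com/tigergraph/ecosys). Option names here follow
# the driver's README as I understand it -- verify against the docs.
# people_df is the DataFrame read from MS-SQL in the day-3 sketch.
(
    people_df.write
    .mode("append")
    .format("jdbc")
    .option("driver", "com.tigergraph.jdbc.Driver")
    .option("url", "jdbc:tg:http://localhost:14240")  # local REST endpoint
    .option("graph", "MyGraph")            # hypothetical graph name
    .option("dbtable", "vertex Person")    # "vertex <type>" or "edge <type>"
    .option("username", "tigergraph")
    .option("password", "tigergraph")
    .save()
)
```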

Day 6 and 7

Just kidding, I didn’t shut my laptop. I was way too curious to know how queries would perform on a large dataset, and this was my chance to write some bigger queries. So I came up with some use cases and tried writing some reporting queries. What impressed me the most was the sheer speed with which these queries return data. I think graph databases are extremely quick with join-style operations because the relationships are materialised while the data is loaded, not computed while traversing. I was really happy with what I had achieved so far.
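As an example of the kind of “join” that comes back fast, a one-hop GSQL query over the hypothetical Customer/Order names from day 4 might look like the sketch below; all names remain made up, and the vertex-parameter convention is my reading of pyTigerGraph’s docs:

```python
# A hedged sketch of a one-hop GSQL query: every Order a given Customer
# placed. One hop along the `places` edge replaces a Customer-Order join,
# because the relationship was materialised at load time.
import pyTigerGraph as tg

conn = tg.TigerGraphConnection(
    host="http://localhost",
    graphname="MyGraph",              # hypothetical graph name
    username="tigergraph",
    password="tigergraph",
)

conn.gsql("""
USE GRAPH MyGraph
CREATE QUERY customer_orders(VERTEX<Customer> c) FOR GRAPH MyGraph {
  start = {c};
  orders = SELECT o FROM start:s -(places:e)-> Order:o;
  PRINT orders;
}
INSTALL QUERY customer_orders
""")

# pyTigerGraph passes vertex parameters as an (id, type) tuple.
print(conn.runInstalledQuery("customer_orders", {"c": ("C-42", "Customer")}))
```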

The rest of the weekend was spent reading about graph databases and their future applications. A lot of organisations are adopting graph databases, largely because of their ability to handle highly connected data, and I feel that in the future graph databases could replace legacy RDBMSs as the centrepiece of an architecture. Regardless, I had fun learning everything I learnt, and I’m far from knowing it all. I would have liked to explore integrating with TigerGraph through its APIs, but you never know; I might get the opportunity soon.

Takeaways from the Week:

1. The documentation could be better. It needs improvement in places, and adding more examples wouldn’t hurt anyone.

2. TigerGraph’s compatibility with technologies like Kafka, Spark, Amazon S3 and many more makes it really easy for developers and data engineers to work with.

3. Graph schema design is quite tricky if you’re new to it, but not something that takes long to master.

4. Graph databases are really quick compared to traditional RDBMSs at join-style operations; traversal is extremely fast.

5. TigerGraph is highly compatible with the surrounding data ecosystem and is rapidly being adopted by many organisations. Graph could be the future.

