[Snowflake Quick Tips] How to use Snowflake with Neo4j

The “Snowflake Quick Tips” series presents short practical use cases you should be able to complete in 10 minutes or so.

Snowflake is a cloud data warehouse using relational models. Neo4j is a NoSQL database platform well-suited for graph models. This quick runbook will quickly teach you how to visualize nodes with relationships in both Snowflake and Neo4j.

Whether or not you already know Snowflake or Neo4j, you should be able at the end of this lecture to have a better understanding of Snowflake’s relational models and Neo4j’s graph models.

Create a Relational Database — in Snowflake

You must already have access to a test sandbox Snowflake account. You could eventually sign-up for a free trial account (only an email address is required).

I saved all related files of a new Movies database in this public GitHub repository. Copy and paste this SQL DDL file into a new worksheet and run the script. It will create a similar database to the one used in Neo4j, but with a relational model.

Then upload each of the CSV files from this folder into the related tables, starting with PERSON and MOVIE, followed by the rest. Use a custom file format that skips the first line (with column headers) and has the fields optionally enclosed by double quotes:

Create a similar Graph Database — in Neo4j

You can create a free cloud Neo4j Aura account. No credit card is required, you get one active free instance forever, and you can use a lot of large enough databases in it! Create a new instance using the Movie demo database.

For the relational mode, I created all those CSV files and the SQL create database script from an export on the Neo4j Aura, one single big CSV file created by calling these statements:

CALL apoc.export.csv.all(null, {stream:true})
YIELD data
RETURN data

The Database Model — in Snowflake

If you import the Snowflake data model for this “movies” database into a trial version of my Data Xtractor, you may get a diagram like below. Remark the different icons used for the independent entities (Person and Movie) and the intersection tables:

With a few other adjustments that you will find only in Data Xtractor — like hiding the intersection tables from a database diagram — you can better outline just the relationship types between the two types of main objects (Person and Movie):

The Database Model — in Neo4j

In Neo4j, it’s simple: you have to call db.schema.visualization() to get the graph data model. We have two types of nodes: Person and Movie. And relationships between them. A Person can direct, produce, write, act in, or review a Movie. And a Person can also follow one or more other Persons:

Who is Related to Some Movie? — in Snowflake

You can query individual nodes and relationships in both a relational database (in SQL) or a graph database like Neo4j (in Cypher).

If you want to show all people related to the movie “Cloud Atlas” in any way, the SQL query in Snowflake can be a bit of a challenge. Let’s run this as well in Data Xtractor, as a manual UNION query:

with m as (
select id_movie
from movie
where title = 'Cloud Atlas')

select name, 'wrote' as reltype
from person p
join wrote r on p.id_person = r.id_person
join m on r.id_movie = m.id_movie

union
select name, 'acted_in' as reltype
from person p
join acted_in r on p.id_person = r.id_person
join m on r.id_movie = m.id_movie

union
select name, 'directed' as reltype
from person p
join directed r on p.id_person = r.id_person
join m on r.id_movie = m.id_movie

union
select name, 'produced' as reltype
from person p
join produced r on p.id_person = r.id_person
join m on r.id_movie = m.id_movie

union
select name, 'reviewed' as reltype
from person p
join reviewed r on p.id_person = r.id_person
join m on r.id_movie = m.id_movie

Who is Related to Some Movie? — in Neo4j

Neo4j’s Cypher can come up with a less verbose and more flexible query. And the visual representation could be more self-explanatory for just a few objects:

MATCH (p:Person)
-[d:WROTE|ACTED_IN|DIRECTED|PRODUCED|REVIEWED]
-(m:Movie {title: "Cloud Atlas"})
RETURN p,d,m

Who Follows Who? — in Snowflake

The SQL query could output pairs of followers, but you do not get a more suggestive picture, you get just a tabular display. You may “design” a SQL query in an interactive manner in Data Xtractor, without knowledge of SQL:

The generated SQL query in Data Xtractor was:

select p.name as follower,
p1.name as followed
from public.person as p
inner join public.follows as f
on p.id_person = id_followed
inner join public.person as p1
on f.id_person = p1.id_person

Who Follows Who? — in Neo4j

The following equivalent Cypher query shows all persons who follow other persons. I also manually selected and expanded all other relationships with James. Graph data visualization is even more powerful for reflexive relationships when you show on screen just a few objects:

MATCH (p:Person)-[d:FOLLOWS]-(f:Person) RETURN p,d,f

Neo4j Professional Certifications

Learn Neo4j with Cypher and become a Neo4j Certified Professional or Neo4j Graph Data Science Certified by passing some free online exams!

You try until you pass, but you may be limited to just one trial daily if you fail. Use my popular practice tests on Udemy to avoid all the trouble…

Final Thoughts

Databases with a lot of relationships could be better suited for NoSQL graph database engines.

You can basically export data in CSV files from a relational database like Snowflake, and import them into a graph database like Neo4j. Several rules apply, in the sense that nodes must have unique IDs, and relationships should have start and end node IDs.

For just a few objects, or when you traverse several relationships, the syntax of Cypher could be more intuitive than SQL, and the visualization of objects could be more effective than the tabular text display.

--

--

Cristian Scutaru
Snowflake Builders Blog: Data Engineers, App Developers, AI/ML, & Data Science

World-class expert in Snowflake Data Cloud. Former Snowflake "Data Superhero". SnowPro SME (Subject Matter Expert). 5x SnowPro certification exams.