Learn geography using Neo4j

Jimmy Crequer
Neo4j Developer Blog
6 min read · Nov 30, 2019

In a previous story I implemented a small app to learn Japanese characters using Neo4j. Lately, I spent time trying to memorize all the countries in the world (I guess I have too much free time…), and I figured I could use a graph to help me on this journey too.

In this post, I will build a small graph of European countries and write a short CLI application that interacts with the graph and helps me learn those countries.

The graph

Build the graph

I created a countries.json file from Wikipedia’s data, containing all European countries. The file, available here, is structured as follows:

[
  {
    "name": "France",
    "population": 67348000,
    "area": 643427,
    "capital": "Paris",
    "neighbors": ["Andorra", "Belgium", ..., "Switzerland"]
  },
  ...
]

We will create the following entities:

  • Country nodes, with a name, a population and an area
  • City nodes, with a name
  • Relationships between Country and City nodes to represent the capitals
  • Relationships between 2 Country nodes when they have a common border
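
To make this mapping concrete, here is a minimal plain-JavaScript sketch of how one record from countries.json breaks down into those four kinds of entities. The function name toGraphElements and the shape of its output are illustrative assumptions, not part of the original app, and nothing in it talks to Neo4j.

```javascript
// Hypothetical helper: splits one countries.json record into the nodes and
// relationships described above.
function toGraphElements(record) {
  const nodes = [
    { label: 'Country', name: record.name, population: record.population, area: record.area },
    { label: 'City', name: record.capital },
  ];
  const relationships = [
    // The capital points at its country
    { type: 'IS_CAPITAL_OF', from: record.capital, to: record.name },
    // One border relationship per listed neighbor
    ...record.neighbors.map((n) => ({ type: 'IS_NEIGHBOR_OF', from: record.name, to: n })),
  ];
  return { nodes, relationships };
}

// Abridged record (only two of France's neighbors are listed)
const france = {
  name: 'France',
  population: 67348000,
  area: 643427,
  capital: 'Paris',
  neighbors: ['Andorra', 'Belgium'],
};
const elements = toGraphElements(france);
console.log(elements.relationships.length); // 3
```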

Neo4j’s APOC library provides a very convenient way to import JSON files. We only need a few lines of Cypher to build our graph:

// Load the JSON file; each array element is yielded as one map `v`
WITH "https://gist.githubusercontent.com/jimmycrequer/7aa867900d0cf0b9588d4354f09cb286/raw/countries.json" AS url
CALL apoc.load.json(url) YIELD value AS v
MERGE (c:Country {name: v.name})
SET c.population = v.population, c.area = v.area
// MERGE rather than CREATE, so re-running the import does not duplicate capitals
MERGE (capital:City {name: v.capital})
MERGE (c)<-[:IS_CAPITAL_OF]-(capital)
FOREACH (n IN v.neighbors |
  MERGE (neighbor:Country {name: n})
  MERGE (c)-[:IS_NEIGHBOR_OF]-(neighbor)
)
RETURN *

Figure: Our graph

Explore the graph

Let’s start with the 10 largest countries in Europe by area.

MATCH (c:Country)
RETURN c.name AS country, apoc.number.format(c.area) AS area
ORDER BY c.area DESC
LIMIT 10

Note: “apoc.number.format()” returns a String, so to sort correctly we need to “ORDER BY” the numeric value (c.area) rather than the formatted one.
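
The same pitfall is easy to reproduce outside Neo4j. The sketch below is plain JavaScript, with a simplified stand-in for the formatter (formatNumber is a hypothetical helper, integers only) and made-up areas rather than values from the dataset.

```javascript
// Simplified stand-in for apoc.number.format(): adds thousands separators
function formatNumber(n) {
  return String(n).replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

// Illustrative areas, not taken from the dataset
const areas = [9000, 643427, 10500000];

// Sorting the formatted strings compares characters, not magnitudes
const byString = areas.map(formatNumber).sort().reverse();

// Sorting the numbers first, then formatting, gives the intended order
const byNumber = [...areas].sort((a, b) => b - a).map(formatNumber);

console.log(byString); // [ '9,000', '643,427', '10,500,000' ]
console.log(byNumber); // [ '10,500,000', '643,427', '9,000' ]
```

Sorted as strings, the smallest value comes first in the "descending" list, which is exactly what ordering by c.area avoids.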

To be honest, I would have thought that Ukraine was bigger than France. Moreover, it seems my data counts Greenland as part of Denmark, which explains why Denmark appears in the top 3.

https://en.wikipedia.org/wiki/Denmark

We can also calculate the population density of each country.

MATCH (c:Country)
RETURN c.name AS name,
       apoc.number.format(c.area) AS area,
       apoc.number.format(c.population) AS population,
       c.population / c.area AS density
ORDER BY density ASC

It is interesting to note the presence of the Scandinavian countries and especially the Baltic states among the least densely populated countries, even though the Baltic states are relatively small.
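
One detail about the density column: when both operands are integers, Cypher’s / operator performs integer division, so the result is truncated. The JavaScript sketch below mimics that truncation with the France figures from the JSON sample. It assumes that population and area are stored as integers in the graph, which may not hold for every country.

```javascript
// Figures for France from the countries.json sample
const population = 67348000;
const area = 643427;

// What integer division (as in Cypher with two integers) yields
const truncatedDensity = Math.trunc(population / area);

// What floating-point division yields
const exactDensity = population / area;

console.log(truncatedDensity); // 104
console.log(exactDensity.toFixed(2)); // 104.67
```

If the decimals matter, writing toFloat(c.population) / c.area in the query keeps them.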

Let’s now have a look at the relationships between countries.

MATCH (c:Country)-[:IS_NEIGHBOR_OF]-(c2:Country)
WITH c, collect(c2.name) AS neighbors
RETURN c.name, neighbors
ORDER BY size(neighbors) DESC

No big surprise here. Germany and France share borders with many small countries (Belgium, Switzerland, Luxembourg) and are located at the center of Europe. Note that this dataset doesn’t include Asian countries, so Russia and other countries such as Kazakhstan actually have more neighbors than shown here.

You can also render a “map” of Europe using only the neighbor relationships and the force-directed layout.

MATCH (c1:Country)-[nb:IS_NEIGHBOR_OF]-(c2:Country)
RETURN c1,nb,c2

Figure: “Map” of Europe

Lastly, we can use Neo4j’s “shortestPath()” function to find out how many countries must be crossed to travel between two given countries. Here is an example with France and Greece.

MATCH (france:Country {name: "France"}),
      (greece:Country {name: "Greece"}),
      // Only follow borders, not capital relationships
      p = shortestPath((france)-[:IS_NEIGHBOR_OF*]-(greece))
RETURN p
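
To illustrate what a shortest path over border relationships computes, here is a minimal breadth-first search in plain JavaScript. It is a sketch of the general technique, not Neo4j’s implementation, and it runs on a small hand-picked subset of borders rather than the full dataset (country names may also differ from those in countries.json).

```javascript
// Hand-picked subset of European borders, for illustration only
const borders = [
  ['France', 'Italy'], ['France', 'Germany'], ['Germany', 'Austria'],
  ['Austria', 'Slovenia'], ['Italy', 'Slovenia'], ['Slovenia', 'Croatia'],
  ['Croatia', 'Serbia'], ['Serbia', 'North Macedonia'], ['North Macedonia', 'Greece'],
];

// Build an undirected adjacency list
function buildGraph(edges) {
  const graph = new Map();
  for (const [a, b] of edges) {
    if (!graph.has(a)) graph.set(a, []);
    if (!graph.has(b)) graph.set(b, []);
    graph.get(a).push(b);
    graph.get(b).push(a);
  }
  return graph;
}

// Breadth-first search: returns the countries on one shortest path,
// or null when the two countries are not connected
function shortestPath(graph, start, goal) {
  const previous = new Map([[start, null]]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === goal) {
      const path = [];
      for (let node = goal; node !== null; node = previous.get(node)) path.unshift(node);
      return path;
    }
    for (const next of graph.get(current) || []) {
      if (!previous.has(next)) {
        previous.set(next, current);
        queue.push(next);
      }
    }
  }
  return null;
}

const path = shortestPath(buildGraph(borders), 'France', 'Greece');
console.log(path.length - 2); // countries crossed in between: 5
```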

Now that the graph is ready, let’s create a small CLI app to play with it!

Build the app

I decided to create a small CLI app with Node.js since Neo4j provides a driver for JavaScript.

Main function

Let’s start with the main function.

First, I connect to the Neo4j instance and create a new session. Then I create the main loop of the application, which sends the user to the game they choose to play.
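
A minimal sketch of what such a main function could look like follows. It is not the original code: the names GAMES, renderMenu and resolveChoice are hypothetical, the play functions are empty placeholders, and the neo4j-driver module is passed in as a parameter so the block itself has no dependency to install.

```javascript
const readline = require('readline');

// Hypothetical game registry; the play functions stand in for the two games
const GAMES = {
  1: { label: 'Guess the country from its capital', play: async (session, ask) => {} },
  2: { label: 'Guess the country from its neighbors', play: async (session, ask) => {} },
};

// Text of the menu shown on every turn of the loop
function renderMenu(games) {
  const lines = Object.entries(games).map(([key, game]) => `${key}. ${game.label}`);
  return [...lines, 'q. Quit', '> '].join('\n');
}

// Maps what the user typed to a game, to 'quit', or to null if unknown
function resolveChoice(input, games) {
  const key = input.trim().toLowerCase();
  if (key === 'q') return 'quit';
  return Object.prototype.hasOwnProperty.call(games, key) ? games[key] : null;
}

// `neo4j` is the module returned by require('neo4j-driver')
async function main(neo4j, uri, user, password) {
  const driver = neo4j.driver(uri, neo4j.auth.basic(user, password));
  const session = driver.session();
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const ask = (question) => new Promise((resolve) => rl.question(question, resolve));
  try {
    while (true) {
      const choice = resolveChoice(await ask(renderMenu(GAMES)), GAMES);
      if (choice === 'quit') break;
      if (choice) await choice.play(session, ask);
    }
  } finally {
    rl.close();
    await session.close();
    await driver.close();
  }
}
```

To start it against a local instance, one would call something like main(require('neo4j-driver'), 'bolt://localhost:7687', 'neo4j', '<password>'), where the URI and credentials are placeholders.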

Let’s dive into the other functions.

GuessCountryFromCapital function

The code is pretty straightforward. I use the following Cypher query to return a pair of Country and City at random.

MATCH p = (:Country)<-[:IS_CAPITAL_OF]-(:City)
RETURN apoc.coll.randomItem(collect(p)) AS p

Then the user is asked the question. Finally, we display a message telling them whether their answer was right or wrong, using simple text coloring.

console.log('\x1b[32m%s\x1b[0m', 'Correct!')

This line will print “Correct!” in green.

console.log('\x1b[33m%s\x1b[0m', `Wrong! The answer is ${countryName}.`)

This line will print the message in yellow.
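
Putting these pieces together, the function could be sketched as follows. This is an assumption-laden reconstruction rather than the original code: namesFromPath, isCorrect and feedback are hypothetical helpers, the case-insensitive comparison is my own choice, and ask stands for any function that prints a question and resolves with the typed answer.

```javascript
const RANDOM_CAPITAL_QUERY = `
  MATCH p = (:Country)<-[:IS_CAPITAL_OF]-(:City)
  RETURN apoc.coll.randomItem(collect(p)) AS p`;

// Pull the two names out of a path by node label, so the code does not
// depend on which end of the path is the country
function namesFromPath(path) {
  const nodes = [path.start, path.end];
  const country = nodes.find((n) => n.labels.includes('Country'));
  const city = nodes.find((n) => n.labels.includes('City'));
  return { countryName: country.properties.name, capitalName: city.properties.name };
}

// Forgiving comparison: ignores case and surrounding whitespace (an assumption)
function isCorrect(answer, expected) {
  return answer.trim().toLowerCase() === expected.toLowerCase();
}

// Builds the green or yellow message using ANSI escape codes
function feedback(answer, countryName) {
  return isCorrect(answer, countryName)
    ? '\x1b[32mCorrect!\x1b[0m'
    : `\x1b[33mWrong! The answer is ${countryName}.\x1b[0m`;
}

// `session` is a Neo4j driver session; `ask` is the hypothetical prompt function
async function guessCountryFromCapital(session, ask) {
  const result = await session.run(RANDOM_CAPITAL_QUERY);
  const { countryName, capitalName } = namesFromPath(result.records[0].get('p'));
  const answer = await ask(`Which country has ${capitalName} as its capital? `);
  console.log(feedback(answer, countryName));
}
```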

GuessCountryFromNeighbors function

Similarly to the previous function, I implemented another function that shows a list of countries and asks for the country that borders all of them.

This time, the Cypher query fetches all the countries and their neighbors, and I then randomly pick one from the “records” property in JavaScript.
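
A minimal sketch of that approach is shown below. The query is adapted from the neighbors query used earlier in this post; pickRandom and buildQuestion are hypothetical helper names, and ask is assumed to be a function that prints a question and resolves with the typed answer.

```javascript
const NEIGHBORS_QUERY = `
  MATCH (c:Country)-[:IS_NEIGHBOR_OF]-(c2:Country)
  RETURN c.name AS country, collect(c2.name) AS neighbors`;

// Picks one element at random; `random` is injectable to make testing easy
function pickRandom(items, random = Math.random) {
  return items[Math.floor(random() * items.length)];
}

// Text of the question for a given list of neighbors
function buildQuestion(neighbors) {
  return `Which country borders all of these: ${neighbors.join(', ')}? `;
}

// `session` is a Neo4j driver session; `ask` is the hypothetical prompt function
async function guessCountryFromNeighbors(session, ask) {
  const result = await session.run(NEIGHBORS_QUERY);
  // The random choice happens here, in JavaScript, not in Cypher
  const record = pickRandom(result.records);
  const country = record.get('country');
  const answer = await ask(buildQuestion(record.get('neighbors')));
  const correct = answer.trim().toLowerCase() === country.toLowerCase();
  console.log(
    correct ? '\x1b[32mCorrect!\x1b[0m' : `\x1b[33mWrong! The answer is ${country}.\x1b[0m`
  );
}
```

Picking the record on the client keeps the query simple, at the cost of transferring every country on each round, which is negligible for a dataset of this size.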

And that’s it. Run the app and you can start playing!

Figure: Sample execution

Conclusion

In this post, I built a small CLI application that uses Neo4j as its database to help learn European countries.

I firmly believe graph databases are useful for learning, because we remember new knowledge more easily when we form associations with what we already know. The very effective method of loci, which consists of remembering an ordered list of things by visualizing them in familiar locations, demonstrates this. You can associate each item of your list with:

  • A room of your house
  • A shop in your favorite street
  • A street in your childhood city
  • A station on your commuter train line

Any place that is familiar to you can help you remember anything. It’s all about connecting things together!

To remember where Albania is, I could learn that its coordinates are 41.1533° N, 20.1683° E, but it is far easier to remember that it is the westernmost of the countries on Greece’s northern border. Of course, for this to work I need to know where Greece is, but once we have a solid base of knowledge, it is really easy to connect new things to it!

While this example is still simple, Neo4j’s graph nature makes it really easy to add more nodes from additional data sources, densifying the graph and extending its learning potential. In a future post, I will try to add some data sources and provide new questions such as:

  • Seas: “Which countries border the Mediterranean Sea?”, “Which European countries border no sea?”, …
  • Mountains: “Which countries are the Alps in?”, …
  • Rivers: “Which rivers flow through France?”, …

Happy learning!
