Create a Neo4j Procedure with Kotlin in 20 minutes

Published in

Neo4j Developer Blog

6 min read · Jun 23, 2018

Graph Database: Neo4j

If you already know about Neo4j and Graph databases you can skip ahead to the use-case for writing our procedures.

There is a special type of NoSQL database known as a graph store. Graph stores are built on graph theory, one of the oldest and most robust bodies of data-structure concepts in computer science. Graph theory has proven useful for many problems across various domains. The most widely applied graph algorithms include various types of shortest-path calculations, geodesic paths, and centrality measures such as PageRank, eigenvector centrality, closeness, betweenness, HITS, and many others.

There are many proven graph database solutions out there. Neo4j is a high-performance NoSQL graph database that is easy to install and open source. It provides an ACID-compliant transactional backend for your applications. The source code, written in Java and Scala, is available on GitHub. Neo4j stores highly connected data natively as a graph, which makes complicated data patterns easier to implement. Some of its key features are (according to neo4j.com):

1. Neo4j’s First Mover Advantage is Connecting Everyone to Graphs
2. Biggest and Most Active Graph Community on the Planet
3. Highly Performant Read and Write Scalability, Without Compromise
4. High-Performance Thanks to Native Graph Storage & Processing
5. Rock-Solid Reliability for Mission-Critical Production Applications
6. Easier than Ever to Load Your Data into Neo4j
7. Whiteboard-friendly Data Modeling to Simplify the Development Cycle
8. Superb Value for Enterprise and Startup Projects

Our History with Neo4j

These features met our needs, so since 2015 we have been using Neo4j in several of our projects and training team members in graph databases, thanks to the simplicity of graphs and of Neo4j.

We have servers running Neo4j 2.2, which have worked fine up until now. Having used Neo4j in real-world projects, this time we needed to run logic inside the graph engine for better performance and design. In SQL Server we used stored procedures and user-defined functions to push querying into the database engine. So, what about Neo4j? Can we use in-database procedures to get the same kind of functionality?

User Defined Procedures

Fortunately, Neo4j 3.0 opened a new window to performance. You can write your own custom procedures or functions in Neo4j, much as we do in SQL Server. This feature is called “User Defined Procedures”.

Procedures are the preferred means for extending Neo4j.
Examples of use cases for procedures are:

1. Provide access to functionality that is not available in Cypher, such as manual indexes and schema introspection.

2. Provide access to third-party systems.

3. Perform graph-global operations, such as counting connected components or finding dense nodes.

4. Express a procedural operation that is difficult to express declaratively with Cypher.

In Neo4j, procedures are written in Java and packaged into .jar files. They can be deployed to the database by dropping a jar file into the $NEO4J_HOME/plugins directory on each standalone or clustered server. The database must be re-started on each server to pick up new procedures.

Kotlin

If we can write a stored procedure in Java, why can’t we write one in Kotlin as well? As you might know, Kotlin is a new programming language from JetBrains.

Kotlin compiles to JVM bytecode, JavaScript, or native code. It is most interesting for people who work with Java today, but it could appeal to all programmers who use a garbage-collected runtime, including people who currently use Scala, Go, Python, Ruby, and JavaScript. Kotlin has a simple, clean, and concise syntax.

Here is a class in Java and Kotlin:

Java Code:

public class Bean {
  private final String name;
  private final int age;

  public Bean(String n, int a) {
    name = n;
    age = a;
  }

  public String getName() {
    return name;
  }

  public int getAge() {
    return age;
  }
}

Kotlin Code:

class Bean(val name: String, val age: Int)

As you can see, the syntax is lean and intuitive. Kotlin programs can use all existing Java frameworks and libraries, even advanced ones. The basics can be learned in a few hours by simply reading the language reference docs. Kotlin imposes little to no run-time overhead. The language has strong commercial support from several established companies.

So our mission was set: write a stored procedure for Neo4j in Kotlin.
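To make the interoperability point concrete, here is a minimal sketch (not taken from the project) that stores the one-line Bean class in a plain java.util.TreeMap. The helper name indexByName and the sample names and ages are made up for illustration.

```kotlin
import java.util.TreeMap

// Same one-line class as above; `val` properties compile to Java-style getters.
class Bean(val name: String, val age: Int)

// Hypothetical helper: index beans by name in a Java collection.
// java.util.TreeMap keeps its keys in sorted order.
fun indexByName(beans: List<Bean>): TreeMap<String, Bean> {
    val index = TreeMap<String, Bean>()
    for (bean in beans) {
        index[bean.name] = bean   // operator syntax for TreeMap.put
    }
    return index
}

fun main() {
    // Illustrative data only.
    val index = indexByName(listOf(Bean("Zoe", 31), Bean("Adam", 42)))
    println(index.firstKey())     // smallest key in sorted order
    println(index["Zoe"]?.age)    // property access instead of getAge()
}
```

No wrapper or adapter is needed in either direction: the Java collection accepts the Kotlin class as-is, and Java code would see getName() and getAge() on Bean.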

Use Case: NLP Analysis of User Feedback

In our latest research on the Trinity Engine for the Dorium project, we needed to gather social impact indicators from various research centers. That helped us rate the socio-economic impact of certain sustainable projects in Africa, and also informed us about volunteer voting and observation systems in different areas. To analyze comments and other user response texts, we needed to run some NLP as a basis for other techniques.

If you are familiar with natural language processing and with Neo4j, you will see that Cypher makes some language-processing tasks easy. For example, the following query creates a chain of words:

WITH split("there is a different good leader in world", " ") as words
UNWIND range(0, size(words)-2) as i
MERGE (w1:Word {name: words[i]})
MERGE (w2:Word {name: words[i+1]})
CREATE (w1)-[:NEXT]->(w2);

It creates the following graph in the database:

This works fine as long as no word is repeated in the sentence. If we change the sentence to the following, the word graph comes out wrong, as you can see in the following picture:

WITH split("there is a different good leader and bad leader in world.", " ") as words
UNWIND range(0, size(words)-2) as i
MERGE (w1:Word {name: words[i]})
MERGE (w2:Word {name: words[i+1]})
CREATE (w1)-[:NEXT]->(w2);

The query adds only one leader node to the graph. This is because of the MERGE clause in our query, which matches an existing node with the same label and property values instead of creating a second one.

MERGE either matches existing nodes and binds them, or creates new data and binds that. It is like a combination of MATCH and CREATE that additionally allows you to specify what happens when the data was matched or created.
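The effect can be reproduced without a database. The following is a rough plain-Kotlin model of the query, not Neo4j code: a set keyed by word stands in for MERGE on the name property, and a list that is always appended to stands in for CREATE. The function name buildChain is an assumption made for this sketch.

```kotlin
// Plain-Kotlin model of the query: nodes are keyed by name (like MERGE on
// {name: ...}), relationships are always appended (like CREATE).
fun buildChain(sentence: String): Pair<Set<String>, List<Pair<String, String>>> {
    val words = sentence.split(" ")
    val nodes = LinkedHashSet<String>()              // one node per distinct name
    val edges = mutableListOf<Pair<String, String>>()
    for (i in 0..words.size - 2) {                   // same bounds as range(0, size(words)-2)
        nodes.add(words[i])                          // MERGE (w1:Word {name: words[i]})
        nodes.add(words[i + 1])                      // MERGE (w2:Word {name: words[i+1]})
        edges.add(words[i] to words[i + 1])          // CREATE (w1)-[:NEXT]->(w2)
    }
    return nodes to edges
}

fun main() {
    val sentence = "there is a different good leader and bad leader in world."
    val (nodes, edges) = buildChain(sentence)
    println("${sentence.split(" ").size} words, ${nodes.size} nodes, ${edges.size} relationships")
    // Every relationship leaving the shared "leader" node:
    println(edges.filter { it.first == "leader" })
}
```

In this model the 11 words collapse into 10 nodes, and the single leader node ends up with two outgoing NEXT relationships (to and, and to in), so the chain is no longer a simple path through the sentence.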

The SHA256 User Defined Procedure and Function

So, we need to add a unique identifier to our words, so that each occurrence is always created as its own node, even when the name is the same. For this, I’ll create a procedure in Neo4j that accepts a String and converts it into a SHA-256 hash.

Kotlin Code for our procedure & function

import java.security.MessageDigest
import java.util.stream.Stream
import org.neo4j.procedure.*

class Sha256Hash {
    // Result row of the procedure; the public field becomes the output column "result".
    class HashResult(dx: String) {
        @JvmField var result: String = dx
    }

    // Hash the input with SHA-256 and render the digest as lowercase hex.
    fun sha256(data: String): String {
        val bytes = data.toByteArray()
        val md = MessageDigest.getInstance("SHA-256")
        val digest = md.digest(bytes)
        return digest.fold("", { str, it -> str + "%02x".format(it) })
    }

    @Procedure(name = "dor.sha256")
    @Description("Convert data from string to SHA256 String")
    fun sha256Proc(@Name("data") data: String): Stream<HashResult> {
        return Stream.of(HashResult(sha256(data)))
    }

    @UserFunction(name = "dor.sha256")
    @Description("Convert data from string to SHA256 String")
    fun sha256Function(@Name("data") data: String): String {
        return sha256(data)
    }
}

After building the project with Maven and copying the resulting JAR file to the Neo4j plugins folder, the new procedure and function are available in Cypher.

Example Usage

The procedure can be called with:

CALL dor.sha256("1000")

You’ll get the following hash as the result for 1000:

40510175845988f13f6162ed8526f0b09f73384467fa855e1e79b44a56562a58
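The hash can be cross-checked outside Neo4j. Below is a minimal standalone sketch of the same hashing logic without the procedure annotations; it uses joinToString instead of fold, which is a stylistic choice of this sketch and not the code in the repository.

```kotlin
import java.security.MessageDigest

// Same logic as the sha256 helper in Sha256Hash, without the Neo4j annotations.
fun sha256(data: String): String {
    val digest = MessageDigest.getInstance("SHA-256").digest(data.toByteArray())
    // Render each byte as two lowercase hex digits.
    return digest.joinToString("") { "%02x".format(it) }
}

fun main() {
    println(sha256("1000"))   // compare with the value returned by CALL dor.sha256("1000")
    println(sha256("abc"))    // "abc" is a standard SHA-256 test-vector input
}
```

Running this next to the deployed plugin is a quick way to confirm that the JAR Neo4j loaded is the one you just built.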

We can provide our code either as a procedure, which returns a stream of data, or as a function, which returns a single value and can be used in any expression in Cypher, for example:

RETURN dor.sha256("1000")

Creating a continuous chain of words

WITH split("there is a different good leader and bad leader in world.", " ") as words
UNWIND range(0, size(words)-2) as i
MERGE (w1:Word {name: words[i], hash: dor.sha256(words[i]+i)})
MERGE (w2:Word {name: words[i+1], hash: dor.sha256(words[i+1]+(i+1))})
CREATE (w1)-[:NEXT]->(w2);

And this is the resulting graph:

As you can see in the query, we append each word’s index in the words array to the word before hashing. The index makes every hash unique, so the MERGE clause now creates a separate node for each occurrence.

You can see the entire code in the following GitHub repository.
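The uniqueness argument can also be checked in plain Kotlin. This sketch mirrors dor.sha256(words[i]+i) from the query; the function name positionKeys is hypothetical, and the standalone sha256 helper stands in for the deployed function.

```kotlin
import java.security.MessageDigest

// Standalone stand-in for dor.sha256.
fun sha256(data: String): String =
    MessageDigest.getInstance("SHA-256")
        .digest(data.toByteArray())
        .joinToString("") { "%02x".format(it) }

// Hash for the word at position i, mirroring dor.sha256(words[i] + i) in the query.
fun positionKeys(sentence: String): List<String> =
    sentence.split(" ").mapIndexed { i, word -> sha256(word + i) }

fun main() {
    val keys = positionKeys("there is a different good leader and bad leader in world.")
    println("${keys.size} words, ${keys.toSet().size} distinct hashes")
    // The two occurrences of "leader" sit at positions 5 and 8.
    println(keys[5] == keys[8])
}
```

One design caveat: concatenating word and index without a separator means two different pairs can produce the same input string (for example a1 at position 2 and a at position 12 both give a12). That is unlikely in ordinary sentences, but a separator character between the word and the index would rule it out.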