Adding Q&A Features to Your Knowledge Graph in 3 Simple Steps

Fanghua (Joshua) Yu
9 min read · Jan 18, 2023

Combine OpenAI GPT-3 with Neo4j Graph Database

by Joshua Yu

Photo of the Edinburgh Street in an early winter morning by the Author

With its huge success in the first month and a half after release, ChatGPT has become a name known by almost everyone. In fact, the magic hand behind it, the GPT-3 large language model (now evolved into GPT-3.5), was released back in mid-2020. My friend Sixing Huang, together with a small group of tech geeks from Neo4j, has experimented with the Doctor.ai project, which is based on GPT-3 and a knowledge graph of medical records stored in Neo4j. It lets doctors and patients ask questions about health check records and medical history in natural language, and in more than one language.

The original article can be found here.

Today I am going to take some of the work created by that project and apply it to a less boring subject, i.e. movies and stars, to showcase how to bring Q&A capability to a knowledge graph in 3 simple steps.

Step #1. Prepare what is needed

  1. A running Neo4j database instance storing the Movies Knowledge Graph. Don’t have one, or don’t know how to get one yet? Not a problem! Let’s just get one from AuraDB, and it’s FREE.

Aura is Neo4j’s fully managed cloud service. It has 2 products: AuraDB, the zero-admin, always-on graph database for cloud developers, and AuraDS, which is AuraDB plus the Graph Data Science extension. A free AuraDB instance from the website is good enough for this project.

Choose Movies from the AuraDB homepage

By following the instructions on the screen, our Movies graph should be ready within a few minutes:

A running graph database in AuraDB

Clicking the >_ Query button on the right side launches the Neo4j Browser in the default web browser; both Chrome and Edge are well supported. Then we need to enter the password that was generated when the database was initially requested.

Once connected, we are done with the graph database. Keep this Neo4j Browser window open, as we will use it again shortly.

2. An OpenAI API key

Once you sign up for OpenAI, an API key is automatically generated. We will need this API key for the required NLU tasks.

3. Download the MoviesBot application

The MoviesBot application can be found on GitHub. You will need a Node.js server to run it.

Step #2. Design the Q&A Stories

  1. The Movies Graph Model

The Movies graph contains 100+ persons, movies and their relationships. In the Neo4j Browser window, let us see what the data model looks like by entering and running this statement:

CALL db.schema.visualization

Show data model in the Movie Graph

A graph is the simplest and most intuitive way to represent the real world with data. Basically, a graph has nodes (entities), shown as circles, and relationships, shown as arrows. Both nodes and relationships can have properties. From the screenshot above, we can see that there are Person nodes and Movie nodes, which are connected by several relationship types, e.g. ACTED_IN and DIRECTED. A Person node has the properties name and born, and a Movie node has the properties title, released and tagline. There is also a roles property on the ACTED_IN relationship, which stores the roles a person played in a movie.
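To make the property graph model a bit more concrete, here is a small JavaScript sketch that mirrors one slice of the schema as plain objects. It is only an illustration: the object layout and the actorsOf helper are assumptions made up for this example, not how Neo4j stores or queries data.

```javascript
// Illustrative only: a tiny in-memory mirror of the Movies schema.
// Nodes carry a label and properties; relationships carry a type,
// a direction (from -> to) and properties of their own.
const nodes = {
  p1: { label: "Person", props: { name: "Keanu Reeves", born: 1964 } },
  m1: {
    label: "Movie",
    props: { title: "The Matrix", released: 1999, tagline: "Welcome to the Real World" },
  },
};

const relationships = [
  { type: "ACTED_IN", from: "p1", to: "m1", props: { roles: ["Neo"] } },
];

// Hypothetical helper: follow ACTED_IN relationships that point at a movie title.
function actorsOf(title) {
  return relationships
    .filter((r) => r.type === "ACTED_IN" && nodes[r.to].props.title === title)
    .map((r) => ({ name: nodes[r.from].props.name, roles: r.props.roles }));
}

console.log(actorsOf("The Matrix"));
```

In the real database this lookup is expressed as a pattern in Cypher rather than as filter and map calls.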

With the above data model in mind, a query against this knowledge graph in Cypher, the query language of Neo4j, can look like this:

// Who acted in the movie Top Gun?
MATCH (who:Person) -[r:ACTED_IN]-> (m:Movie)
WHERE m.title = 'Top Gun'
RETURN who.name AS name, r.roles AS roles;

I think the query is largely self-explanatory. Unlike SQL, Cypher takes a pattern-matching approach to querying, which is more expressive and powerful when looking for data through complex and deep relationships.
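For readers who would rather run such a query from JavaScript than from the Neo4j Browser, the sketch below shows one possible shape. It assumes a session object with a run(query, params) method that resolves to { records }, which is the shape returned by the official neo4j-driver package. The whoActedIn function and the fakeSession stand-in are hypothetical names used only so the example runs without a database.

```javascript
// Sketch: run the "who acted in" query through a Neo4j-style session.
// The title is passed as a parameter ($title) instead of being pasted into the text.
const WHO_ACTED_IN = `
  MATCH (who:Person)-[r:ACTED_IN]->(m:Movie)
  WHERE m.title = $title
  RETURN who.name AS name, r.roles AS roles`;

async function whoActedIn(session, title) {
  const result = await session.run(WHO_ACTED_IN, { title });
  // Each record exposes its columns through get(columnName).
  return result.records.map((rec) => ({
    name: rec.get("name"),
    roles: rec.get("roles"),
  }));
}

// Stand-in session with placeholder data (an assumption, not real query output).
const fakeSession = {
  async run(query, params) {
    const row = { name: "Example Actor", roles: ["Example Role"] };
    return { records: [{ get: (key) => row[key] }] };
  },
};

whoActedIn(fakeSession, "Top Gun").then((rows) => console.log(rows));
```

With the real driver, the session would typically come from neo4j.driver(uri, neo4j.auth.basic(user, password)).session({ database }), and should be closed once the query is done.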

If you are interested in learning more about Cypher, I’d recommend this resource.

2. List the Possible Questions and Their Cypher Queries

For the small Movies knowledge graph, we can think of some questions people would like to ask. For example:

1) questions about a property value of a single node or relationship

When was Tom Hanks born?
What is the tagline of movie Top Gun?
What was the role Tom Hanks acted in the movie Sleepless in Seattle?

2) questions about a direct relationship / connection of a certain type

Who acted in the movie The Matrix?
Who directed the movie Top Gun?
Did Keanu Reeves act in the movie Cloud Atlas?

3) questions for complex relationships with filters

What are the movies Keanu Reeves acted in between 1990 and 2000?
Who acted in movie Sleepless in Seattle other than Tom Hanks?

4) questions that require calculations

How many movies did Keanu Reeves act in between 1990 and 2000?

5) some recommendations

I love Sleepless in Seattle, can you recommend a few other similar movies?

The list can go on and on, so let’s just stop here. The Cypher statements for the above questions will be given in the next section.

Step #3. Customize the application

After downloading the application, there are only 2 places to change for it to work.

1. The .env file

There is a sample .env file in the project folder, which must be updated with the correct details to take effect. It is the place where we put all configuration items:

REACT_APP_NEO4JPASSWORD=**YOUR-AURADB-PASSWORD**
REACT_APP_NEO4JURI=neo4j+s://**YOUR-AURADB-ID**.databases.neo4j.io:7687
REACT_APP_NEO4JDATABASE=neo4j
REACT_APP_NEO4JUSER=neo4j
REACT_APP_OPENAI_API_KEY=**YOUR-OPENAI-API-KEY**
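A missing or misspelled entry in this file tends to show up later as a confusing connection error, so it can help to check the settings early. The sketch below is one hedged way to do that. It assumes the project is built with Create React App (which the REACT_APP_ prefix suggests), where such variables are exposed on process.env. The loadConfig function is a hypothetical helper, not part of the MoviesBot code.

```javascript
// Sketch: collect the settings from .env and fail early if one is missing.
const REQUIRED_KEYS = [
  "REACT_APP_NEO4JURI",
  "REACT_APP_NEO4JUSER",
  "REACT_APP_NEO4JPASSWORD",
  "REACT_APP_NEO4JDATABASE",
  "REACT_APP_OPENAI_API_KEY",
];

// `env` defaults to process.env; passing a plain object makes the helper easy to try out.
function loadConfig(env = process.env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing settings in .env: ${missing.join(", ")}`);
  }
  return {
    uri: env.REACT_APP_NEO4JURI,
    user: env.REACT_APP_NEO4JUSER,
    password: env.REACT_APP_NEO4JPASSWORD,
    database: env.REACT_APP_NEO4JDATABASE,
    openaiApiKey: env.REACT_APP_OPENAI_API_KEY,
  };
}
```

Calling loadConfig() once at startup would then either return a single object with all five values or stop with a message naming the missing keys.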

2. The moviesbot_gpt3.js file

This is the JS file for the MoviesBot application, which sits under the src/component/ folder. Open it, locate the line starting with let training = `, and then put the questions and their corresponding Cypher statements there:

      let training = `
#When was movie The Matrix released?
MATCH (m:Movie) WHERE m.title =~ '(?i)The Matrix' RETURN m.released AS year;

#What is the tagline of movie The Matrix?
MATCH (m:Movie) WHERE m.title =~ '(?i)The Matrix' RETURN m.tagline AS tagline;

#When was keanu reeves born?
MATCH (p:Person) WHERE p.name =~ '(?i)keanu reeves' RETURN p.born AS year;

#What are the 3 most recent movies that Keanu Reeves has acted in?
MATCH (p:Person) -[r:ACTED_IN]-> (m:Movie)
WHERE p.name =~ '(?i)Keanu Reeves'
RETURN m.title AS title, m.released AS year
ORDER BY m.released DESC
LIMIT 3;

#Who has acted the role Neo in movie The Matrix?
MATCH (p0:Person) -[r:ACTED_IN]-> (m:Movie)
WHERE m.title =~ '(?i)The Matrix'
WITH p0.name AS who, r.roles AS roles
UNWIND roles AS role
WITH who WHERE role =~ '(?i)Neo'
RETURN who;

#Did Keanu Reeves act in the movie Cloud Atlas?
MATCH (p:Person) WHERE p.name =~ '(?i)Keanu Reeves'
MATCH (m:Movie) WHERE m.title =~ '(?i)cloud atlas'
RETURN exists((p) -[:ACTED_IN]-> (m)) AS answer;

#Who were acting in movie The Matrix?
MATCH (p:Person) -[r:ACTED_IN]-> (m:Movie)
WHERE m.title =~ '(?i)The Matrix'
RETURN p.name + ' acted as ' + r.roles[0] AS answer;

#Who directed movie The Matrix?
MATCH (p:Person) -[r:DIRECTED]-> (m:Movie)
WHERE m.title =~ '(?i)The Matrix'
RETURN p.name AS answer;

#Was Keanu Reeves the director of movie Cloud Atlas?
MATCH (p:Person) WHERE p.name =~ '(?i)Keanu Reeves'
MATCH (m:Movie) WHERE m.title =~ '(?i)cloud atlas'
RETURN exists((p) -[:DIRECTED]-> (m)) AS answer;

#Who wrote movie Speed Racer?
MATCH (p:Person) -[:WROTE]-> (m:Movie) WHERE m.title =~ '(?i)Speed Racer'
RETURN p.name AS who;

#What are the movies Keanu Reeves acted in between 1990 and 2000?
MATCH (p:Person) -[:ACTED_IN]-> (m:Movie)
WHERE p.name =~ '(?i)Keanu Reeves'
AND m.released >= 1990 AND m.released <= 2000
RETURN m.title AS answer;


#I love Sleepless in Seattle, can you recommend a few other similar movies?
MATCH (m:Movie) <-[:ACTED_IN]- (p)
WHERE m.title =~ '(?i)Sleepless in Seattle'
WITH m, collect(p.name) AS actorsCol
MATCH (m1:Movie) <-[:ACTED_IN]- (p1) WHERE m1 <> m
WITH m1.title AS title, collect(p1.name) AS actorsCol1, actorsCol
WITH title, toFloat(size(apoc.coll.intersection(actorsCol, actorsCol1))) / size(apoc.coll.union(actorsCol, actorsCol1)) AS similarity
RETURN title ORDER BY similarity DESC LIMIT 3;

#Who acted in movie Sleepless in Seattle other than Tom Hanks?
MATCH (p0:Person) -[:ACTED_IN]-> (m:Movie) <-[r:ACTED_IN]- (p)
WHERE m.title =~ '(?i)Sleepless in Seattle' AND p0.name =~ '(?i)Tom Hanks'
RETURN p.name + ' acting as ' + r.roles[0] AS answer;

#`;

Most of the Cypher statements should be quite clear even to people who have no experience with Cypher, but some can be tricky. For example, the one for movie recommendation uses Jaccard similarity (the number of shared actors divided by the number of distinct actors across both movies) to look for the movies whose casts overlap the most with the given one!
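The Jaccard calculation in the recommendation query is easier to follow outside of Cypher. The sketch below reproduces the same arithmetic in JavaScript, with the jaccard function playing the part that apoc.coll.intersection and apoc.coll.union play in the query. The cast lists are made-up placeholders, not data from the Movies graph.

```javascript
// Jaccard similarity of two lists, treated as sets:
// |A intersect B| / |A union B|, a value between 0 and 1.
function jaccard(listA, listB) {
  const a = new Set(listA);
  const b = new Set(listB);
  const intersection = [...a].filter((x) => b.has(x)).length;
  const union = new Set([...a, ...b]).size;
  // Two empty lists have nothing in common; avoid dividing by zero.
  return union === 0 ? 0 : intersection / union;
}

// Placeholder cast lists, for illustration only.
const castA = ["Actor 1", "Actor 2", "Actor 3"];
const castB = ["Actor 2", "Actor 3", "Actor 4"];

console.log(jaccard(castA, castB)); // 2 shared / 4 distinct = 0.5
```

Sorting all other movies by this score in descending order and keeping the top 3 is what the ORDER BY similarity DESC LIMIT 3 part of the query does.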

Regardless, this wouldn’t bother GPT-3 that much. What we did here is quite similar to preparing sample pairs for a translation task. No model is retrained; the samples are simply sent as part of the prompt, and GPT-3 is intelligent enough to pick up the pattern and produce a suitable Cypher statement for a new question through the API call below:

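// The samples end with "#", so the user's question becomes the next "#question" line.
let query = training + search + "\n";

const response = await openai.createCompletion("davinci", {
  prompt: query,
  temperature: 0, // favor the most likely tokens so the output stays close to deterministic
  max_tokens: 150,
  top_p: 1.0,
  frequency_penalty: 0.0,
  presence_penalty: 0.0,
  stop: ["#", ";"], // stop at the end of one statement, or before a new sample starts
});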
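The prompt convention behind this call can be sketched without touching the API at all. In the example below, buildPrompt and cleanCompletion are hypothetical helper names, and sampleTraining is a shortened copy of the samples string. The cleanup step relies on the fact that the stop sequences are not included in the text the completion API returns.

```javascript
// Sketch of the prompt convention: every sample is "#question" followed by
// its Cypher, and the samples string ends with a lone "#".
function buildPrompt(training, question) {
  // Appending the question right after the trailing "#" makes it look like
  // the next sample, so the model continues with the matching Cypher.
  return training + question.trim() + "\n";
}

// The returned text ends just before the first ";" or "#",
// so only surrounding whitespace needs to be removed.
function cleanCompletion(completionText) {
  return completionText.trim();
}

const sampleTraining = `
#When was keanu reeves born?
MATCH (p:Person) WHERE p.name =~ '(?i)keanu reeves' RETURN p.born AS year;

#`;

const prompt = buildPrompt(sampleTraining, "When was Tom Hanks born?");
console.log(prompt);
console.log(cleanCompletion("\nMATCH (p:Person) RETURN p.born AS year \n"));
```

Because the question is attached to the final "#", leaving that character out of the samples string would break the pattern the model is expected to continue.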

See It in Action!

Now let’s run it. Open a command-line window, navigate to the root folder of the application and run:

npm start

If React and the other dependencies haven’t been installed yet, install them first (typically with npm install). Once the application starts, a browser window will be launched with the URL http://localhost:3000, and you should see the main UI of the MoviesBot:

MoviesBot: You can type, or click the microphone icon to speak out your question

Try asking the questions prepared above with different person names and/or movie titles, and see what answers are given. You can even ask in a slightly different way. For example, the question prepared is:

When was Keanu Reeves born?

If you ask:

In which year Keanu Reeves was born?

You should still be able to get the correct answer. So the GPT-3 model does understand the two questions are about the same thing!

The application logs all questions asked, along with the Cypher statements generated and executed, in the Console of your browser:

The Console in Chrome Developer Tool Window has questions asked and Cypher statements generated and executed.

You may find that GPT-3 isn’t always correct in translating questions into Cypher statements. This can often be improved by providing more samples, and possibly by changing how the Cypher is written. We will discuss this subject in future articles.
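Adding a sample by hand is easy to get subtly wrong, for instance by losing the trailing "#". The sketch below shows one way to append a pair programmatically. The addSample helper is hypothetical, and the counting query is my own suggestion for the "How many movies" question from Step #2, not a statement taken from the MoviesBot project.

```javascript
// Sketch: append one more question/Cypher pair to the samples string while
// keeping the trailing "#" that the next question gets attached to.
function addSample(training, question, cypher) {
  const body = training.replace(/#\s*$/, ""); // drop the trailing "#"
  const statement = cypher.trim().replace(/;?$/, ";"); // end with exactly one ";"
  return `${body}#${question.trim()}\n${statement}\n\n#`;
}

const before = "\n#Who directed movie The Matrix?\n" +
  "MATCH (p:Person) -[r:DIRECTED]-> (m:Movie)\n" +
  "WHERE m.title =~ '(?i)The Matrix'\n" +
  "RETURN p.name AS answer;\n\n#";

// Assumed Cypher for the counting question; adjust it to your own graph.
const after = addSample(
  before,
  "How many movies did Keanu Reeves act in between 1990 and 2000?",
  "MATCH (p:Person) -[:ACTED_IN]-> (m:Movie)\n" +
    "WHERE p.name =~ '(?i)Keanu Reeves'\n" +
    "AND m.released >= 1990 AND m.released <= 2000\n" +
    "RETURN count(m) AS answer"
);

console.log(after);
```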

Further Discussion

So far, I hope you have been amazed by the power and simplicity of combining the GPT-3 language model with a knowledge graph to create a Q&A chatbot. In fact, there is heaps of other great stuff GPT-3 can do together with a knowledge graph.

Since GPT-3 is good at handling text of many kinds of languages, why not just let it learn from the raw articles on movies and answer questions? Why bother having the knowledge graph in Neo4j?

To answer this, we need to think about what GPT-3 and other NLP models really are. Their knowledge of human languages comes from the MASSIVE amount of content they were trained on. GPT-3 is now intelligent enough to give an answer to almost ANY question in its own way, to the point that human beings can’t really tell whether the answer came from a machine or not. However, it won’t tell you how it arrived at the answer, or the sources of the answer (references). In most closed-domain question answering use cases, like a chatbot for an enterprise, the answers are expected to be authentic and trustworthy.

By providing samples of questions in natural language and their corresponding Cypher statements, we can see exactly how each and every question is understood and how the answer is obtained, which makes the AI predictable and explainable.

Of course, this is just the beginning. GPT-3 has shown us the unprecedented achievements that AI can make in a lot of spaces, and language is just one of them. It is surely going to change how human beings live and work.

So stay tuned!

Fanghua (Joshua) Yu

I believe our lives become more meaningful when we are connected, so is data. Happy to connect and share: https://www.linkedin.com/in/joshuayu/