Building a Questionnaire with Neo4j — part 1/3: One simple question

Stefan Dreverman
Aug 30 · 7 min read

How a simple question can lead to many others. Are you implementing a Questionnaire and don’t want to ask yourself too many questions?

Use a GraphDB, use Neo4j! (and read this :-)

The goal is to supply you with Questionnaire building blocks for your own application and the why behind the design so you can adapt it to your preferences.

Part 1 handles answering one single question. Part 2 handles answering multiple questions in a linear fashion. Part 3 shows dynamic question lists (where questions become available when answering others).

For simplicity, I only cover multiple-choice questions. Open answers (like numbers or free text) are out of scope.

And please use unique IDs instead of names if you’re building something like Kahoot, where there’ll be many Questionnaires, Answers and Respondents.
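One way to enforce unique IDs is a uniqueness constraint per label. The sketch below is only an illustration: it assumes a hypothetical id property on every node, the constraint names are made up, and the syntax is the one used by recent Neo4j versions (older versions use a different CREATE CONSTRAINT form).

```cypher
// Assumption: every node carries a hypothetical 'id' property.
// The constraint names are illustrative, not part of the article's model.
CREATE CONSTRAINT question_id IF NOT EXISTS
FOR (q:Question) REQUIRE q.id IS UNIQUE;

CREATE CONSTRAINT answer_id IF NOT EXISTS
FOR (a:Answer) REQUIRE a.id IS UNIQUE;

CREATE CONSTRAINT respondent_id IF NOT EXISTS
FOR (r:Respondent) REQUIRE r.id IS UNIQUE;
```

With constraints like these in place, the queries in this article would match on id instead of name.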

In each part I’ll show you:

  • Metamodel — Which types of nodes (labels) and relations to create and how they are used.
  • How to create instances — Create and answer question(s) based on the metamodel
  • How to read the results — Extract information from the answered questions

Let’s start with answering a single question. So, we’ll have a Question that has one or more Answers. And we’ll call the person answering the Question a Respondent.

Meta model (1)

Question — This is the question being asked.

isAnswerTo — Relation from an Answer to a Question, indicating that this is one of the possible answers to that Question.

Answer — A possible Answer to a Question.

(Extra requirement: an Answer can belong to only one Question, because we need to know which Question a Respondent gave the Answer for. If we defined a single Answer “Yes” and tied it to multiple Questions, we wouldn’t know which Question the Respondent answered.)

respondedWith — this records the fact that the Respondent has given an Answer

respondedTo — this records the fact that the Respondent has answered this Question (see below)

Respondent — Someone answering Questions. The reason I’ve not named this ‘User’ is existential: a User can be a Respondent, but can stop being a User (see below for solutions). For this example, I will treat the Respondent as a unique User.
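The extra requirement on Answer (one Answer belongs to exactly one Question) can be checked with a query once there is data. A minimal sketch, assuming the labels and relation names from the metamodel above:

```cypher
// List every Answer that is linked to more than one Question.
// An empty result means the one-Question-per-Answer requirement holds.
MATCH (a:Answer)-[:isAnswerTo]->(q:Question)
WITH a, count(q) AS questions
WHERE questions > 1
RETURN a.name AS Answer, questions
```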

Further considerations for the metamodel (which I don’t fully work out here, for the sake of brevity and clarity):

  • It’s possible to order the answers in a specific way by adding an order-property to the isAnswerTo relation or a priority-property to the Answer node itself. When querying the answers, you can ORDER BY that property.
  • Can the Respondent answer a Question multiple times? No? Make sure that you link the Respondent to the User or create a property in the Respondent that contains the User-id. You can then search for it before the Question is asked and refuse a second answer if it is found.
  • Yes, the relation respondedTo could be left out. It looks redundant. However, you’ll have to write a more expensive query to find out if a Respondent has answered a particular Question. If speed (or scale) is key, use this relation and save valuable milliseconds. Plus it is needed for part 2 :-)
  • An Answer can have a score if there is no right or wrong in the answers (see the Scoring Answers section below).
  • If you already have several taxonomies of things (Animal, Country, Sport, etc…), you can let Answer be items from that taxonomy by labeling them as Answers as well (i.e. as both Animal and Answer). This will allow you to also traverse the existing taxonomy via the Questionnaire taxonomy!
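To make the trade-off around respondedTo concrete, here is a rough sketch of the "has this Respondent already answered this Question?" check, written both ways. It assumes the Respondent and Question are looked up by name, as elsewhere in this article; the alreadyAnswered column name is just an illustration.

```cypher
// With respondedTo: a single hop from Respondent to Question.
MATCH (r:Respondent {name:'Zack'}), (q:Question {name:'How are you feeling?'})
OPTIONAL MATCH (r)-[rt:respondedTo]->(q)
RETURN count(rt) > 0 AS alreadyAnswered;

// Without respondedTo: the same check has to travel through the Answer.
MATCH (r:Respondent {name:'Zack'}), (q:Question {name:'How are you feeling?'})
OPTIONAL MATCH (r)-[rw:respondedWith]->(:Answer)-[:isAnswerTo]->(q)
RETURN count(rw) > 0 AS alreadyAnswered;
```

Both return a single row with true or false, so an application could run either one before showing the Question.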

Defining a question

Let’s define a Question “How are you feeling?”, with three answers: “Great”, “Okay” and “Not so good”:

CREATE (q:Question {name:'How are you feeling?'})
CREATE (:Answer {name:'Great'})-[:isAnswerTo]->(q)
CREATE (:Answer {name:'Okay'})-[:isAnswerTo]->(q)
CREATE (:Answer {name:'Not so good'})-[:isAnswerTo]->(q)

We don’t have any respondents yet, because we’ve just defined the question. The neat thing about a GraphDB is that we can set the answers by creating a relation between a Respondent and an Answer.

You can get the Answers to a Question by querying:

MATCH (q:Question)<-[:isAnswerTo]-(a:Answer)
WHERE q.name='How are you feeling?'
RETURN a.name as Answer

Alternatively, if you’ve used an order-property on the relation, you can order it by querying:

MATCH (q:Question)<-[iat:isAnswerTo]-(a:Answer)
WHERE q.name='How are you feeling?'
RETURN a.name as Answer
ORDER BY iat.order
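The Question defined earlier has no order-property yet. As a sketch, assuming you want the order Great, Okay, Not so good, you could set it on the existing relations like this (the numbering is my own choice for illustration):

```cypher
// Set an 'order' property on each isAnswerTo relation of this Question.
MATCH (a:Answer)-[iat:isAnswerTo]->(q:Question)
WHERE q.name='How are you feeling?'
SET iat.order = CASE a.name
  WHEN 'Great' THEN 1
  WHEN 'Okay' THEN 2
  WHEN 'Not so good' THEN 3
END
```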

Respondent answering a Question

Answering a question is done by finding the Respondent and linking it to the Answer given by that Respondent.

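// Look up the existing Respondent
MATCH (r:Respondent)
WHERE r.name='Zack'
// Find the chosen Answer via its Question
MATCH (q:Question)<-[:isAnswerTo]-(a:Answer)
WHERE q.name='How are you feeling?' AND a.name='Great'
// MERGE avoids duplicate relations if the query is run again
MERGE (r)-[:respondedWith]->(a)
MERGE (r)-[:respondedTo]->(q)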

You would create the Zack node earlier (or use a node that represents a User), so we just have to look it up. (Creating it every time a question is answered means you’ll have one node for every question a Respondent answers.)
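Creating the Respondent up front can be as small as the sketch below. It uses MERGE rather than CREATE, which is my assumption about how you would want to avoid duplicates when the statement runs more than once:

```cypher
// MERGE only creates the Respondent if no node with this name exists yet.
MERGE (r:Respondent {name:'Zack'})
RETURN r
```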

Repeat this for all Questions and Respondents. Below is some test data I’ve created using this query. Try re-creating it using the query as a template. (and create the Respondents first)

Showing results

The last step is to show results with the answers. You can do that with a query like this:

MATCH (n:Question)<-[:isAnswerTo]-(a:Answer)<-[rw:respondedWith]-(r:Respondent)
WHERE n.name='How are you feeling?'
RETURN a.name as Answer,
       count(rw) as Frequency,
       collect(r.name) as Respondents

This query returns the name of the Answer (for this question), the number of times this Answer has been given and a collection of all Respondent names.

Returning the collection of Respondents will not be very useful when you have hundreds or thousands of them, so drop the “collect” and work with just the answer and frequency data. Also, the answers come back in no particular order. Ordering by Frequency is a good start. If you’ve implemented an order property on isAnswerTo, you can also use that.
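A sketch of that leaner variant, without the collect and with the most frequent Answer first:

```cypher
// Frequency per Answer for one Question, most popular first.
MATCH (q:Question)<-[:isAnswerTo]-(a:Answer)<-[rw:respondedWith]-(:Respondent)
WHERE q.name='How are you feeling?'
RETURN a.name AS Answer, count(rw) AS Frequency
ORDER BY Frequency DESC
```

Note that Answers nobody picked do not show up in this result. If you want them listed with a Frequency of 0, match the Question and its Answers first and put the respondedWith part in an OPTIONAL MATCH.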

Similarly, if you want to know how many answers were given by each Respondent, use:

MATCH (a:Answer)<-[:respondedWith]-(r:Respondent) 
RETURN r.name as Respondent, count(a) as Frequency

This will result in:

The result is quite boring with my four test subjects, but will get interesting very quickly when you hook this up to a real user base. It can also differ from the number of Questions answered if you allow Respondents to give multiple answers for a Question:

MATCH (q:Question)<-[:respondedTo]-(r:Respondent) 
RETURN r.name as Respondent, count(q) as Frequency

You can also find the Respondents who share the most answers with a given Respondent. For example, to find the people who gave the same answers as Zack most often:

MATCH (subject:Respondent)-[:respondedWith]->(a:Answer)<-[:respondedWith]-(r:Respondent)
WHERE subject.name='Zack'
RETURN r.name as Respondent, count(a) as Count
ORDER BY Count DESC

This is useful for finding shared interests, sub-groups or to analyse results in a scientific study.

Scoring Answers

Some answers are not right or wrong, but have a score or weight. A score gives every answer a value, so each answer can be ‘counted’ accordingly. Answers can be scored by adding a score-property to Answer nodes. The question “Where do you run your applications?” could have answers/scores like:

To add the score to an Answer, add a score-property to the CREATE statement when adding an Answer to a Question (q), like so:

CREATE (:Answer {name:'On dedicated hardware', score:7})-[:isAnswerTo]->(q)

Respondents can be added in the same way as explained earlier. Scoring does not change anything in this part of the model. I’ve used Bill and Garry in this example:

The Respondents can now be scored per question. The query below gives all scores for all respondents for one Question:

// respondedWith links a Respondent to an Answer; respondedTo points at the Question
MATCH (n:Question)<-[:isAnswerTo]-(a:Answer)
      <-[:respondedWith]-(r:Respondent)
WHERE n.name='Where do you run your applications?'
RETURN r.name as Respondent, a.score as Score

While the score of one Question can be enough, if you want to sum the Answer scores of multiple questions, skip ahead to part 2, where I’ll add a (static) list of questions.

Conclusion — a simple question

As you can see, it’s relatively simple to create the queries for one question. There are already a good number of possibilities for extracting information from it. And the use of a GraphDB like Neo4j enables you to extract this information with little effort.

Even with this small metamodel, there are a couple of important choices to make. And it’s very important to get things right at this level, because they have an increasingly big impact as the model expands.

Let’s now add another level and see what choices can be made and what impact it has. Read on in part 2.

If you have questions or remarks about the examples and queries above, please leave a comment so I can answer them for you and/or elaborate on that subject in another article.

Neo4j Developer Blog

Developer Content around Graph Databases, Neo4j, Cypher, Data Science, Graph Analytics, GraphQL and more.

Stefan Dreverman

Written by

www.stefandreverman.nl — Freelance IT Architect. Father of Twins. Explorer. Trail runner.
