Building a Questionnaire with Neo4j — part 2/3: A static list

Stefan D
10 min read · Sep 6, 2019

Because one question leads to another, part 2 of this article will elaborate on creating a static list of questions. I’ll show the metamodel, creation of instances and some queries to show results, just like in part 1.

Please read Building a Questionnaire in Neo4j - Part 1/3 for the choices made and the queries needed to answer one Question.

The progress of a Respondent

The progress of the Respondent needs to be stored and tracked in order to know where the Respondent is in the List of Questions. It’s important to know that the result of each Question needs to be set before the next Question is answered. This requires the application that executes your questionnaire to store the answers between questions.

(Note that not storing answers between questions will result in a stateful front-end with a list of questions both answered and unanswered. I’m aiming at an event-driven and stateless implementation. This requires all answers to be saved as soon as known.)

The respondedTo-relation

One direction is to go with what we already have. In this case, the ‘next’ Question to be answered is the first Question in the list that does not have a respondedTo-relation for this Respondent. However, this requires a relation to be created for every question answered.

A currentQuestion-relation

Since we’re now using a list and are allowed to create any relation we want, there is another way to remember the progress of a Respondent: a pointer. This pointer would point to the ‘current question’. So let’s name it currentQuestion. When Bill answers the question that currentQuestion points to, the pointer can be removed and the next question (if any) would receive the currentQuestion-pointer.

This makes a lot of respondedTo-relations redundant, which saves a lot of space when scaling. It also changes the game a little with regard to the query to run when storing an Answer.

Either way, you’ll probably end up with one of the above. I’ll illustrate both so you can see the differences in approach.

Choosing a structure

When creating a list of questions, there are several aspects to consider, each of which is reflected in the design:

  • Is order important?
  • Are questions (like a decision tree) dependent on each other? If so, the list is no longer static. This is covered in part 3.
  • Can Questions be reused?

For now, let’s go for a static list with a finite number of non-reusable questions in which order is important. In a graph this could be reflected by structures like relations, a linked list, or a combination of relations and a linked list. Let’s look at their structures.

Relations

At design time, this structure sets the order through the order-property on the hasQ(uestion)-relation.

Note: The order-property can also be put into the Question-node, which makes it easier to fetch. It’s less normalized, because order is not a direct property of the Question; it’s a property of the Question-as-part-of-a-list. Still, for optimization’s sake it’s better to put it in the Question-node.
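A minimal sketch of this relation-based structure, with the order stored on the hasQ-relation. It assumes the ListOfQuestions and Question nodes used in the rest of this article already exist; the integer property name `order` and the positions are illustrative assumptions. Because ORDER is also a Cypher keyword, the property key is escaped with backticks to be safe.

```cypher
// Attach each Question to the list with an explicit position.
// The positions 1 and 2 are made-up example values.
MATCH (loq:ListOfQuestions {name: 'Cloud computing questionnaire'})
UNWIND [
  {name: 'Where do you run your applications?', position: 1},
  {name: 'Are you planning on moving to a cloud provider?', position: 2}
] AS item
MATCH (q:Question {name: item.name})
MERGE (loq)-[rel:hasQ]->(q)
SET rel.`order` = item.position;

// Fetch the Questions of the list, sorted by the stored position.
MATCH (loq:ListOfQuestions {name: 'Cloud computing questionnaire'})-[rel:hasQ]->(q:Question)
RETURN q.name AS Question, rel.`order` AS Position
ORDER BY Position;
```

The two statements are meant to be run one after the other, for example in cypher-shell.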

This approach is simple, straightforward and most importantly: it works. And it supports ordered and unordered lists. There’s still something missing: The structure is just not flexible enough. It does what it does, but dynamic lists are hard to achieve with this structure. And we need those in part 3… so: Next! :-)

Linked List

…yep, it literally creates a list of Questions. This has downsides too.

This structure forces an ordered list. It is also difficult to fetch all unanswered questions. However, the structure works for its purpose and it can be extended to a dynamic list, which is what we need later on.
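To give an idea of what reading a pure linked list looks like, here is a sketch that walks the next-relations from the head of the list. It assumes the head Question is known by name (in a pure linked list there is no ListOfQuestions-node to start from) and that the list is linear, so every Question is reached by exactly one path.

```cypher
// Start at the head of the list (assumed to be known by name).
MATCH (first:Question)
WHERE first.name = 'Where do you run your applications?'
// Follow zero or more next-relations; the path length is the position in the list.
MATCH path = (first)-[:next*0..]->(q:Question)
RETURN q.name AS Question, length(path) AS Position
ORDER BY Position
```

Every read has to traverse the chain like this, which is why simple lookups are harder on this structure than on plain relations.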

… so we need a structure that is like a linked list for the dynamic list, but we also want to be able to write easy and fast queries…

Relations + Linked List

Not surprisingly, there is a structure that combines the strengths of both patterns:

Its advantages:

  • It’s easy (and fast) to fetch which Questions belong to the ListOfQuestions: Get all questions that are connected to the ListOfQuestions with a hasQ-relation. Unanswered questions can also be fetched quickly.
  • The order-property on the relation is not needed, because the order is determined through the next-relations (if specified). Without the next-relation, every Question is still connected to the ListOfQuestions, so both ordered and unordered lists are supported. Plus, ordered lists can also be treated as if they were unordered.
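As an illustration of the first advantage, this sketch fetches all unanswered Questions for one Respondent through the hasQ-relation, ignoring the order. It assumes the respondedTo-approach described earlier and uses the example names from this article.

```cypher
// All Questions in the list that Bill has not responded to yet.
MATCH (r:Respondent {name: 'Bill'})
MATCH (loq:ListOfQuestions {name: 'Cloud computing questionnaire'})-[:hasQ]->(q:Question)
WHERE NOT (r)-[:respondedTo]->(q)
RETURN q.name AS UnansweredQuestion
```

No traversal of next-relations is needed here, which is what keeps this kind of query short.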

Metamodel (2)

With the structure determined, the metamodel is extended to look like this:

Only the next-relation, hasQ-relation and ListOfQuestions-type-node have been added. You can use the currentQuestion-relation if you go for the pointer solution. Adding these few elements brings new dimensions in answering questions and showing results.

Creating a ListOfQuestions

This requires questions other than the one from part 1. You can create them by repeating the creation query from part 1.

First, create the ListOfQuestions:

CREATE (loq:ListOfQuestions {name:"Cloud computing questionnaire"})

Then, relate any number of questions to this ListOfQuestions. Use Cypher’s IN operator to fetch all Questions that need the relation:

MATCH (loq:ListOfQuestions) 
WHERE loq.name="Cloud computing questionnaire"
WITH loq
MATCH (q:Question)
WHERE q.name IN ["Where do you run your applications?","Do you use multiple cloud providers?","Are you planning on moving to a cloud provider?"]
WITH q, loq
MERGE (loq)-[:hasQ]->(q)

Starting to understand why I recommended using an ID-property instead of a name-property as an identifier in part 1? Good!

As you can see, I’ve created two other questions. Putting them in a particular order requires fetching the two questions and creating the relations:

MATCH (q1:Question),(q2:Question)
WHERE q1.name='Where do you run your applications?' AND q2.name='Are you planning on moving to a cloud provider?'
MERGE (q1)-[:next]->(q2)

Figure: Full questionnaire

This is a very basic version of linking nodes which gets elaborate and costly when scaling. Lucky for us, Andrew Bowman made an excellent example for working with linked lists on the Neo4j website. This blog is very helpful to learn how to navigate, link, remove or fetch a number of items in a linked list. You’ll definitely need this when working with linked lists. Very useful stuff, a must read and a must implement!
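As a small step up from linking two Questions at a time, the sketch below links a whole list of names in one query. The question names come from the example above; placing 'Do you use multiple cloud providers?' last is an assumption made only for illustration.

```cypher
// The desired order of the Questions, by name.
// The position of the third Question is an assumption.
WITH ['Where do you run your applications?',
      'Are you planning on moving to a cloud provider?',
      'Do you use multiple cloud providers?'] AS names
// One row per neighbouring pair: (0,1), (1,2), ...
UNWIND range(0, size(names) - 2) AS i
MATCH (q1:Question {name: names[i]})
MATCH (q2:Question {name: names[i + 1]})
MERGE (q1)-[:next]->(q2)
```

Because MERGE is used, running the query twice does not create duplicate next-relations.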

Answering a ListOfQuestions: Queries for the respondedTo-relation

The first question: When a Respondent starts answering a ListOfQuestions, we need to get the first Question to be answered. This is the Question in the list that has no next-relation pointing toward it:

MATCH (loq:ListOfQuestions)-[:hasQ]->(q:Question)
WHERE loq.name='Cloud computing questionnaire'
AND NOT (:Question)-[:next]->(q)
RETURN q

This will always work. If you want to assert that the Respondent has not answered any question yet:

MATCH (r:Respondent)
WHERE r.name='Anne'
WITH r
MATCH (loq:ListOfQuestions)-[:hasQ]->(q:Question)
WHERE loq.name='Cloud computing questionnaire'
AND NOT (:Question)-[:next]->(q)
AND NOT (r)-[:respondedTo]->(q)
RETURN q

In this example the first Question will only be returned if our Respondent Anne has NOT answered it. For a partially answered ListOfQuestions the query returns no rows.

The first unanswered question: If the above query returns nothing, you can try to find the first question that is unanswered:

MATCH (r:Respondent)
WHERE r.name='Bill'
WITH r
MATCH (loq:ListOfQuestions)-[:hasQ]->(oldQ:Question)-[:next]->(q:Question)
WHERE loq.name='Cloud computing questionnaire'
AND NOT (r)-[:respondedTo]->(q)
AND (r)-[:respondedTo]->(oldQ)
RETURN q

So, if Bill answered the n-th question but not the (n+1)-th question (where n < number of Questions in this ListOfQuestions), the (n+1)-th question will be returned here.
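The two cases (nothing answered yet, or partially answered) can also be folded into one query. This is only a sketch: it assumes a linear list whose head has no incoming next-relation, and it simply picks the unanswered Question that is closest to the head.

```cypher
MATCH (r:Respondent {name: 'Bill'})
// Find the head of the list.
MATCH (loq:ListOfQuestions {name: 'Cloud computing questionnaire'})-[:hasQ]->(first:Question)
WHERE NOT (:Question)-[:next]->(first)
// Walk the list from the head and keep only unanswered Questions.
MATCH path = (first)-[:next*0..]->(q:Question)
WHERE NOT (r)-[:respondedTo]->(q)
// The shortest path leads to the first unanswered Question.
RETURN q, length(path) AS Position
ORDER BY Position
LIMIT 1
```

The price is a traversal of the whole list on every call, so it gets slower as the list grows.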

If you often need to find the first unanswered question, look at the currentQuestion-relation, which solves this problem way quicker than the respondedTo-relation.

Fetching the next question: Now that the first question is answered, retrieving the next Question is exactly the same as getting the first unanswered Question. Or, you can do it quick and dirty by simply retrieving the next question in the list, disregarding everything else:

MATCH (oldq:Question)-[:next]->(q:Question)
WHERE oldq.name='Where do you run your applications?'
RETURN q

At the end of the list, you’ll get an empty result because there’s no next Question to be found.

Answering a ListOfQuestions: Queries for the currentQuestion-relation

Working with a pointer, you’d first like to assert that the Respondent hasn’t started the list before. This query should return no answers:

MATCH (r:Respondent)-[:respondedWith]->(a)-[:isAnswerTo]->(q)<-[:hasQ]-(loq:ListOfQuestions)
WHERE r.name='Bill'
AND loq.name='Cloud computing questionnaire'
RETURN a

If so, the Respondent is a ‘first timer’. The next step is to record that the Respondent has started answering the ListOfQuestions by setting the currentQuestion-relation to the first question:

MATCH (r:Respondent)
WHERE r.name='Bill'
WITH r
MATCH (loq:ListOfQuestions)-[:hasQ]->(q:Question)
WHERE loq.name='Cloud computing questionnaire'
AND NOT (:Question)-[:next]->(q)
MERGE (r)-[:currentQuestion]->(q)

The currentQuestion can be retrieved by querying:

MATCH (r:Respondent)-[:currentQuestion]->(q:Question)<-[:hasQ]-(loq:ListOfQuestions)
WHERE r.name='Bill'
AND loq.name='Cloud computing questionnaire'
return q

There is no need to discern between first/next/first unanswered question since the currentQuestion (if present!) always points towards the question to be answered.

Easy huh?! Wait. We’re not there yet. You are now required to do something more when storing an answered question: Move the currentQuestion-relation to the next question. So instead of setting the respondedTo-relation, we have to remove the old and set a new currentQuestion-relation:

MATCH (r:Respondent)-[cq:currentQuestion]->(q:Question)<-[:isAnswerTo]-(a:Answer)
// The WHERE must follow the MATCH; placed after OPTIONAL MATCH it would only filter the optional part
WHERE r.name='Bill'
AND a.name='<fill in the answer for q here. Better yet, use IDs>'
OPTIONAL MATCH (q)-[:next]->(nextQ:Question)
MERGE (r)-[:respondedWith]->(a)
DELETE cq
WITH r,nextQ
MERGE (r)-[:currentQuestion]->(nextQ)

This query will:

  1. Optionally find a next Question (nextQ-node) (if any)
  2. Set the Answer for the Question (first MERGE)
  3. Delete the old currentQuestion-relation
  4. Set the nextQ-node (if one was found) as the current Question

For this to work at the last Question, where there is no next-relation, you will need to set an option in your neo4j.conf to prevent an error. At the last Question the OPTIONAL MATCH finds nothing, so nextQ is null and the final MERGE would fail. We do want the rest of the query to be executed and stored anyway, so you’ll need this setting:

cypher.lenient_create_relationship = true
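If changing neo4j.conf is not an option, a commonly used Cypher idiom is to wrap the last MERGE in a FOREACH over a list that is empty when nextQ is null. The sketch below is an alternative under that assumption, not the approach used in the rest of this article; it reuses the names from the query above.

```cypher
MATCH (r:Respondent)-[cq:currentQuestion]->(q:Question)<-[:isAnswerTo]-(a:Answer)
WHERE r.name = 'Bill'
AND a.name = '<fill in the answer for q here>'
OPTIONAL MATCH (q)-[:next]->(nextQ:Question)
MERGE (r)-[:respondedWith]->(a)
DELETE cq
// The list holds nextQ when it exists and is empty otherwise,
// so the MERGE is skipped at the end of the list.
FOREACH (n IN CASE WHEN nextQ IS NULL THEN [] ELSE [nextQ] END |
  MERGE (r)-[:currentQuestion]->(n)
)
```

After the last Question, the Respondent is simply left without a currentQuestion-relation, which can serve as the sign that the list is finished.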

Showing results

Since there are a lot of possibilities, I’ll show you a few to get started with:

Answers given by one Respondent

MATCH (loq:ListOfQuestions)-[:hasQ]->(q:Question)-[:isAnswerTo]-(a:Answer)<-[:respondedWith]-(r:Respondent)
WHERE loq.name='Cloud computing questionnaire'
AND r.name='Anne'
RETURN loq,q,a,r

If you want all answers for all respondents, leave out the entire line that starts with ‘AND’.

If your answers have scores, you can use pretty much the same query to return the total score per respondent and put the highest score first:

MATCH (loq:ListOfQuestions)-[:hasQ]->(q:Question)-[:isAnswerTo]-(a:Answer)<-[:respondedWith]-(r:Respondent)
WHERE loq.name='Cloud computing questionnaire'
RETURN r.name as Respondent,sum(a.score) as Score
ORDER BY Score DESC

If this was a sales-oriented questionnaire, we can identify a Champion: Anne, the cloud-adept. We’ll need to look into the answers to see if we can upsell. We’ve also identified a target: Bill. His passion for the mainframe needs to be updated so it’s compatible with the cloud. Let’s send him the ‘mainframes run in the cloud, with benefits’-whitepaper.

Frequency per answer given

This is more an analysis of your questions/answers, but nonetheless usable. It can be used to see if the frequency (per question) for particular answers is very high or zero.

MATCH (loq:ListOfQuestions)-[:hasQ]->(q:Question)-[:isAnswerTo]-(a:Answer)<-[rw:respondedWith]-(r:Respondent)
WHERE loq.name='Cloud computing questionnaire'
RETURN q.name as Question,a.name as Answer,count(rw) as Frequency
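The frequency query can be taken one step further to return only the most given Answer per Question. This is a sketch using the same labels and relations; the aliases are made up for the example, and when two Answers tie, one of them is returned arbitrarily.

```cypher
MATCH (loq:ListOfQuestions)-[:hasQ]->(q:Question)<-[:isAnswerTo]-(a:Answer)<-[rw:respondedWith]-(:Respondent)
WHERE loq.name = 'Cloud computing questionnaire'
// Count how often each Answer was given, highest count first.
WITH q, a, count(rw) AS Frequency
ORDER BY Frequency DESC
// Collect the Answers per Question; the first element is the most frequent one.
WITH q, collect({answer: a.name, frequency: Frequency}) AS ranked
RETURN q.name AS Question,
       ranked[0].answer AS MostGivenAnswer,
       ranked[0].frequency AS Frequency
```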

Conclusion

This blog has only scratched the surface of what can be done with a questionnaire in a Graph database like Neo4j. There are a lot of queries you can think of, like returning the most answered Answer for each Question or aggregated results spanning multiple questions. The queries are also easy to write and fast to execute. A huge benefit over relational DBs!

The programmers among us have also noticed that something we normally code in a programming language (pointers, linked lists) can also be programmed into a database with queries. I’ve also just scratched the surface here, because with a smart information model/architecture and a good GraphDB, application logic could be entirely moved to the database, leaving only a generic application layer. Maybe something for another blog… :-)

The questionnaire can be taken a level deeper: Dynamic question lists. Part 3 will cover the structure, creation and extraction of results for it.

If you have questions or remarks about the examples and queries above, please leave a comment so I can answer them for you and/or elaborate on that subject in another article.
