CLEVR graph: A dataset for graph-based reasoning

We’re excited to announce the release of a new dataset. CLEVR graph aims to support further research into machine reasoning over graph-structured data. It contains a set of questions and answers about transport network graphs.

An example generated graph and question-answer pairs

To answer the questions successfully, an AI system must understand both English and transport network graphs (similar to the London Underground) well enough to perform reasoning tasks.

We believe that the world’s information is inherently graph-based, and that progress towards Artificial General Intelligence will require agents that can reason over graphs.

The dataset

CLEVR graph, as its name hints, pays homage to the CLEVR dataset. It is a set of 10,000 graphs, each with an English-language question and an answer to that question.

There are currently eleven question types:

  • How many stations are between {Station} and {Station}?
  • Which lines is {Station} on?
  • How many lines is {Station} on?
  • How clean is {Station}?
  • Are {Station} and {Station} on the same line?
  • Which stations does {Line} pass through?
  • How many architecture styles does {Line} pass through?
  • How many {Architecture} stations are on the {Line} line?
  • Which line has the most {Architecture} stations?
  • What’s the nearest station to {Station} with disabled access?
  • Which {Architecture} station is beside the {Cleanliness} station with {Music} music?
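
To make the question types concrete, here is a minimal Python sketch that answers two of them on a toy network. The station names, line names, and dictionary layout are invented for illustration and are not the dataset's actual schema. It also assumes that "stations between" means the number of intermediate stations on a shortest path, which may differ from the dataset's own definition.

```python
from collections import deque

# Hypothetical toy network: each line is an ordered list of stations.
# These names are illustrative and do not come from the dataset.
LINES = {
    "Red": ["Avon", "Brook", "Cedar", "Dale"],
    "Blue": ["Cedar", "Elm", "Fern"],
}


def build_adjacency(lines):
    """Connect consecutive stations on each line with an undirected edge."""
    adjacency = {}
    for stations in lines.values():
        for a, b in zip(stations, stations[1:]):
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency


def lines_through(station, lines):
    """Answer 'Which lines is {Station} on?' as a sorted list of line names."""
    return sorted(name for name, stations in lines.items() if station in stations)


def stations_between(start, goal, lines):
    """Answer 'How many stations are between {Station} and {Station}?'.

    Uses breadth-first search and counts the intermediate stations on a
    shortest path. Returns None if the goal cannot be reached.
    """
    adjacency = build_adjacency(lines)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Exclude the two endpoints; never go below zero.
            return max(distance[current] - 1, 0)
        for neighbour in adjacency.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None


print(lines_through("Cedar", LINES))            # ['Blue', 'Red']
print(stations_between("Avon", "Fern", LINES))  # 3
```

Here "Cedar" is an interchange because it appears on both lines, and the shortest route from "Avon" to "Fern" passes through "Brook", "Cedar" and "Elm".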

Each question also comes with a functional program representation and a Cypher query representation. Code is included to load each graph into Neo4j and verify that each Cypher query gives the expected result.
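
As a rough illustration of what a functional program representation can look like, the sketch below expresses "How many {Architecture} stations are on the {Line} line?" as a short pipeline of operations and evaluates it over a toy station list. The operation names, the tuple-based program format, and the station properties are all hypothetical; the representation shipped with the dataset may be structured differently.

```python
# Hypothetical station records; the property names are illustrative only.
STATIONS = [
    {"name": "Avon", "architecture": "victorian", "lines": ["Red"]},
    {"name": "Brook", "architecture": "modern", "lines": ["Red"]},
    {"name": "Cedar", "architecture": "victorian", "lines": ["Red", "Blue"]},
    {"name": "Elm", "architecture": "victorian", "lines": ["Blue"]},
]

# Each operation takes the previous step's output plus its own arguments.
# These operation names are assumptions, not the dataset's vocabulary.
OPERATIONS = {
    "stations_on_line": lambda stations, line: [
        s for s in stations if line in s["lines"]
    ],
    "filter_architecture": lambda stations, style: [
        s for s in stations if s["architecture"] == style
    ],
    "count": lambda stations: len(stations),
}


def run_program(program, stations):
    """Apply each (operation, *arguments) step in order, threading the result."""
    value = stations
    for name, *args in program:
        value = OPERATIONS[name](value, *args)
    return value


# "How many victorian stations are on the Red line?"
PROGRAM = [
    ("stations_on_line", "Red"),
    ("filter_architecture", "victorian"),
    ("count",),
]

print(run_program(PROGRAM, STATIONS))  # 2
```

Writing the question as explicit steps makes the intended reasoning chain inspectable, which is the main appeal of pairing each question with a program as well as a query.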

We hope the dataset helps motivate further research into graph question answering. In the coming months we plan to release ML models that can answer the questions.

Thanks to Andrew Jefferson and Ashwath Salimath for their help with this project.

Example graph

An example auto-generated transit map. Grey points are interchanges.