Could GraphQL bring better query capability to xAPI and Learning Record Stores?

So you believe the xAPI hype and insert a heap of statements into a Learning Record Store (LRS). The next sensible thing to do is to query the LRS. Unfortunately, the LRS only provides an endpoint for retrieving full statements. You can’t run queries that return aggregate results. Yep, not even simple counts of xAPI verb and object use. Whole matching statements get returned, and queries are written as example statements. The idea is that you retrieve the xAPI statements as paged datasets (over a 24-hour period) and then perform your analytics magic somewhere else. I’ve recently been using GraphQL and have a few thoughts on how it may provide a better alternative to the current way statements in an LRS are queried.
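To make the current workflow concrete, here is a minimal Python sketch of counting verbs on the client side. It assumes the statement result shape from the xAPI spec (a `statements` list plus a `more` link that is empty on the last page). The `fetch_page` callable and the `fake_fetch` stand-in are hypothetical; a real client would issue authenticated HTTP GET requests against the LRS instead.

```python
from collections import Counter
from typing import Callable, Dict, Optional

# Hypothetical stand-in for an HTTP GET against the LRS statements resource.
# It receives None for the first page, then the "more" link of the previous page.
FetchPage = Callable[[Optional[str]], Dict]


def count_verbs(fetch_page: FetchPage) -> Counter:
    """Page through every full statement and tally verb ids on the client."""
    counts: Counter = Counter()
    url: Optional[str] = None
    while True:
        result = fetch_page(url)
        for statement in result["statements"]:
            counts[statement["verb"]["id"]] += 1
        # An empty or missing "more" link means there are no further pages.
        url = result.get("more") or None
        if url is None:
            return counts


COMPLETED = "http://adlnet.gov/expapi/verbs/completed"
ATTEMPTED = "http://adlnet.gov/expapi/verbs/attempted"

# Fake two-page LRS used only for illustration (made-up data).
PAGES = {
    None: {
        "statements": [{"verb": {"id": COMPLETED}}, {"verb": {"id": ATTEMPTED}}],
        "more": "/statements?page=2",
    },
    "/statements?page=2": {
        "statements": [{"verb": {"id": COMPLETED}}],
        "more": "",
    },
}


def fake_fetch(url: Optional[str]) -> Dict:
    return PAGES[url]


verb_counts = count_verbs(fake_fetch)
```

Every statement crosses the wire in full just to produce a handful of numbers, which is the overhead I would like to avoid.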

I understand the reason for the simplicity of the retrieval syntax in the xAPI spec. The LRS could be implemented in any suitable backend (e.g. Postgres, HBase, MongoDB, ElasticSearch, etc.) and query syntax over JSON document structures might differ. MongoDB query syntax is quite different from SQL. I also understand that with huge amounts of data in the LRS, you would probably want to either create a data warehouse (with data cubes) or cache the results of your custom analytics. However, I also think that more flexibility is needed in how statements can be retrieved. Key examples include retrieving related statements (such as all the threads in a discussion, responses to a quiz question, etc.), flexible query operators (i.e., is, not, lt, lte, gt, gte, contains, icontains, startswith, endswith, iendswith, like, ilike), returning only selected JSON fields (the full statement may not be required) and, yes, basic aggregate counts. The major design features of GraphQL are described here:

Sure, the xAPI spec can be extended to include basic reporting. The current vendors and open source solutions could then each be extended individually. This would take time and money, both for the development of the spec and for the vendors. An easier option would be to adopt GraphQL. I think GraphQL has most of what is needed to move forward. GraphQL was developed by Facebook as a way to let developers control how data is returned, which is very useful in service-oriented architectures. A number of implementations for Postgres and MongoDB are also starting to become available.

Nodal provides a great GraphQL Playground that illustrates some of the functionality I described above (even though they only have a partial implementation). There are examples showing that only the specified fields are returned, query operator usage (such as is, not, lt, lte, gt, gte, contains, icontains, startswith, endswith, iendswith, like, ilike) and the ability to return structured/related statements (e.g. the query to retrieve all Users, their Threads, and the Posts in those Threads).
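To give a feel for what operator-based filtering and field selection do, here is a rough Python sketch over flattened, made-up records. It is not Nodal's or GraphQL's implementation. The `field__operator` argument naming is an assumption borrowed for illustration, and `select`, `matches` and the records are hypothetical. Only a subset of the operators listed above is covered.

```python
import operator
from typing import Any, Callable, Dict, Iterable, List

# Subset of the operators named above, mapped to plain Python comparisons.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "is": operator.eq,
    "not": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
    "contains": lambda value, arg: arg in value,
    "icontains": lambda value, arg: arg.lower() in value.lower(),
    "startswith": lambda value, arg: value.startswith(arg),
    "endswith": lambda value, arg: value.endswith(arg),
}


def matches(record: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Return True when the record satisfies every field__operator filter."""
    for key, arg in filters.items():
        field, _, op = key.partition("__")
        # A bare field name with no operator suffix is treated as "is".
        if not OPERATORS[op or "is"](record[field], arg):
            return False
    return True


def select(records: Iterable[Dict[str, Any]], fields: List[str], **filters: Any) -> List[Dict[str, Any]]:
    """Keep matching records and return only the requested fields."""
    return [{f: r[f] for f in fields} for r in records if matches(r, filters)]


# Simplified, flattened stand-ins for xAPI statements (made-up data).
RECORDS = [
    {"actor": "alice", "verb": "completed", "score": 0.9},
    {"actor": "bob", "verb": "completed", "score": 0.4},
    {"actor": "carol", "verb": "attempted", "score": 0.7},
]

passed = select(RECORDS, ["actor"], verb__icontains="COMPLET", score__gte=0.5)
```

The caller names both the conditions and the fields it wants back, so nothing more than the `actor` value of matching records is returned.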

GraphQL also has stricter typing. xAPI extensions at the moment don’t have enforced typing. If xAPI moved to JSON-LD, it would be a step forward in determining data types in an automated manner for analytics. The JSON-LD typing used to describe statement extensions might then map easily to the stricter typing required by GraphQL.

What about simple aggregate counts? Well, in the design of a custom GraphQL schema for xAPI, summary fields could be added and code written in the resolve methods to perform the custom queries.
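As a rough illustration of the idea, the Python sketch below hand-rolls the resolver pattern without any GraphQL library. The summary field names `statementCount` and `verbCounts`, the `resolve_summary` helper and the sample statements are all hypothetical. A real schema would register comparable functions as field resolvers and would most likely push the counting down into the LRS backend rather than looping in memory.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Statement = Dict[str, Any]


def resolve_statement_count(statements: List[Statement]) -> int:
    """Resolver for a hypothetical statementCount summary field."""
    return len(statements)


def resolve_verb_counts(statements: List[Statement]) -> List[Dict[str, Any]]:
    """Resolver for a hypothetical verbCounts summary field, sorted by verb id."""
    counts = Counter(s["verb"]["id"] for s in statements)
    return [{"verb": verb, "count": n} for verb, n in sorted(counts.items())]


# Maps each summary field name to the function that computes it.
SUMMARY_RESOLVERS: Dict[str, Callable[[List[Statement]], Any]] = {
    "statementCount": resolve_statement_count,
    "verbCounts": resolve_verb_counts,
}


def resolve_summary(statements: List[Statement], requested_fields: List[str]) -> Dict[str, Any]:
    """Run only the resolvers for the fields the query actually asked for."""
    return {name: SUMMARY_RESOLVERS[name](statements) for name in requested_fields}


COMPLETED = "http://adlnet.gov/expapi/verbs/completed"
ATTEMPTED = "http://adlnet.gov/expapi/verbs/attempted"

# Made-up statements, trimmed to the only property the resolvers read.
STATEMENTS = [
    {"verb": {"id": COMPLETED}},
    {"verb": {"id": ATTEMPTED}},
    {"verb": {"id": COMPLETED}},
]

summary = resolve_summary(STATEMENTS, ["statementCount", "verbCounts"])
```

The client would receive only the computed summary, not the statements behind it.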

Anyway, it’s just an idea I’ve had for a while, which was reinforced when I saw the Nodal GraphQL Playground. I encourage you to have a play in the Nodal GraphQL Playground and provide feedback.