Best Practices for Offering Educational Data Analytics

I am on two panels over the next week, at IMS Global Europe and the IMS Global November Quarterly in Atlanta. In both I am presenting lessons learned from offering large-scale data analytics of learner behavior to a widespread research community. Specifically, ACT’s OpenEd product offers classroom assessment and video instruction to millions of students, and we perform a large amount of analysis on this data for many purposes. One of these is determining the efficacy of instructional resources based on subsequent classroom assessments. And since we are part of an organization like ACT, with many researchers studying various aspects of learning, we face strong demand for information on all user actions. In the process, I have (painfully) learned quite a bit about offering large-scale data analytics capabilities to a diverse analyst base. With that, here are my top five learnings:

1. Use a standard that puts a stake in the ground for semantics of learner events.

We used IMS Global Caliper 1.1, which has robust definitions of video consumption (MediaEvent) and assessment actions (AssessmentItemEvent, AssessmentEvent, GradeEvent). Uses of this data include aggregating events from differing systems (like multiple systems that perform assessment) and sharing the event data across system boundaries (with other analytic tools that don’t know how learning events are represented in your system). An example of this is using assessment data captured from multiple systems, all expressed in Caliper, to rate the efficacy of instructional resources.
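To make the event shape concrete, here is a minimal sketch (as a Python dict; the IDs and URLs are hypothetical examples, not real endpoints) of a Caliper 1.1 MediaEvent recording that a learner paused a video:

```python
# A minimal sketch of a Caliper 1.1 MediaEvent as a Python dict.
# IDs and URLs are hypothetical; consult the Caliper 1.1 spec for
# the full set of required and optional properties.
media_event = {
    "@context": "http://purl.imsglobal.org/ctx/caliper/v1p1",
    "id": "urn:uuid:3a648e68-f00d-4c08-aa59-8738e1884f2c",
    "type": "MediaEvent",
    "actor": {
        "id": "https://example.edu/users/554433",
        "type": "Person",
    },
    "action": "Paused",
    "object": {
        "id": "https://example.edu/videos/1225",
        "type": "VideoObject",
        "duration": "PT20M20S",   # total length, ISO 8601 duration
    },
    "target": {
        "id": "https://example.edu/videos/1225?t=321",
        "type": "MediaLocation",
        "currentTime": "PT05M21S",  # where in the video the pause occurred
    },
    "eventTime": "2019-11-15T10:15:00.000Z",
}
```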

The second big benefit is that a standard, because its development was informed by many users and organizations, helps you anticipate metadata on events that you may not have thought of. It is critical to anticipate, as fully as possible, all the attributes of events that are likely to become of interest later. It is usually difficult, if not impossible, to engineer that data into the event stream after the fact.

[Note: xAPI is commonly viewed as an alternative to Caliper. Like Caliper, xAPI expresses learning events as actor-verb-object “triples” serialized as JSON. But xAPI does not put a stake in the ground about which learning events are important and what attributes of those events should be stored. So it’s not really relevant to this discussion, given the value the standard is giving us: clearly defined semantics for learning events.]

2. Anticipate what your use cases for query and analysis will be.

Expanding on the last point: you can pretty much assume that if you don’t work through the query use cases for your events up front, you will later discover that you didn’t capture the right information. Before you start capturing events, enumerate exactly how that event data will be analyzed, both as individual events and as aggregate summarizations by learner or learning object.
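For example, if one anticipated use case is “rate each resource by the scores on its follow-up assessment,” you can verify up front that every event carries what that query needs. A rough sketch in Python with pandas, assuming the events have already been flattened to one row per event (the file name and column names are hypothetical):

```python
import pandas as pd

# Flattened Caliper events: one row per event, with the learning-object
# ID and score captured on each event up front (see principle 3 below).
events = pd.read_json("caliper_events.jsonl", lines=True)

grade_events = events[events["type"] == "GradeEvent"]

# Aggregate per learning object. Without objectId and score stored on
# every event, this query cannot be run after the fact.
efficacy = (
    grade_events
    .groupby("objectId")["score"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "avg_score", "count": "attempts"})
)
print(efficacy.head())
```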

3. In this analysis use case, denormalization is GOOD. Err on the side of MORE of it, not less.

For computer scientists and software developers this may seem unexpected. Many of us were taught to create properly normalized database schemas for easier data maintenance and smaller overall storage requirements (think lots of joins to external tables instead of replicating information in, say, a transaction table). However, your ex post facto analysis of events will usually require a lot of lookups to get additional data: looking up a learner to get more attributes on them, or retrieving a learning object (such as an assessment or a video) to get more attributes on it. Apply principle 2 above and store whatever attributes you care about for learners or learning objects right in the Caliper event. Failure to do so will at best explode your processing time; at worst, it may be impossible to reconstruct the full set of metadata attributes.

Caliper is extensible both in the attributes that are available and, for that matter, in the events that are expressed. So if you anticipate needing additional attributes on events, just add them to the extensions map property.
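Here is a sketch of what that denormalization might look like on a GradeEvent, using the extensions property to carry learner and resource attributes we expect to query on later (the keys inside extensions are hypothetical, application-defined names, not part of the spec):

```python
# Sketch: denormalizing learner and resource attributes into the event
# via Caliper's extensions map, so later analysis needs no lookups.
grade_event = {
    "@context": "http://purl.imsglobal.org/ctx/caliper/v1p1",
    "id": "urn:uuid:9f1c2b6e-2c1d-4b6a-9d2e-55f0a1b7c333",
    "type": "GradeEvent",
    "actor": {
        "id": "https://example.edu/autograder",
        "type": "SoftwareApplication",
    },
    "action": "Graded",
    "object": {
        "id": "https://example.edu/attempts/5678",
        "type": "Attempt",
    },
    "eventTime": "2019-11-15T10:45:00.000Z",
    "extensions": {
        # Learner attributes copied in at event time:
        "learnerGradeLevel": 8,
        "learnerSection": "Period 3",
        # Learning-object attributes copied in at event time:
        "assessmentStandard": "CCSS.MATH.CONTENT.8.EE.A.1",
        "resourceSubject": "Math",
    },
}
```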

4. Define a set of concrete query endpoints to retrieve individual and aggregate data on usage.

Unfortunately these query endpoints aren’t yet available in the IMS Caliper standard (stay tuned on this subject). But you will want to allow your data analysis “customers” to get to the data easily without your assistance. Define some query endpoints and make them available to data analysis consumers. An example of a set of query operations is our open source Callisto LRS with its Caliper query capability.
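The exact operations will depend on your store (see Callisto for a concrete set), but the general shape is a small number of parameterized endpoints. Below is a hypothetical sketch using Python and Flask; the route and parameter names are illustrative, not Callisto’s actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for a real event store; a production LRS would push these
# filters down into its database rather than scanning in memory.
EVENTS = []

def find_events(actor=None, event_type=None, since=None, until=None):
    """Naive in-memory filter over stored Caliper events."""
    result = EVENTS
    if actor:
        result = [e for e in result if e.get("actor", {}).get("id") == actor]
    if event_type:
        result = [e for e in result if e.get("type") == event_type]
    if since:
        result = [e for e in result if e.get("eventTime", "") >= since]
    if until:
        result = [e for e in result if e.get("eventTime", "") <= until]
    return result

# Hypothetical endpoint shape:
#   GET /events?actor=...&type=GradeEvent&since=2019-11-01T00:00:00Z
@app.route("/events")
def query_events():
    return jsonify(find_events(
        actor=request.args.get("actor"),
        event_type=request.args.get("type"),
        since=request.args.get("since"),
        until=request.args.get("until"),
    ))
```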

5. Don’t let data analysis customers access your database. Use the new abstraction layer of Caliper instead.

If you do find that the data set isn’t quite sufficient to provide what your customer needs, resist the request (or temptation) to just provide access to the database (live or a snapshot) to get their questions answered. Spend the time reprocessing the data to create more robustly described Caliper events. Data analysis “customers” are never going to understand your application well enough to do this themselves. Directing them to the (potentially updated and enhanced) Caliper representation of the learner event stream is much more likely to produce accurate results, and results consistent with previous analyses.
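As a sketch of what such a reprocessing pass might look like (the lookup functions and extension attribute names are hypothetical stand-ins for your application’s internals):

```python
# Sketch of a reprocessing pass: re-emit previously captured Caliper
# events enriched with the attributes the analyst needs, instead of
# granting direct database access.
def enrich_event(event, lookup_learner, lookup_resource):
    """Return a copy of the event with extra attributes in extensions."""
    learner = lookup_learner(event["actor"]["id"])
    resource = lookup_resource(event["object"]["id"])
    enriched = dict(event)
    extensions = dict(enriched.get("extensions", {}))
    extensions.update({
        "learnerGradeLevel": learner["grade_level"],
        "resourceSubject": resource["subject"],
    })
    enriched["extensions"] = extensions
    return enriched

def reprocess(raw_events, lookup_learner, lookup_resource):
    # Re-emit the whole stream so every consumer sees one consistent,
    # more robustly described Caliper representation.
    return [enrich_event(e, lookup_learner, lookup_resource)
            for e in raw_events]
```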

That’s our list of best practices for getting your learner data analyzed broadly, effectively, and accurately. These are all derived from real and recent experience. Hopefully these guidelines help you start deriving value from your educational data capture and analysis efforts faster and with less pain.