How does Wiley empower instructors with assignment length estimates? (Part 2)

Anitha Sivasubramanian
Published in Knerd · 3 min read · Sep 29, 2022

Bayesian updating in a nutshell (using conjugate priors) — the preliminaries

In Part 1, we described the benefits, challenges, and constraints of showing data-informed assignment-length estimates to instructors as they create their courses. Given those challenges, an online Bayesian approach seemed like the right one, since it lets us update the estimates in real time as new data arrives.

Bayesian updating revolves around four objects: the prior, the likelihood, the posterior, and the posterior predictive. The idea is to start with our prior beliefs (how long we generally think assignments might take, before we’ve seen specific data) and, as we see more and more data, revise those beliefs and the estimates built on them.

What is the actual data generating process here?

Our data-generating model is quite bespoke, but this question can be answered by looking at the distributions of the completion time and the number of questions.

In the case of completion time, the histogram of durations is right-skewed. After taking the logarithm, the histogram looks approximately normal, so we concluded that this variable is approximately log-normally distributed.
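A minimal sketch of this check, using simulated stand-in data (the actual durations come from our internal platform data, so the numbers below are purely illustrative):

```python
# Sketch of the log-normality check on hypothetical completion times.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Stand-in for observed completion times (minutes); real data is internal.
durations = rng.lognormal(mean=3.0, sigma=0.8, size=5_000)

# Right-skewed on the raw scale ...
print(f"skewness of durations:      {stats.skew(durations):.2f}")

# ... roughly symmetric (normal-looking) after a log transform.
log_durations = np.log(durations)
print(f"skewness of log(durations): {stats.skew(log_durations):.2f}")

# A normality test on the log scale supports the log-normal assumption.
stat, p_value = stats.normaltest(log_durations)
print(f"normality test on log scale: p = {p_value:.3f}")
```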

The next variable of interest is the discrete number of questions required to complete a learning objective (LO). The actual number of questions a student needs depends both on the student completing the minimum work required for the LO and on achieving a proficiency level above some threshold.

Typically, you cannot complete an LO without answering at least 4 questions. By carefully reviewing the pool of possible discrete distributions, especially the ones that have conjugate priors, we were able to reject pretty much all of them except for the Negative Binomial (NB) and the Geometric, which is a special case of the NB.

The Negative Binomial is the distribution of the number of failures experienced before a fixed number of successes in IID Bernoulli trials. If we take the fixed number of successes to be the minimum work, then a student will typically have answered minimum work + number of incorrect questions in total on a learning objective, which is exactly what an NB random variable describes. So, our objective is to model the extra number of questions a student needs to attempt beyond the minimum work.
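As a concrete illustration, here is a small simulation of that data-generating process. The minimum work of 4 questions matches the constraint above, while the per-question success probability is a hypothetical value:

```python
# Sketch of the assumed question-count model:
# total questions = minimum work + NB-distributed extra (incorrect) questions.
import numpy as np

rng = np.random.default_rng(1)
min_work = 4      # at least 4 correct answers needed to complete the LO
p_correct = 0.7   # hypothetical per-question probability of a correct answer

# Number of incorrect answers (failures) before min_work correct answers
# (successes): a Negative Binomial random variable.
extra = rng.negative_binomial(n=min_work, p=p_correct, size=10_000)
total_questions = min_work + extra

print(f"simulated mean total questions: {total_questions.mean():.2f}")
# Theoretical mean: min_work + min_work * (1 - p) / p
print(f"theoretical mean:               {min_work + min_work * (1 - p_correct) / p_correct:.2f}")
```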

Assuming the above data-generating distribution for each data point gives us the likelihood. The product of the likelihood and the prior, after normalization, gives us the posterior density.

The posterior predictive distribution is the probability distribution of a new data point given all the previously seen data; it is obtained by averaging the likelihood over the posterior.
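In standard Bayesian notation (this is textbook material, not anything specific to our model), with parameters $\theta$ and observations $x_1, \ldots, x_n$:

$$
p(\theta \mid x_{1:n}) \propto p(\theta) \prod_{i=1}^{n} p(x_i \mid \theta),
\qquad
p(x_{n+1} \mid x_{1:n}) = \int p(x_{n+1} \mid \theta)\, p(\theta \mid x_{1:n})\, d\theta.
$$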

Why conjugate priors?

If the posterior distribution belongs to the same family of probability distributions as the prior, then the prior is called a conjugate prior for that likelihood. Please refer to the table of conjugate prior distributions in Conjugate prior — Wikipedia. With such a conjugate prior, there is a closed-form equation that takes you from prior to posterior given some new observations, and because the posterior has the same form as the prior, you can keep updating it with additional data using the same calculation. This makes it efficient to compute the posterior incrementally, without needing to ‘solve for’ or approximate the posterior with an iterative algorithm.
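A minimal sketch of such a closed-form update, assuming the NB likelihood from above with a known number of successes r (the minimum work) and a Beta prior on the per-question success probability p, which is the conjugate pairing listed in the Wikipedia table; the hyperparameter values here are hypothetical:

```python
# Closed-form Beta update for Negative Binomial data, assuming:
# observations are "extra questions beyond the minimum work", the number
# of required successes r is known, and p ~ Beta(alpha, beta).
# Observing n students with k total extra questions moves the prior
# Beta(alpha, beta) to the posterior Beta(alpha + r * n, beta + k).

def update_beta_prior(alpha, beta, r, extra_questions):
    """Return the posterior Beta hyperparameters after a batch of NB data."""
    n = len(extra_questions)
    return alpha + r * n, beta + sum(extra_questions)

alpha, beta = 2.0, 2.0            # hypothetical weakly informative prior
r = 4                             # minimum work: at least 4 correct answers
batch = [0, 3, 1, 5, 2]           # extra questions for five students

alpha, beta = update_beta_prior(alpha, beta, r, batch)
print(alpha, beta)                # Beta(22.0, 13.0)

# Posterior predictive mean of the extra questions for a new student
# (valid for alpha > 1): r * beta / (alpha - 1).
print(r * beta / (alpha - 1))     # ~2.48 extra questions expected
```

Each batch of events only touches two numbers per model, which is what makes the volumes described below tractable.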

During a peak week in 2021, we had over 200,000 new events to incorporate into our models, so being able to update the model with a simple calculation rather than a complex optimization algorithm makes a big difference in performance. The closed-form solution afforded by the conjugate prior lets us handle these updates efficiently.

In the next part, we will talk about the conjugate prior models for completion time and number of questions.
