Experimentation Culture at GovTech Edu: How We Ran The A/B Testing for Government-Led Teacher App

GovTech Edu
7 min read · Jul 14, 2022


Contributors: Bagoes Rahmat Widiarso and Bhaskoro Muthohar

The rise of tech companies over the last decades has produced an astounding increase in the amount of available data. Companies of all sizes, whether engaged in sales or services, can use this data to boost their performance, for instance in product development as well as in marketing campaign initiatives. Executing product development and/or marketing campaigns on gut feeling alone is risky! It is like rolling a die expecting a 6, only to get a 2.

All the effort and cost is wasted when the result differs from our expectation. One common way to minimize unintended results from the changes we make in product development or marketing campaigns is to run an experiment, commonly known in tech companies as A/B testing, which gives us a safe and controlled environment in which to test our belief (or hypothesis). Readers may have heard of plenty of A/B testing applications in private tech companies, but its application in government and public-sector apps is still limited.

In this article, we share the very first implementation of A/B testing for strategic product development in one of the digital applications owned by the Ministry of Education, Culture, Research and Technology (MoECRT) in Indonesia. Before we describe the A/B testing process, let us first introduce the product. It is called the Merdeka Mengajar Platform (we refer to it as PMM). Within the platform, there is a product called Pelatihan Mandiri: a training program created by experts, consisting of a variety of topics and materials that are short and practical, so that teachers can study them and benefit from them. The MoECRT uses this product as a tool to accelerate the understanding of Kurikulum Merdeka (a new curriculum the ministry has introduced) across 142k+ schools in Indonesia.

Background of Experiment

Pelatihan Mandiri aims to encourage teachers to complete learning topics, where one topic contains several learning modules. Each learning module has three learning materials: a video, a quiz, and a reflection. Once a teacher has completed all the materials in a module, they can take a posttest. After the teacher has passed the posttest in every module of the topic, they can submit their aksi-nyata. At this stage, the teacher can be said to have completed one learning topic. For a better understanding, please see figure 2.

Figure 2. Funnel to Complete the Topic of Learning

Given this pre-defined learning journey, a challenge has arisen: there is a lot of drop-off at several stages of the funnel. Most teachers do not finish their learning journey all the way to submitting an aksi-nyata, and this has impacted the minimum attainable target for the implementation of Kurikulum Merdeka.

Findings from qualitative research suggested that teachers are more likely to be inspired if they see their peers' or other teachers' work. In the context of Pelatihan Mandiri, we assume that if teachers are exposed to other teachers' aksi-nyata, they will be more interested in learning and in submitting their own aksi-nyata. The team formulated the hypotheses for the submission of aksi-nyata as follows.

Hypothesis:

  • H0: Seeing others' aksi-nyata does not encourage teachers to draft and submit their own aksi-nyata
  • H1: Seeing others' aksi-nyata encourages more teachers to draft and submit their own aksi-nyata

Experimental Design

The experiment was split 50/50: the control group saw the existing topic page (without the list of other teachers' aksi-nyata), and the variant group saw a new topic page that allows a teacher to see other teachers' aksi-nyata. We used Bayesian analysis instead of the conventional frequentist method, as we have learned that the frequentist method has several weaknesses in the analysis and decision-making process. Table 1 shows the differences between the frequentist and Bayesian methods.
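As an illustration, a 50/50 split like this is often implemented with a deterministic hash of the user ID, so that each teacher always lands in the same group across sessions. Below is a minimal sketch in Python; the function and experiment name are hypothetical, not the platform's actual implementation:

```python
import hashlib

def assign_group(user_id: str, experiment: str = "topic-page-aksi-nyata") -> str:
    """Hash the user into one of 100 buckets; buckets 0-49 get the variant."""
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "variant" if bucket < 50 else "control"

# The same user always receives the same assignment.
print(assign_group("teacher-123"))
```

Hashing on a combined experiment-plus-user key keeps assignments stable within one experiment while letting different experiments bucket users independently.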

Table 1. The Difference between Frequentist and Bayesian methods

In a Bayesian framework, we can meaningfully talk about the probability that the true conversion rate lies in a given range, and that probability reflects our knowledge of the value based on prior information. Although to some people the calculations behind Bayesian inference look complicated, like a black box, the results of the analysis prove very useful. The general Bayesian process can be seen in figure 3.

Figure 3. Bayesian AB Testing Process

So, what is needed to run Bayesian A/B testing? The Bayesian framework requires three pieces of information: the prior, the likelihood, and the posterior.

  • Prior = our belief about the probability of an event before we observe the data.
  • Likelihood = the probability of observing the new data, independent of prior knowledge.
  • Posterior = the updated belief obtained by combining the prior with the likelihood.

Figure 4. Bayesian Likelihood, Prior, and Posterior

To understand this even better, see the image in figure 4. The prior, likelihood, and posterior in Bayesian A/B testing often follow a beta distribution with parameters (α, β), where α represents how many times we observe the event we care about, such as 5 out of 10 teachers completing the module, and β represents how many times the event did not happen. The relationship between the three is Beta(α-posterior, β-posterior) = Beta(α-prior + α-likelihood, β-prior + β-likelihood).

For example, the blue curve is the prior distribution of the existing condition in Pelatihan Mandiri, with 5 out of 10 teachers converting to complete modules; thus the prior follows Beta(5, 5), a distribution centered at 0.5. By running the experiment, we get the red curve, the likelihood of 7 out of 10 teachers converting to complete modules; hence the likelihood follows Beta(7, 3), a distribution centered at 0.7.

Combining the prior and the likelihood, the posterior distribution (the green curve) follows Beta(5 + 7, 5 + 3) = Beta(12, 8), centered at 0.6. This means that if the change is implemented, it will most likely convert 6 out of 10 teachers to complete their learning module.
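The conjugate update in the worked example above can be verified with a few lines of Python; no statistics library is needed, since the posterior parameters are just element-wise sums and the mean of Beta(α, β) is α / (α + β):

```python
def beta_mean(alpha: int, beta: int) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (5, 5)       # 5 of 10 teachers converted before the experiment
likelihood = (7, 3)  # 7 of 10 teachers converted during the experiment

# Posterior parameters are the element-wise sums: Beta(12, 8)
posterior = (prior[0] + likelihood[0], prior[1] + likelihood[1])

print(posterior, beta_mean(*posterior))  # (12, 8) 0.6
```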

Analysis and Result

In the experiment, we separated the analysis for new users (users who had never started the material at all when assigned to the variant or control group during the experiment) and old users (who had started the material before the experiment began). The confidence plot and probability plot can be seen in figures 5 and 6 respectively, while the summary result can be seen in table 2. Note that we label the data for the new page as A (in red) and the old page as B (in green).

Figure 5. Confidence Plot (a-left) New Users Group, and (b-right) Old Users Group
Figure 6. Probability Plot (a-left) New User Group, and (b-right) Old User Group
Table 2. Summary Result of AB Testing Analysis

The posterior distribution of the new page in figure 5 leans further to the right than that of the old page, indicating that the average conversion of the new page is higher. From table 2, we learned that for the new user group, the new page converted teachers to submit their aksi-nyata at a rate 1.43% higher than the old page. Similarly, in the old user group, the new page generated a 0.41% better conversion rate than the old page. Based on these results, we concluded that the new page converts teachers to submit aksi-nyata better.

The probability plot tells the same story. From figure 6, the probability that the new page performs better than the old page is above 90% for both the new and old user groups. This is also supported by the fact that the expected loss of implementing the new page over the old one was close to zero. We then calculated the potential improvement of implementing the new page by dividing the delta conversion by the mean conversion rate of the old page for each user group; the result can also be seen in table 2. By implementing the new page, the potential improvement would be 32.06% for the new user group and over 100% for the old user group.
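Metrics like "probability that the new page beats the old page" and "expected loss" can be estimated by Monte Carlo sampling from the two posterior beta distributions. Here is a sketch using only Python's standard library; the conversion counts are made-up illustrative numbers, not our experiment data:

```python
import random

def compare_variants(old, new, n=100_000, seed=7):
    """old and new are (alpha, beta) posterior parameters for each page.
    Returns P(new beats old) and the expected loss of choosing the new page."""
    rng = random.Random(seed)
    wins, loss = 0, 0.0
    for _ in range(n):
        p_old = rng.betavariate(*old)
        p_new = rng.betavariate(*new)
        if p_new > p_old:
            wins += 1
        # We only "lose" in draws where the old page is actually better.
        loss += max(p_old - p_new, 0.0)
    return wins / n, loss / n

# Hypothetical counts: old page 50/1000 conversions, new page 70/1000.
prob, exp_loss = compare_variants((50, 950), (70, 930))
```

With numbers like these, the probability that the new page wins comes out well above 90% and the expected loss is near zero, which mirrors the decision logic described above.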

Based on this analytical evidence, we finally agreed to launch the new page! On to the next question: what is the state of the metrics after implementing the new page? Of course, we have to measure performance afterward. Overall, the number of aksi-nyata generated by teachers increased by 590% in the two months after the new page was implemented.

Key Takeaway

At the time of writing, Platform Merdeka Mengajar has been operational nationwide for just five months since its launch in February 2022. In those five months, the platform has been downloaded more than 1.3 million times, and more than 943 thousand teachers have logged in and used it to assist their teaching and learning. We believe that by continuously applying data-driven practices such as the A/B testing outlined in this article, we will shape and maximize the platform's impact on teachers and ultimately transform educational practices in Indonesia irreversibly.
