The GovTech Edu Data Science Team’s Role in Transforming Education in Indonesia

Andika Rachman Hakim
GovTech Edu
Apr 18, 2024 · 7 min read

Contributors: Figarri Keisha, Rizka Azmira, Andika Rachman Hakim

Gone are the days when data was merely a collection of numbers and graphs understood only by experts. With statistics intertwined with programming, data is becoming a powerful tool capable of reshaping how we teach and learn. Given increasing data availability and advances in computing power, the combination of mathematics, statistics, and programming (referred to as “data science”) can revolutionize how we support education in Indonesia. By analyzing large datasets, analysts can uncover hidden, actionable insights that support informed decision-making, enhancing teaching quality and experience, which in turn is expected to positively impact students’ learning outcomes. Recognizing this immense potential, the data science team at GovTech Edu is dedicated to creating data products that support the Indonesian education system. These products are chiefly envisioned to reduce operational burden, increase internal team members’ productivity, and comply with regulatory requirements.

Let’s discover how the data science team supports education transformation in Indonesia through (i) data and state-of-the-art AI technology, and (ii) effective collaboration among team members across functions within the organization.

Cross-functional collaboration toward a better solution

A centralized Data Science (DS) team was formed within the Data Analytics function at GovTech Edu, focusing on building data products that extend our capabilities beyond routine data pipelining and in-depth analyses. The team was established to ensure a cohesive, broad-view approach to developing and implementing projects that use statistics, machine learning, or artificial intelligence. This approach requires a collaborative work environment, with regular synchronization meetings among cross-functional teams to achieve a holistic perspective. These discussion sessions allow for the exchange of ideas, knowledge sharing between technical and non-technical people (who sometimes speak in different lingo and from different viewpoints), and goal alignment, ensuring that the teams are well informed and able to contribute effectively to the product.

The initiatives

Referring to our Product Development Principle (Figure 1), we actively engage in all stages, developing data science products ranging from back-office support to user-facing features. This section highlights three DS initiatives now in production, each having passed a thorough internal quality assessment.

Figure 1. Product Development Cycle Framework at GovTech Edu.

An automated-validation tool for teachers’ portfolio submission

Introducing Platform Merdeka Mengajar (PMM), one of the digital products supporting Indonesian education by empowering teachers and school principals in their educational endeavors. Built to support the implementation of Kurikulum Merdeka, Aksi Nyata (one of PMM’s features) serves as a crucial step within the independent coaching framework, aimed at enhancing teacher competencies: teachers submit practical demonstrations of their work (a teacher’s portfolio) aligned with the platform’s curated topics.

The team transitioned away from the burdensome task of manual curation, which could encompass up to 500k documents in a month and brought unintended drawbacks: the process was prone to errors and to inconsistencies among curators as their numbers grew to handle the increasing volume. The first data science project revolutionized teacher submission document validation, changing how the Data Science team handles it by making the leap from manual to automated. Starting from document tags produced by human validators, the team learned to reproduce manual validation decisions through machine learning, substantially speeding up the validation process. Leveraging machine learning algorithms, the project aimed to enhance accuracy and efficiency in document-checking tasks. The team trained several competing models (from a single logistic regression to ensemble learners) and objectively selected the model with the highest coverage at a low error rate. After first rolling out a model with 40% coverage, a 5% false-negative rate, and a 27% false-positive rate, we performed several iterations to tune parameters and handle inconsistencies in the human-validator tags. As of Q1 2024, the project processes up to 80% of document submissions per month (model coverage) with a 2% false-positive rate.

For comparison, manually validating a similar number of documents would take about ten months (assuming a manual validator capacity of up to 40k documents per month). Automating document validation with machine learning has set a new standard for document processing and verification.
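The coverage and error figures above suggest a confidence-thresholded classifier: the model auto-decides only documents it is confident about, and the rest fall back to human curators. A minimal sketch of that idea, on synthetic data (the features, model choice, and threshold here are illustrative assumptions, not the team's actual pipeline):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the validation task: document features with
# labels from human validators (synthetic here).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# Auto-decide only confident predictions; the rest go to manual review.
# Raising the threshold lowers the error rate but shrinks coverage.
threshold = 0.9
confident = (proba >= threshold) | (proba <= 1 - threshold)
coverage = confident.mean()
preds = (proba >= 0.5).astype(int)
error_rate = (preds[confident] != y_test[confident]).mean()
print(f"coverage={coverage:.0%}, error rate on covered={error_rate:.1%}")
```

Tuning `threshold` is how the coverage/error tradeoff described above (40% coverage initially, 80% after iteration) could be navigated.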

Figure 2. Automated-validation tool process from discovery to delivery

Content recommendations for teachers in PMM

One way to boost teacher engagement with PMM content is to implement a content recommendation system. The team was tasked with building such a system from a blank slate: neither user–content interaction data nor baseline data was available. To overcome this challenge, a collaborative discussion among qualitative researchers, UI designers, content specialists, product managers, and engineers was initiated to formulate a clear problem statement and navigate the complexity of the cold-start problem. Leveraging insights from previous projects, particularly the need to improve content for different user segments, our team opted to develop a semi-personalized recommendation system to tackle these initial challenges.

Before building the core recommendation engine, a critical pre-work phase applied machine learning for free-text matchmaking, using natural language processing (NLP) techniques to address the gap between user segments’ needs and the content available on PMM. From clustering texts with similar meanings to iterative refinements by content specialists, we mapped each piece of content to the unique needs and preferences of each teacher segment. Next, a ranker system was developed over content and teacher segments using a tree-based model to further match user intent with content. Before deploying the model, the Data Science team presented its results to other teams and explained how it would operate and interact with users in PMM. Lastly, to assess performance and evaluate the model’s results, A/B testing was run before rolling the model out to all users. The test yielded promising results, showing a positive relationship between user engagement and the ranking system.
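The free-text matchmaking step can be sketched with a simple lexical-similarity baseline: vectorize both the segment needs and the content catalog, then match each need to its closest content. The example texts below are hypothetical, and real NLP matchmaking would likely use richer embeddings than TF-IDF, so treat this as an illustration of the shape of the problem only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical teacher-segment needs and PMM content titles (illustrative).
segment_needs = [
    "classroom management strategies for early grades",
    "project based learning assessment rubrics",
]
contents = [
    "managing a primary school classroom effectively",
    "designing rubrics to assess project based learning",
    "introduction to spreadsheet formulas",
]

# Fit one vocabulary over both sides so vectors are comparable.
vec = TfidfVectorizer().fit(segment_needs + contents)
sim = cosine_similarity(vec.transform(segment_needs), vec.transform(contents))

# For each segment need, pick the most similar content item.
best = sim.argmax(axis=1)
for need, idx in zip(segment_needs, best):
    print(f"{need!r} -> {contents[idx]!r}")
```

In a production system, these candidate matches would then feed a tree-based ranker trained on engagement signals, as described above.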

Metadata Documentation Generator for Internal Database

Apart from utilizing conventional machine learning algorithms, the team leverages the power of generative AI (GenAI). The Metadata Documentation Generator (abbreviated as Matador) project was initiated as an innovation in (meta-)data management, particularly targeting labor-intensive, error-prone, and monotonous tasks. What makes Matador stand out is that it automates metadata generation with the local context of the One Data MoECRT program (Satu Data Pendidikan), using a Large Language Model (LLM) while maintaining consistency and standardization with adequate accuracy. When building this project, the strategic foundation was to minimize the risks of GenAI development by choosing a low-risk system that benefits from automation while managing the potential risks of GenAI advancements.

The team approached the problem with an in-depth exploration of multiple models, ranging from open-source options to proprietary ones. Through many iterations and by gathering appropriate context for the LLM, the team searched for the best model and a precise prompt that would achieve high accuracy and align with the existing metadata documentation and standards. We were grateful for the tremendous support of our Data Analyst peers, who played an important role in the curation process and fed back the specific context that needed to be embedded in the LLM. Once the results were sufficiently stable, the model could infer metadata across the hundreds to thousands of tables in our databases. The project reduced the time needed to create detailed metadata documentation to under a minute per table. This facilitates time savings and significantly improves the accuracy and consistency of the metadata documents produced.
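The core of such a pipeline is assembling a context-rich prompt per table and sending it to the chosen model. The sketch below shows only the prompt-assembly shape; `call_llm`, the template wording, and the table schema are all hypothetical stand-ins, not Matador's actual prompt or API:

```python
# Illustrative template: real prompts would embed the curated program
# context and standardization rules mentioned above.
PROMPT_TEMPLATE = """You are documenting tables for an education data warehouse.
Program context: {program}
Table: {table_name}
Columns: {columns}

Write a one-sentence description for the table and for each column,
using consistent, standardized terminology."""

def build_prompt(table_name, columns, program="Satu Data Pendidikan"):
    """Assemble a metadata-generation prompt for one table."""
    return PROMPT_TEMPLATE.format(
        program=program,
        table_name=table_name,
        columns=", ".join(columns),
    )

def call_llm(prompt):
    # Placeholder: a real implementation would call the chosen LLM here
    # (open-source or proprietary) and parse its response.
    return f"[generated metadata for a {len(prompt)}-char prompt]"

# Hypothetical table schema, for illustration only.
prompt = build_prompt("teacher_submissions", ["teacher_id", "doc_url", "submitted_at"])
print(call_llm(prompt))
```

Iterating on the template and the embedded context, with analyst feedback on the outputs, is the loop the paragraph above describes.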

Figure 3. Metadata Governance for transactional data at GovTech Edu

Beyond the Data — Becoming Thought Partners for Product Solutions

Reflecting on the past year of building product solutions, the Data Science team has grown beyond a technical team that only works with (product) analytics: it plays a pivotal role as a bridge between technical and non-technical views within the organization. By engaging with both sides, the Data Science team facilitates a seamless flow of insights and innovations, ensuring that proposed solutions are (i) technically robust and (ii) tailored to user needs while adhering to data security regulations, ensuring compliance and trust.

About the Writers

Andika Rachman Hakim is a seasoned professional with a diverse background spanning data science, artificial intelligence, and engineering. He contributes his experience and knowledge as a Senior Data Scientist at GovTech Edu, where he continues to drive innovation and excellence in technology. Beginning his career as a Reservoir Engineer in the energy industry, he developed strong analytical skills before transitioning to Artificial Intelligence & Data Consultant and Data Scientist Manager roles in the finance sector. Armed with a Master’s degree in Applied Computational Science and Engineering from Imperial College London, Andika combines his technical expertise with strategic vision to drive impactful outcomes.

Figarri Keisha is a data scientist at GovTech Edu with a wealth of experience in the tech industry. Prior to his current role at GovTech Edu, he served as a Data Science instructor in tech academy companies, where he imparted his expertise to aspiring professionals. His practical experience was further enriched during his tenure as a data scientist at one of the Indonesian e-commerce companies. With a passion for data-driven insights and a commitment to education, Figarri continues to drive innovation and excellence in his role at GovTech Edu.

Rizka Azmira is a skilled data scientist with a diverse background in the tech industry. Before her current role at GovTech Edu, she began her journey as a Data Analyst intern and later as a Data Analyst, accumulating valuable experience in data analytics. Progressing in her career, she took on roles such as Business Intelligence Analyst and Data Scientist, where she applied her expertise to generate strategic insights and foster innovation. With a solid foundation in data analysis and a dedication to excellence, Rizka remains committed to driving meaningful contributions to the data and technology at GovTech Edu.
