GSoC Weekly Update: My Project Journey with Sugar Labs

Qixiang Wang
3 min read · Jun 10, 2024


Introduction

As part of my Google Summer of Code (GSoC) project with Sugar Labs, I’ve experienced an exciting and productive first coding period. My project, “Integrating an AI Chatbot into the Chat Activity,” aims to enhance both the interactivity and educational value of the Chat Activity by incorporating an AI-driven chatbot. You can check my project here. In this initial phase, my focus has been on selecting suitable open-source models and datasets, as well as fine-tuning the model to align with the project’s specific requirements.

Initial architecture design of this project

The initial architecture of this project consists of two main components: the Chat Manager and the Chat Engine. For the Chat Engine, I have chosen the Llama-3 series as the initial model.

1. Chat Manager

The Chat Manager is responsible for the overall management of conversations in the Chat Activity. Its primary functions include:

  • Input Handling: Receiving and processing user input.
  • Session Management: Maintaining the state of chat sessions, including history and context.
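The two responsibilities above can be sketched as a small class. The names below (`ChatManager`, `handle_input`, `add_response`) are illustrative, not the project's actual code:

```python
class ChatManager:
    """Manages chat sessions: receives user input and tracks history/context."""

    def __init__(self):
        # One running history (list of role/content turns) per session id.
        self.sessions = {}

    def handle_input(self, session_id, text):
        """Store a user message and return the session's running history."""
        history = self.sessions.setdefault(session_id, [])
        history.append({"role": "user", "content": text.strip()})
        return history

    def add_response(self, session_id, text):
        """Record the chatbot's reply in the same session."""
        self.sessions.setdefault(session_id, []).append(
            {"role": "assistant", "content": text}
        )
```

Keeping the history inside the manager means the Chat Engine can stay stateless and simply receive the full context with each request.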

2. Chat Engine

The Chat Engine is the core component that processes the input and generates responses. For this, I have selected meta-llama/Meta-Llama-3-8B from the Llama-3 series due to its high performance and advanced capabilities. The main responsibilities of the Chat Engine include:

  • Natural Language Processing: Understanding and processing user inputs.
  • Response Generation: Generating contextually appropriate and informative responses.
  • Model Fine-Tuning: Adapting the model to improve performance based on specific project requirements.
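Before the model can generate a response, the session history has to be flattened into a single prompt. A minimal, purely illustrative sketch (in practice the tokenizer's chat template would apply Llama-3's own formatting; this plain-text layout and the default system message are assumptions):

```python
def build_prompt(history, system_msg="You are a friendly helper for children."):
    """Flatten a list of {'role', 'content'} turns into one prompt string."""
    lines = [f"System: {system_msg}"]
    for turn in history:
        speaker = "User" if turn["role"] == "user" else "Assistant"
        lines.append(f"{speaker}: {turn['content']}")
    # End with the assistant cue so the model continues from here.
    lines.append("Assistant:")
    return "\n".join(lines)
```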

By leveraging the strengths of Llama-3, the Chat Engine can deliver high-quality interactions, making the Chat Activity more engaging and educational.

[Figure: Initial architecture design]

Why Llama-3?

For this fine-tuning phase, I used meta-llama/Meta-Llama-3-8B. Here are some reasons for choosing Llama-3:

High Performance: Llama-3 excels in multiple natural language processing tasks, featuring advanced generation and understanding capabilities, making it suitable for developing high-quality AI chatbots.

Open Source: Llama-3 is an open-source model, freely accessible and usable, allowing for easy customization and optimization to meet the specific needs of the project.

Community Support: Llama-3 has an active developer community that provides a wealth of resources and support, helping to resolve issues encountered during development.

First round fine-tuning datasets

In the first round, I focused on datasets related to children’s common knowledge and stories. I found these on HuggingFace:

  1. A big dataset about children’s stories: https://huggingface.co/datasets/ajibawa-2023/Children-Stories-Collection
  2. Another kind of storytelling dataset: https://huggingface.co/datasets/Anuvathan/Kids_Storyteller_GPT4
  3. A kids’ questions dataset: https://huggingface.co/datasets/Gurminder/kids_questions
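All three can be pulled with the Hugging Face `datasets` library. A sketch, with the dataset ids copied from the links above (downloading requires network access and `pip install datasets`, so the heavy call is kept inside a function):

```python
# Dataset ids taken from the Hugging Face Hub links above.
DATASET_IDS = [
    "ajibawa-2023/Children-Stories-Collection",
    "Anuvathan/Kids_Storyteller_GPT4",
    "Gurminder/kids_questions",
]

def load_finetuning_data():
    """Download all three datasets from the Hub (network required)."""
    from datasets import load_dataset
    return {name: load_dataset(name) for name in DATASET_IDS}
```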

In the subsequent fine-tuning phase, I utilized the Unsloth library to efficiently fine-tune the model and compared the results before and after fine-tuning using several test questions.
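A minimal sketch of what an Unsloth fine-tuning run looks like. The hyperparameters, output path, and LoRA settings below are illustrative placeholders, not the project's actual configuration, and running it requires a GPU plus the `unsloth`, `trl`, and `transformers` packages (so everything heavy is kept inside the function):

```python
# Illustrative LoRA hyperparameters -- not the values used in the project.
LORA_CONFIG = {"r": 16, "lora_alpha": 16, "lora_dropout": 0.0}

def finetune(train_dataset, max_seq_length=2048):
    """Fine-tune Meta-Llama-3-8B with Unsloth's LoRA helpers."""
    from unsloth import FastLanguageModel
    from trl import SFTTrainer
    from transformers import TrainingArguments

    # Unsloth loads the base model with memory-saving patches applied;
    # 4-bit quantization keeps the 8B model within a single GPU's memory.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="meta-llama/Meta-Llama-3-8B",
        max_seq_length=max_seq_length,
        load_in_4bit=True,
    )
    # Attach LoRA adapters so only a small set of weights is trained.
    model = FastLanguageModel.get_peft_model(model, **LORA_CONFIG)

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=train_dataset,
        args=TrainingArguments(
            output_dir="llama3-chat-finetuned",
            per_device_train_batch_size=2,
            num_train_epochs=1,
            learning_rate=2e-4,
        ),
    )
    trainer.train()
    return model, tokenizer
```

LoRA-style adapter training is what makes fine-tuning an 8B model feasible on a single consumer GPU, which is the main appeal of Unsloth here.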

Results comparison

You can view the fine-tuned results in this sheet. It is evident that after this round of fine-tuning, the model has demonstrated improved capabilities in answering common knowledge questions for children and in generating children’s stories.

Thanks for checking out this post.
