Distance-Learning in Low Bandwidth Areas — Concept

Published in BurningDaylight, Jul 25, 2020

Introduction

From a white paper on E-Learning by the Akhuwat Foundation:

According to UNICEF, 22.8 million children between the ages of 5–16 are out of school. This is 44% of the population in this age group. In Sindh alone, 6.4 million children are out of school. The scarcity of financial, logistic and human resources makes it even harder to enroll these children in school and provide them quality education. Moreover, an increasing population rate suggests only an increase in this number …
We are at a point of history where every resource and limitation points us in the same direction. E-Learning has never been more relevant...

Akhuwat is in the initial stages of setting up an E-Learning framework to serve learners all over Sindh free of cost. The scope of the project is massive, and among the many technical, academic and administrative challenges we face, serving low-bandwidth areas is critical to its success.

Poor connectivity and low bandwidth are common in outlying villages and towns. The rapid growth of 4G networks across Pakistan is promising; however, many areas have simply been left out, because it has not been economically viable for telcos to extend their 4G networks to these districts. And it is often in these outlying areas that the need for quality education and resources is greatest. Even in regions covered by 4G, the considerable requirements of video streaming create bandwidth challenges.

Another major challenge to implementing Distance-Learning is maintaining student engagement. Among other things, the teacher needs a live connection to their students: to field questions organically, to actively engage students and adjust teaching approaches, to allocate time to topics dynamically, and, most importantly, to allow students to build relationships with the teacher. There is no shortage of online resources and videos for students to access, so the problem is not a lack of content. Simply making hundreds of hours of content and gamified quizzes available, and expecting a student of any age group to become a self-learner, is unreasonable, especially in areas where there has never been a culture of learning and education. A responsive interface with the teacher is essential, and pre-recorded or pre-built content will not be enough.

Finding good teachers is a challenge for all educational institutions, even in the big cities. (The definition of ‘good teacher’ is intentionally left vague.) In outlying areas this challenge is overwhelming. Even in the city there aren't many good teachers to go around, they cannot be paid enough to teach full-time in villages, and students cannot practically be brought to the cities. So the plan is to hire these teachers in the city, bring them together in one institution, and then find ways to project them into the outlying areas where they are most needed. With the right framework, we may also enable these teachers to reach two to ten times as many students per class as they could in any brick-and-mortar school. The economies of scale offered by this approach, along with its potential for rapid growth, make it one of the best approaches to solving the education crisis in Sindh.

Our first steps into this domain (with the traditional approach) will be in well-connected areas with full-video links, perhaps to satellite schools within the city and then to schools in areas with reasonably good internet access. After this we will start to push into more difficult regions and try to leverage our relationships with telcos to establish better connectivity. However, at every stage of this process there will be areas and schools where connectivity is not good enough for full video, so a low-bandwidth solution is needed. This is the problem the following concept aims to solve.

Concept

To reduce the bandwidth requirements of a live-video feed, instead of sending the complete picture we send just enough data to animate an avatar on the student's side. The teacher's facial expressions and movements are encoded into small packets and used to control the avatar like a marionette. The process works in much the same way as Snapchat Filters or Animojis, and is reminiscent of this Black Mirror episode. Sending these data packets takes far less bandwidth than sending full video: by the rough estimates below, about 4 to 10 times less than even 240-360p video. Although we lose a lot of visual information, we can still retain the character and responsiveness of the live teacher, and hopefully the teacher's ability to develop relationships with the students.
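To make the "small packets" idea concrete, here is a minimal Python sketch of one way a single frame of expression values could be packed for sending. The packet layout (a 32-bit frame number, a 16-bit value count, then one byte per value) and the names `encode_frame` and `decode_frame` are hypothetical choices for illustration, not a defined protocol. It also assumes each captured value has already been normalized to the range 0 to 1.

```python
import struct
from typing import List, Sequence, Tuple

# Hypothetical packet layout (an illustration, not a defined protocol):
#   uint32 frame number | uint16 value count | one uint8 per value
HEADER = struct.Struct("!IH")  # network byte order, no padding: 6 bytes

def encode_frame(frame_no: int, values: Sequence[float]) -> bytes:
    """Clamp each value to [0, 1], quantize it to one byte and prepend the header."""
    if len(values) > 0xFFFF:
        raise ValueError("too many values for a 16-bit count")
    quantized = bytes(round(min(max(v, 0.0), 1.0) * 255) for v in values)
    return HEADER.pack(frame_no, len(values)) + quantized

def decode_frame(packet: bytes) -> Tuple[int, List[float]]:
    """Recover the frame number and approximate values.

    For inputs already in [0, 1], each value is within 0.5/255 of the original.
    """
    frame_no, count = HEADER.unpack_from(packet)
    body = packet[HEADER.size:HEADER.size + count]
    return frame_no, [b / 255 for b in body]

pkt = encode_frame(7, [0.0, 0.5, 1.0])
frame_no, values = decode_frame(pkt)
print(len(pkt), frame_no)  # 9 7
```

For example, the 52 blend-shape coefficients that ARKit reports for a tracked face would fit in a 58-byte packet under this layout. One byte per value (256 levels) is an assumption about how much precision an avatar needs; coarser or finer quantization trades fidelity against bandwidth.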

Using a Microsoft Kinect sensor, an iPhone, or even a standard webcam, along with some basic software, we distill the teacher's facial expressions and movements into a small set of numbers and send those numbers over the low-bandwidth connection. On the student side, those numbers are received and used to animate the avatar.
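On the student side, one common way to turn such numbers into motion is linear blend shapes (also called morph targets): each incoming number weights how far the face moves from its neutral pose toward a pre-modelled expression, such as an open jaw or a raised eyebrow. The sketch below assumes the avatar is a mesh with a neutral pose and one target pose per incoming value; the function name `apply_blendshapes` is hypothetical, and the demonstration may animate its avatar differently.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def apply_blendshapes(neutral: Sequence[Vec3],
                      targets: Sequence[Sequence[Vec3]],
                      weights: Sequence[float]) -> List[Vec3]:
    """Linear blend-shape deformation, computed per vertex:
    v = neutral + sum_i weights[i] * (targets[i] - neutral)
    """
    if len(targets) != len(weights):
        raise ValueError("need one weight per target shape")
    result: List[Vec3] = []
    for j, (nx, ny, nz) in enumerate(neutral):
        x, y, z = nx, ny, nz
        for target, w in zip(targets, weights):
            tx, ty, tz = target[j]
            # Add this expression's offset from neutral, scaled by its weight.
            x += w * (tx - nx)
            y += w * (ty - ny)
            z += w * (tz - nz)
        result.append((x, y, z))
    return result

# A one-vertex "jaw": neutral at the origin, fully open one unit lower.
neutral = [(0.0, 0.0, 0.0)]
jaw_open = [(0.0, -1.0, 0.0)]
print(apply_blendshapes(neutral, [jaw_open], [0.5]))  # [(0.0, -0.5, 0.0)]
```

A real renderer would typically do this blending on the GPU for thousands of vertices; the plain-Python loop only shows the arithmetic that a handful of received numbers drives.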

It remains to be seen how much engagement an avatar can generate. I imagine this will depend largely on the age group and how well we diversify our approach. I think this approach has great potential, especially with younger children.

A setup like this also presents a unique opportunity to collect exactly the kind of data needed to train an AI model of a teacher. Perhaps with enough hours of data and a good model we could generate an AI avatar-teacher (a long shot). At that point there would be almost no limit to scale.

You can check out a demonstration here. (Face capture works best about 3 ft from the camera, and full-body tracking works best when your entire body is in frame.) Or you can download a demonstration of the concept here: just unzip it, navigate to the FirstDemonstration folder and open index.html.

Do the numbers check out?

Bandwidth (bitrate) of 240-360p video at 30 fps: 200 to 500 kbps
Bandwidth of audio alone: 100 to 300 kbps
Bandwidth of the motion data, assuming we capture about 200 data points per frame at 30 fps and send each as a single byte (200 × 8 bits × 30 fps = 48 kbps): approximately 50 kbps

(These numbers are not yet verified but serve the purpose of this discussion.)
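The motion-data estimate can be reproduced with simple arithmetic. The helper below is a hypothetical sketch: it computes the raw payload bitrate from the number of values per frame, the bits used per value and the frame rate, optionally adding a per-packet header. It ignores transport overhead (for example, the IP and UDP headers sent with every packet), so real figures would be somewhat higher.

```python
def stream_kbps(values_per_frame: int, bits_per_value: int, fps: float,
                header_bytes: int = 0) -> float:
    """Payload bitrate in kbps (1 kbps = 1000 bit/s); transport overhead is ignored."""
    bits_per_frame = values_per_frame * bits_per_value + header_bytes * 8
    return bits_per_frame * fps / 1000

# Video range from the figures above (240-360p at 30 fps), in kbps.
VIDEO_LOW_KBPS, VIDEO_HIGH_KBPS = 200, 500

motion = stream_kbps(200, 8, 30)   # 200 one-byte values per frame
as_floats = stream_kbps(200, 32, 30)  # same values as 32-bit floats
print(motion, as_floats)  # 48.0 192.0
print(f"{VIDEO_LOW_KBPS / motion:.1f}x to {VIDEO_HIGH_KBPS / motion:.1f}x less than video")
# 4.2x to 10.4x less than video
```

With 200 one-byte values at 30 fps this gives 48 kbps, in line with the estimate above. Sending the same values as 32-bit floats would quadruple the rate to 192 kbps, close to the low end of the video range, which is why quantizing the values matters.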

There are minor computational costs for data capture and reconstruction, but with advances in computer vision and machine learning we expect to be able to run this framework on even low-end computers or smartphones.

To put these numbers in context: using this method, we could conduct an online class anywhere with less than two phone calls' worth of bandwidth. In principle, we could run this framework with one phone call providing the audio connection and another carrying the motion-capture data, so we could reach almost any area where we can make a phone call. In practice, using a phone-call-based connection poses some challenges (think dial-up internet), but it is interesting to think about nonetheless. For now, the focus is on implementation over low-bandwidth internet connections.
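The phone-call idea can also be checked the other way round: given a fixed channel, how much detail fits? This sketch assumes two illustrative budgets, 64 kbps (an uncompressed digital telephone channel, known as DS0) and 33.6 kbps (a V.34 dial-up modem), and counts payload only. Whether a real call could carry arbitrary data at those rates is exactly the practical challenge mentioned above; the function names are hypothetical.

```python
def max_bits_per_value(budget_bps: int, values_per_frame: int, fps: int) -> int:
    """Whole bits each value may use if the stream must fit within budget_bps."""
    return budget_bps // (values_per_frame * fps)

def max_fps(budget_bps: int, values_per_frame: int, bits_per_value: int) -> int:
    """Highest whole frame rate that fits within budget_bps."""
    return budget_bps // (values_per_frame * bits_per_value)

# Illustrative channel budgets in bits per second (assumptions, payload only).
DS0_BPS = 64_000  # uncompressed digital telephone channel
V34_BPS = 33_600  # V.34 dial-up modem

for name, budget in (("DS0", DS0_BPS), ("V.34", V34_BPS)):
    print(name,
          "| bits/value at 200 values, 30 fps:", max_bits_per_value(budget, 200, 30),
          "| fps at 200 one-byte values:", max_fps(budget, 200, 8))
```

Under these assumptions a full 64 kbps channel comfortably fits the 48 kbps estimate (up to 10 bits per value at 30 fps), while a 33.6 kbps modem link would need either coarser values (5 bits each) or a lower frame rate (21 fps at one byte per value).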

Final Word

The aim of this concept is simple: extend the distance-learning efforts already under way to include learners in low-bandwidth areas. The challenges of reaching and engaging students at a distance remain, but with systems like this we can work towards making sure areas with poor connectivity are not left behind. The hope is that with such a system teachers can build and maintain bonds with students in the most difficult-to-reach areas of Sindh, while also providing a quality learning experience.

I look forward to continuing the discussion and eventually developing a prototype.
