Understanding Active Learning

Amogh Mahapatra
4 min read · Mar 31, 2024


Scheherazade. Grades. Surprises.

There once lived a Sultan, powerful yet bitter. His bitterness was born of his first wife’s betrayal, which made him lose faith in the idea of marriage. Henceforth, he would marry many times, but execute each new wife the following morning, every single time.

Then arrives Scheherazade, our protagonist, whom the Sultan marries with the same intention.

But that night, he hears a story so engaging and so complex that he cannot stop listening. Therein lies the storyteller’s ingenuity: she leaves the story hanging at an unpredictable point, what we now call a cliffhanger. The Sultan simply has to know what happens next, so he spares her life for one more night. This keeps happening, night after night, for “one thousand and one nights”, finally culminating happily with the Sultan moving past his wounds towards forgiveness.

Even to this day, this technique of pausing the narrative at a cliffhanger is used and reused by movies and streaming shows, and we fall for it every single time. We binge episode after episode, sequel after sequel, because we just have to know what happens next.

We are all Sultans, listening to a Scheherazade dressed up as a digital device, unflinchingly willing to turn it off the second we lose interest.

Consider a math class with three students: Adam, Benz, and Cam. Adam is a high-flyer, consistently scoring A or A+ grades. Cam hovers around the B or B- mark, displaying little change. Benz, on the other hand, is an unpredictable wildcard, his grades swinging from B- to A+.

Then come the semester exams.

In the absence of any information, whose grades would an examiner be most curious to know? That’s an easy one: Benz, of course.

Now, say we know the outcomes. They were wild.

Imagine all students unexpectedly fail, each receiving an F. Whose failure is the most shocking? Adam’s, of course. Given his track record, his failure immediately prompts speculation about the exam’s difficulty.

If, alternatively, everyone achieves an A+, Cam’s top grade would be the most astonishing, suggesting the exam might have been too easy.

Let me throw in one last curveball.

These exams were packed with complex proofs and detailed steps, yet the grades are posted within two minutes of completion. This unusual grading speed shifts the focus, stirring doubts about the evaluation’s credibility. Was it conducted fairly and impartially?

In each of these scenarios, surprises demand our attention, urging us to dig deeper. If the grades had mirrored past patterns — Adam acing, Cam middling, and Benz in between — the outcomes wouldn’t have raised any eyebrows.

Surprises guide us. They tell us whether we should be foraging for more information. A world without surprises is a world without information. The atomic unit of information theory, hence, is a measure of the degree of surprise encoded in an outcome, measured in bits and technically called entropy.
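To make that concrete, here is a minimal Python sketch of Shannon entropy applied to our three students; the grade probabilities below are invented for illustration, not taken from any real data:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the average degree of surprise."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Adam: almost certainly an A. Hardly any surprise left.
print(entropy([0.95, 0.04, 0.01]))  # ~0.32 bits

# Benz: a coin flip between B- and A+. Maximal surprise.
print(entropy([0.5, 0.5]))          # 1.0 bit
```

The predictable student carries almost no information; the wildcard carries the most.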

The way we train models in machine learning is by indicating our expectations for each data point with what are called labels. Say a model is being trained to tell apart images of fruits and vegetables. It will be presented with pictures of potatoes, spinach, and carrots, labelled as veggies, and pictures of apples, mangoes, and bananas, labelled as fruits. For it to generalize beyond what was provided, we expect it to pick up visual signals that distinguish fruits from veggies. But what would the model do if presented with a tomato? Is that visually a fruit or a veggie? This is the boundary, the point of highest entropy, and the model cannot get out of it on its own.
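The same entropy measure captures the model’s doubt. The probabilities below are hypothetical model outputs, not from any trained classifier:

```python
from scipy.stats import entropy

# Hypothetical predicted probabilities for [fruit, veggie].
apple = [0.97, 0.03]   # the model is confident: low entropy
tomato = [0.51, 0.49]  # sits on the boundary: near-maximal entropy

print(entropy(apple, base=2))   # ~0.19 bits
print(entropy(tomato, base=2))  # ~1.0 bit
```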

What’s the obvious answer here, for say a human? When in doubt, ask.

Active learning builds this ability into the model’s learning cycle. When faced with maximum doubt, the model asks an expert (in most cases, a human) to clarify. It’s a simple idea, yet it saves time and resources: rather than spending tons of cash on labels all the time, we only pay for them when the machine is in a state of doubt.
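Here is a minimal sketch of the simplest flavour of active learning, uncertainty sampling. The synthetic data, the logistic-regression model, and the label budget are all illustrative assumptions, not a prescription:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# A synthetic pool of 200 points with hidden "expert" labels.
X_pool = rng.normal(size=(200, 2))
y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)

# Seed the labelled set with five examples of each class.
labelled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])
unlabelled = [i for i in range(200) if i not in labelled]

model = LogisticRegression()
for _ in range(20):  # label budget: 20 questions to the expert
    model.fit(X_pool[labelled], y_pool[labelled])

    # Entropy of each unlabelled prediction: the model's degree of doubt.
    probs = model.predict_proba(X_pool[unlabelled])
    doubt = -np.sum(probs * np.log2(probs + 1e-12), axis=1)

    # Ask the expert about the single most doubtful point only.
    query = unlabelled[int(np.argmax(doubt))]
    labelled.append(query)   # the expert reveals y_pool[query]
    unlabelled.remove(query)
```

The point is the shape of the loop: the budget is spent only where doubt, measured as entropy, is highest, instead of on labelling the entire pool.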

In “Arabian Nights”, the storyteller uses a cliffhanger at the end of each night to keep the Sultan’s interest alive, thereby saving her life for another day. Each story, often unexpected, maintains a high level of entropy in its telling. The Sultan, eager to know what happens next, is kept in a state of suspense and uncertainty, and must seek out the expert the following night, so he keeps her alive. In just the same way, machine-learning models trained through active learning seek out high-entropy examples to gain the most informative data.

References:

  1. Arabian Nights
  2. Entropy (information theory)
