Active Learning, Wellies and Daisies

Rustem Zakiev
Published in hydrosphere.io
4 min read · Jun 22, 2018

Besides its many practical benefits, we love machine learning (ML) for the fun it brings: the magic-looking performance of solutions found and implemented for tasks across the ML model lifecycle. With over half a century of development at Moore's-law pace, artificial intelligence (AI) and machine learning still have untapped depths to mine for performance and impressive findings. Active and proactive learning are miraculous techniques that engage an AI to train an AI.

Active Learning (AL) is a supervised model training technique. Starting from the beginning: a basic model training iteration consists of feeding labelled data to the inputs, assessing the output (a prediction, or inference) and correcting the model's weights and biases. The iteration is then repeated with the next labelled data sample. After a number of iterations the model is evaluated on held-out data from a separate validation pool, prepared beforehand along with the training data pool. Then the model either goes to work in production or back for another set of training iterations.
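
For intuition, here is a minimal sketch of one such iteration in plain NumPy: a hypothetical logistic-regression "model" whose weights and bias get corrected after each labelled sample (the names and the learning rate are illustrative, not taken from any particular library).

```python
import numpy as np

def train_step(w, b, x, y, lr=0.1):
    """One supervised training iteration: predict, assess, correct."""
    # feed a labelled sample to the inputs
    z = x @ w + b
    pred = 1.0 / (1.0 + np.exp(-z))   # model output (prediction / inference)
    # assess the output against the label and correct weights and bias
    grad = pred - y                   # gradient of the log-loss w.r.t. z
    w -= lr * grad * x
    b -= lr * grad
    return w, b
```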

The key to a model good enough for real-world tasks is the quality of the training data pool (we will get to that further on) and the quality of the labelling.

Labelling data samples requires domain expertise, and the more complicated the domain, the deeper (and more expensive) the expertise required and the more time each sample takes. For example, in biomolecular research labelling one sample takes up to 15 seconds. Imagine the headache of labelling thousands of samples for one training iteration. Active Learning comes to the rescue here, aiming to minimise the time the domain expert spends in the training loop. The main idea is to use so-called query strategies to pass the most informative (hence most useful for training) samples to the expert for labelling. These query strategies are backed by statistical or AI algorithms, e.g. query-by-committee, where a committee of specially trained AI models votes on each sample.
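
As an illustration, here is a minimal sketch of one classic query strategy, uncertainty sampling (the approach behind reference 2 below): pass to the expert the samples the model is least certain about. It assumes a scikit-learn-style classifier with predict_proba; the function name and pool size are illustrative.

```python
import numpy as np

def uncertainty_query(model, unlabelled_pool, n_queries=10):
    """Pick the n_queries samples the model is least certain about."""
    proba = model.predict_proba(unlabelled_pool)  # shape: (n_samples, n_classes)
    # uncertainty = 1 - confidence in the most likely class
    uncertainty = 1.0 - proba.max(axis=1)
    # indices of the most informative samples, to be sent to the expert
    return np.argsort(uncertainty)[-n_queries:]
```

Query-by-committee works along the same lines, except that disagreement among several models replaces a single model's uncertainty.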

The other place to apply intelligent techniques in the Active Learning loop is the model validation step: techniques similar to query strategies, called selection strategies, are used to move the most salient unlabelled samples from the validation data pool into the training data pool, sustaining the latter's quality.
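
"Salient" is not pinned down above, so take this as a loose sketch under one plausible reading: pick the validation samples most unlike anything already in the training pool. All names here are hypothetical.

```python
import numpy as np

def select_salient(train_pool, validation_pool, n_select=10):
    """Pick validation samples farthest from anything already in training."""
    # distance from each validation sample to its nearest training sample
    dists = np.linalg.norm(
        validation_pool[:, None, :] - train_pool[None, :, :], axis=-1
    ).min(axis=1)
    return np.argsort(dists)[-n_select:]  # most novel = most salient
```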

Long story short, Active Learning is supervised learning of an AI with an AI (and an expert in the loop). See Exhibit 1 for a schematic of the AL cycle.

Exhibit 1: regular Active Learning cycle
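
To make Exhibit 1 concrete, here is the whole cycle as a scikit-learn-style sketch, reusing the uncertainty_query function from above; expert_label stands in for the human expert and is, of course, hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(x_labelled, y_labelled, x_pool, expert_label, rounds=5):
    """Train, query the most informative samples, label them, repeat."""
    model = LogisticRegression()
    for _ in range(rounds):
        model.fit(x_labelled, y_labelled)                  # supervised training
        idx = uncertainty_query(model, x_pool)             # query strategy
        new_y = [expert_label(x) for x in x_pool[idx]]     # expert in the loop
        x_labelled = np.vstack([x_labelled, x_pool[idx]])  # grow the training pool
        y_labelled = np.concatenate([y_labelled, new_y])
        x_pool = np.delete(x_pool, idx, axis=0)            # shrink the unlabelled pool
    return model
```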

Ok, done with training; let's move on to production. There's a hell of a story with that too, but let's say we've made it through: the model performs in production. As in human life, no training can perfectly prepare you for the real-world environment. The scope of training cases is limited and relies on the skilled choices of a tutor, which in our case is the ML engineer. Say you've trained a natural language processing (NLP) model. To what degree is it prepared to deal with cultural specifics? How about Cockney rhyming slang?

For more complicated tasks like financial fraud detection or early-stage disease recognition, such occasional novelties (in fact, they are anomalies relative to the expected production conditions) are invisible to the naked eye and cause unpredictable inferences at the application's output. Such inferences are essentially faulty, and the error spreads further to all the downstream processes and operations that depend on the ML application.

It would be great if an ML application could react to that kind of situation and learn, both reactively and proactively, preventing faults from occurring or, once they have occurred, from repeating.

That would require capabilities for three functions:

  • detecting anomalies, novelties and concept drifts, and building a high-quality sample pool for retraining,
  • retraining (Active re-Learning),
  • redeploying a new version of the model into production on the go.

Active [re]Learning is a well-known practice today, so no trouble there. The situation with anomaly detection and bumpless redeployment on the go is a bit more complicated. But there is good news: solutions for those have emerged on the ML operations market. There are environments built to run on any infrastructure and to allow hot redeploys as many times a day as needed. Such environments can be seamlessly integrated with monitoring solutions built upon statistical methods and ML-based algorithms (e.g. GANs, MADE) to detect anomalies, concept drifts and faults in the production input and output spaces. Voila: magic, and it looks like free will is the last thing distinguishing a human being from a multi-layered perceptron ;)
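
The GAN- and MADE-based detectors deserve posts of their own, but the statistical end of such monitoring can be sketched in a few lines: compare the distribution the model sees in production with the one it was trained on, feature by feature. A minimal sketch, assuming SciPy's two-sample Kolmogorov–Smirnov test (the threshold and names are illustrative):

```python
from scipy.stats import ks_2samp

def drift_alarm(train_batch, prod_batch, p_threshold=0.01):
    """Flag features whose production distribution drifted from training."""
    drifted = []
    for j in range(train_batch.shape[1]):
        stat, p_value = ks_2samp(train_batch[:, j], prod_batch[:, j])
        if p_value < p_threshold:  # distributions differ significantly
            drifted.append(j)
    return drifted                 # candidates for the retraining sample pool
```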

Exhibit 2: Proactive Learning

Many thanks to Rinat Gareev for his contribution to this article.

References:

  1. Chen Y., et al., 2013, "Applying active learning to high-throughput phenotyping algorithms for electronic health records data", J Am Med Inform Assoc 2013;20:e253–e259. doi:10.1136/amiajnl-2013-001945
  2. Lewis D., Catlett J., 1994, "Heterogeneous uncertainty sampling for supervised learning", Proceedings of the Eleventh International Conference on Machine Learning, 1994.
  3. Biewald L., 2018, "Starting a Second Machine Learning Tools Company, Ten Years Later", [online], https://medium.com/@l2k/starting-a-second-machine-learning-tools-company-ten-years-later-21a40324d091
  4. Pushkarev S., 2018, "Monitoring AI with AI", ODSC East 2018 talk, [online video], https://www.youtube.com/watch?v=4M_3oZlc2B4
  5. Smith K., Horvath P., 2014, "Active Learning Strategies for Phenotypic Profiling of High-Content Screens", Journal of Biomolecular Screening 2014, Vol. 19(5), 685–695.
