A Guide for Active Learning in Computer Vision

Igor Susmelj
7 min read · Feb 3, 2023

Learn how active learning can be used to build a data flywheel in which only the data that actually matters gets labeled and used for training.

Before jumping right into the steps to select data using active learning, we will have a look at what active learning actually is.

What is Active Learning?

Active learning is a research field in machine learning (ML) that aims to reduce the cost and time needed to build new machine learning solutions by intelligently choosing which data to query next for your pipeline. When developing new AI solutions and working with unstructured data such as images, audio or text, we often need humans to annotate the data before we can use it to train our models. This annotation process is very time-consuming and expensive, and it is typically one of the biggest bottlenecks for modern ML teams.

With active learning, you can create a feedback loop in which you iterate between annotation, training and selection. With a good selection algorithm, you can reduce the amount of data required to train a model to a desired accuracy.
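To make the loop concrete, here is a minimal sketch in plain Python. It is a toy, not how Lightly or any particular library implements active learning: the "model" is a single threshold on 1D points, the "annotator" is a hypothetical function `annotate` with a made-up ground-truth boundary, and the selection strategy is uncertainty sampling (pick the unlabeled samples closest to the current decision boundary), chosen here purely for illustration. All function names and parameters are assumptions.

```python
import random

BOUNDARY = 0.62  # hypothetical ground truth; the model never sees this value


def annotate(x):
    """Stand-in for a human annotator: returns the true label of a sample."""
    return int(x >= BOUNDARY)


def train(labeled, pool):
    """Fit a toy threshold model: the midpoint between the two labeled classes."""
    negatives = [x for x, y in labeled.items() if y == 0]
    positives = [x for x, y in labeled.items() if y == 1]
    lo = max(negatives) if negatives else min(pool)
    hi = min(positives) if positives else max(pool)
    return (lo + hi) / 2


def select(pool, labeled, threshold, batch_size):
    """Uncertainty sampling: take the unlabeled samples nearest the boundary."""
    unlabeled = [x for x in pool if x not in labeled]
    return sorted(unlabeled, key=lambda x: abs(x - threshold))[:batch_size]


def active_learning_loop(pool, rounds=10, batch_size=4, seed=0):
    rng = random.Random(seed)
    # Start from a small random batch so there is something to train on.
    labeled = {x: annotate(x) for x in rng.sample(pool, batch_size)}
    threshold = train(labeled, pool)
    for _ in range(rounds):
        batch = select(pool, labeled, threshold, batch_size)  # selection
        labeled.update({x: annotate(x) for x in batch})       # annotation
        threshold = train(labeled, pool)                      # training
    return threshold, labeled


pool = [i / 1000 for i in range(1000)]
threshold, labeled = active_learning_loop(pool)
accuracy = sum(int(x >= threshold) == annotate(x) for x in pool) / len(pool)
print(f"labeled {len(labeled)} of {len(pool)} samples, accuracy {accuracy:.3f}")
```

In this toy setup the loop only ever labels a few dozen of the 1,000 samples, because each round spends the annotation budget where the model is least certain. Real computer vision pipelines replace the threshold with a neural network and the distance with a model-based uncertainty or diversity score, but the annotate, train, select cycle has the same shape.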

In our journey at Lightly, we talked to over 200 ML teams in the computer vision field. Most don’t use sophisticated active learning strategies yet and rely on random selection. Selecting data randomly has the advantage that it does not change the distribution of the data. However, this only helps if the input data already matches the distribution you actually care about.
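The distribution argument can be checked with a few lines of Python. The sketch below uses a made-up pool in which 5% of the images are tagged "night" and the rest "day" (both the tags and the proportions are assumptions for illustration), draws a random subset, and compares the share of the rare class before and after.

```python
import random
from collections import Counter

rng = random.Random(0)

# Hypothetical raw pool: 95% "day" images, 5% "night" images.
pool = ["day"] * 9500 + ["night"] * 500

# Random selection of 1,000 samples to send to annotation.
selected = rng.sample(pool, 1000)

pool_share = Counter(pool)["night"] / len(pool)
selected_share = Counter(selected)["night"] / len(selected)

print(f"night share in pool:      {pool_share:.3f}")
print(f"night share in selection: {selected_share:.3f}")
```

The two shares come out close to each other, which is the advantage mentioned above. It is also the limitation: if night-time performance is what you care about, a random selection stays as skewed towards daytime images as the raw pool.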

Different Active Learning Approaches

Igor Susmelj

Co-founder at Lightly | Writer at Medium about Computer Vision, Startups and Machine Learning