How Interface Design and Visualization Tools Can Support Machine Teaching

Ross Young
Published in The Startup · Jan 10, 2021 · 7 min read

Data labelling for machine learning brings its own set of challenges and misconceptions (see our previous blog in the series). We needed a better approach to labelling data that values human expertise and manages costs: a machine teaching approach. In this blog, we discuss collaborative processes and tools that can enhance the machine teaching role, with a focus on bias detection and building trust in machine learning models.

A collaborative process from the start

As a data collection and annotation team, we realized that early identification of bias in datasets was central to machine teaching. Bias is difficult to detect and hard to quantify, yet we knew that an essential part of detecting it was a clear understanding of what a trained model would be expected to do in production.

So, to get started, we would work directly with AI practitioners and subject matter experts to define the problem and a set of guidelines. This typically involved collecting and labelling some data at random, which allowed us to verify whether assumptions about the dataset held and to confirm that the guidelines for the task were clear enough to produce the desired training outcome.

Conversations with AI practitioners about edge cases and noise in a dataset raise interesting questions that can only surface when working with data at a granular (data-point) level. In our experience, a collaborative approach between machine teachers, AI practitioners and developers, enabled either by close proximity (i.e., direct colleagues) or by collaborative communication systems, yielded better training datasets than the outsourcing platforms often used for labelling tasks (and we are not the only ones who think so).

Collaboration lets us observe patterns and capture insights during the collection or labelling process, so that bias in the dataset, and by extension potential bias in a model’s predictions, is identified early, before deployment.

Active Learning: a tool for better machine teaching

Different tools can help with bias detection; one such machine teaching tool is active learning. Active learning is a technique that aims to reduce the amount of labelled data required to train machine learning models. A machine teacher assigns labels to a portion of the data; using those labels, the system estimates prediction uncertainty on the remaining data points, identifies the most uncertain ones, and “queries” the teacher for their labels. A number of techniques exist to surface the next best candidate for labelling (you can find the libraries we use here).
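To make the loop concrete, here is a minimal sketch of uncertainty sampling using scikit-learn. The margin-based query strategy and the `query_most_uncertain` / `label_fn` names are illustrative assumptions, not the specific libraries linked above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def query_most_uncertain(model, X_unlabelled, n_queries=10):
    """Return indices of the unlabelled points the model is least sure about."""
    probs = model.predict_proba(X_unlabelled)
    # Margin sampling: a small gap between the top two class probabilities
    # means high uncertainty.
    sorted_probs = np.sort(probs, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]
    return np.argsort(margins)[:n_queries]

def active_learning_loop(X_seed, y_seed, X_pool, label_fn, rounds=5):
    """Start from a small labelled seed set, then repeatedly ask the
    teacher (label_fn) to label only the most uncertain pool points."""
    X_train, y_train = X_seed, y_seed
    pool = X_pool
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(X_train, y_train)
        idx = query_most_uncertain(model, pool)
        new_labels = label_fn(pool[idx])          # "query" the machine teacher
        X_train = np.vstack([X_train, pool[idx]])
        y_train = np.concatenate([y_train, new_labels])
        pool = np.delete(pool, idx, axis=0)       # remove newly labelled points
    return model
```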

There is a lot of enthusiasm about active learning, and for good reason. It can be used for many types of problems (we recommend using it whenever possible); you can read an example of how we applied Bayesian Active Learning to the real-world application of road defect detection. Its advantage is that we focus on labelling only the most valuable data for model training (where a high degree of uncertainty exists, i.e., where the model is confused), rather than labelling an entire dataset at random and incurring significant time costs (which is often why companies outsource labelling tasks to cheaper labour markets). Given the large amounts of labelled data that deep learning requires, active learning makes it feasible to hire specialized experts for teaching tasks and encourages fair compensation for the value of their contribution.

An active learning approach respects the intrinsic value of humans in the process, ensuring teachers’ contributions have the most impact on model predictions and reducing the need to annotate endless amounts of data with little or no value for training (and little to no motivational reward for the teacher). Model predictions can be surfaced within the labelling tool during teaching, enhancing the role of teachers: they can assess trust in model outputs through visual inspection, identify observable bias or patterns in predictions, and even correct predictions when necessary (a technique known as coactive learning). With active learning, metrics of a model’s performance can be evaluated during teaching rather than only afterwards. These metrics include accuracy, recall, precision (when applicable to the training method) and loss, allowing the teacher and AI practitioner to observe when metrics plateau and little value is gained by continuing to assign labels. The resulting effort from these combined approaches really begins to feel like true teaching.
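As a rough illustration of watching for that plateau, the sketch below tracks a single validation metric across teaching rounds; the `metrics_plateaued` helper, its thresholds and the example values are hypothetical, not part of any particular tool we used.

```python
def metrics_plateaued(history, window=3, min_gain=0.005):
    """Return True if the tracked metric (e.g. validation accuracy) has
    improved by less than `min_gain` over the last `window` teaching rounds."""
    if len(history) <= window:
        return False
    recent_gain = history[-1] - history[-1 - window]
    return recent_gain < min_gain

# Example: accuracy recorded after each labelling round.
accuracy_history = [0.71, 0.78, 0.82, 0.84, 0.845, 0.846, 0.847]
if metrics_plateaued(accuracy_history):
    print("Diminishing returns: consider stopping label assignment here.")
```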

Designing interfaces that support machine teaching

We believe the future of work (and in particular, the future of machine learning) relies on better human-machine interactions. Adopting a machine teaching approach will improve the explainability of machine learning models, but only if systems and interfaces are designed to empower machine teachers.

Simple interface changes can go a long way toward supporting the teacher in detecting errors and bias in model outputs and predictions. When completing a text extraction task, we would typically use a pre-trained Optical Character Recognition (OCR) model to augment our role and quickly identify characters and words for the teacher to validate. Surfacing the OCR model’s predictions in our labelling interface made it clear that the model was struggling with certain characters such as 1, I, l, or |. We realized that our labelling interface was limiting our own detection of these errors, which reinforced errors in the model output. With a simple change of font and a colour distinction between numeric and alphabetic characters, we were quickly able to spot errors in the OCR output, correct them, and build a more robust training dataset to improve model predictions.
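To illustrate how small such an interface change can be, here is a hedged sketch that renders OCR output with digits and letters in different terminal colours so glyphs like 1, I, l and | stand out. The helper and styling choices are assumptions for demonstration, not our actual labelling interface.

```python
# Hypothetical helper: show OCR output with numerals and letters styled
# differently so visually similar glyphs (1, I, l, |) are easy to spot.
NUMERIC_STYLE = "\033[94m"   # blue for digits
ALPHA_STYLE = "\033[93m"     # yellow for letters
RESET = "\033[0m"

def highlight_ocr_text(text: str) -> str:
    styled = []
    for ch in text:
        if ch.isdigit():
            styled.append(f"{NUMERIC_STYLE}{ch}{RESET}")
        elif ch.isalpha():
            styled.append(f"{ALPHA_STYLE}{ch}{RESET}")
        else:
            styled.append(ch)            # punctuation and symbols left as-is
    return "".join(styled)

print(highlight_ocr_text("Invoice Il1l0O8 | ref 1I7"))
```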

Simply using statistical measures of uncertainty would likely never have identified the error, since a model can be very certain of an erroneous prediction. Empowering teachers with interfaces that provide visual cues therefore complements their own ways of learning and observing. Teachers can more readily assess the quality of model predictions: first by detecting errors, then by fixing them, and eventually by proactively reducing them through an understanding of where and why these errors occur, so that they can communicate those insights to improve existing training datasets or build better new ones.

Embracing this new framework, in which machine learning models are shaped by the teacher, requires consideration of the task being performed and the target features to be extracted. This relies heavily on concepts, and concepts are not immutable.

Examples of annotated defects in a road surface.

Consider the example of an active learning task for road defect detection. The task involved identifying road features such as cracks, potholes, manhole covers and patches. The concept of a manhole cover seems relatively simple: often round and dark in colour. However, we know from real-world examples that manhole covers come in different shapes and sizes. A system that prompts for descriptors of concepts can help identify when a feature concept has changed (e.g., a descriptor of only round manhole covers becomes inaccurate once square manhole covers are observed).

Now, adding a new concept such as a sewer grate can similarly impact model performance. Again from real-world experience, we know that sewer grates are often square but also come in different shapes and sizes. Understanding the descriptors that distinguish a manhole cover from a sewer grate helps us understand why class confusion could occur as a model is trained. With the new concept added, perhaps we have previously (and now incorrectly) assigned labels to square “manhole covers” that were actually sewer grates, or perhaps our dataset does not even contain enough samples of manhole covers to distinguish them from sewer grates.

Allowing machine teachers to flag these insights moves away from top-down interactivity, empowering teachers and valuing their insight while increasing traceability for concept shift. Designing systems that can easily capture insights about the evolution of a concept is critical. Doing so can lead to actionable insights into how data was collected (whether it is representative) and how data or concept drift has affected existing labels in the dataset when new concepts are observed or introduced.
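One possible way to capture this, sketched below as an assumption rather than a description of our actual tooling, is a small concept record that stores the current descriptors and lets a teacher flag observations that contradict or extend them, preserving a trace of concept shift.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Concept:
    """Hypothetical record of a labelling concept and its current descriptors."""
    name: str
    descriptors: set = field(default_factory=set)
    shift_log: list = field(default_factory=list)

    def flag_shift(self, teacher: str, observation: str):
        """Record an observation that contradicts or extends the current
        descriptors, so concept shift stays traceable over time."""
        self.shift_log.append({
            "teacher": teacher,
            "observation": observation,
            "when": datetime.utcnow().isoformat(),
        })

manhole = Concept("manhole cover", {"round", "dark"})
manhole.flag_shift("teacher_01", "square manhole covers observed on several roads")
```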

Visualization tools to make model predictions explainable

Combining model predictions, explainability elements and visualization tools when surfacing model predictions in a machine learning product or machine teaching tool is a topic of much recent research. Many techniques are being developed to reduce the opaqueness of how a model makes a prediction, for example by incorporating visualizations of confidence within the interface.

One such method is clustering of prediction results, which provides a way to visualize confidence or class confusion and quickly see where the source of confusion may lie. Other interactive approaches include data exploration tools that account for the teacher’s familiarity (or lack thereof) with machine learning, to mitigate cognitive biases they might inject into the teaching process.
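As a hedged sketch of the clustering idea, the snippet below groups per-sample probability vectors with k-means and reports how mixed the predicted classes are within each cluster; low-purity clusters are candidates for class confusion. The function name and cluster count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def confusion_clusters(probabilities, predicted_labels, n_clusters=5):
    """Cluster per-sample probability vectors and report, per cluster,
    how mixed the predicted classes are (low purity suggests confusion)."""
    predicted_labels = np.asarray(predicted_labels)
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(probabilities)
    report = {}
    for c in range(n_clusters):
        labels_in_cluster = predicted_labels[clusters == c]
        if labels_in_cluster.size == 0:
            continue
        classes, counts = np.unique(labels_in_cluster, return_counts=True)
        purity = counts.max() / counts.sum()
        report[c] = {
            "classes": dict(zip(classes.tolist(), counts.tolist())),
            "purity": round(float(purity), 2),
        }
    return report
```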

Especially when surfacing model predictions, it is critical to assess where the teacher can be over-influenced by a model and overconfident in its predictions (known as the anchoring effect), which can lead to reinforced bias. Ultimately, explainable teaching systems will need to balance a teacher’s need for cognition, task knowledge and familiarity with machine learning against their autonomy to monitor, provide feedback and build trust in the learning model.

Processes and techniques for teaching matter. With tools like active learning, human insight and other methodologies (for example, influence functions to debug model predictions), we can more readily expose bias in a dataset, labelling noise, class confusion and imbalance, or even assess model fairness. Together, these processes and techniques form a toolkit.
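As one small example of such a check, the sketch below reports class counts and flags underrepresented classes; the helper, its threshold and the example counts are illustrative assumptions built around the road-defect classes mentioned earlier.

```python
from collections import Counter

def report_class_imbalance(labels, warn_ratio=0.1):
    """Flag classes whose count falls below `warn_ratio` of the most
    frequent class, as a cheap first check for dataset imbalance."""
    counts = Counter(labels)
    most_common = max(counts.values())
    for cls, n in sorted(counts.items()):
        share = n / most_common
        marker = "  <-- underrepresented" if share < warn_ratio else ""
        print(f"{cls}: {n} samples ({share:.0%} of majority class){marker}")

report_class_imbalance(["crack"] * 500 + ["pothole"] * 80 + ["manhole cover"] * 12)
```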

With the right tools on hand, we can increase explainability and build trust in a model during teaching and training. Notably, deriving value from such an approach is only possible when the user interface is designed to enable the ‘job to be done’. The right user interface goes a long way towards ensuring our tools are right for the job.
