Machine Learning in Social Security — does it affect inclusion or equity?
A case study
This September, KelaLab was invited to contribute to the Data, AI and Public Sector Governance Conference, co-hosted by the United Nations Institute for Training and Research (UNITAR) and the DataLit project at FCAI. We were honored to contribute to the program and the discussions as individuals bringing a public-sector point of view. We presented a case study in the Workshop on Inclusion, Equity and Accountability. Interest was considerable, with over 600 registrants for the main event and over 80 for the workshop.
Our case study focused on client message classification. Kela receives some 4 million individual textual messages from clients annually. They arrive through several channels, each with a thought-out purpose. For instance, a channel might be intended for supplying additional information to previously submitted claims. In that case, reading the client's message during the claim evaluation is precisely what the client might expect Kela to do. But what happens if a client's message falls outside that intended scope? What if the client is unaware of this and communicates an urgent need to Kela? In that kind of situation, it might be prudent to apply machine learning to automatically examine the message content and identify whether the client's case might be better addressed through some other input channel. How would we apply machine learning to that kind of case?
Machine Learning and the Labeling Process
Machine learning is often thought of as a pipeline process: data is fed into training, and the end result is a machine learning model. The most widespread methodology for applying machine learning is supervised learning, in which a model learns to mimic the behavior of humans, as captured in a labeled dataset. The labels themselves are instructions on how the model should ideally behave. The most tangible benefit of supervised learning is clarity in evaluation: since we know how the model should ideally behave, we can measure how closely it does. The benefits of machine learning models include speed and, with correct application, consistency. Further, they enable the classification of several categories simultaneously.
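To make the pipeline concrete, here is a minimal sketch of the supervised-learning loop described above: a labeled dataset of messages, training, and evaluation against held-out human labels. The messages, channel names, and the deliberately naive word-counting "model" are all invented for illustration; a production system would use a proper text-classification model.

```python
# A labeled dataset: each message is paired with the channel a human
# reviewer judged it should belong to. (Invented example data.)
labeled_data = [
    ("Attached is the rental contract you asked for", "claim_attachment"),
    ("My payment has not arrived and I need it urgently", "urgent_contact"),
    ("Here is the payslip for my housing benefit claim", "claim_attachment"),
    ("Please call me back as soon as possible", "urgent_contact"),
]

def train(examples):
    """Deliberately naive 'training': count which words co-occur with each label."""
    word_counts = {}
    for text, label in examples:
        for word in text.lower().split():
            word_counts.setdefault(word, {}).setdefault(label, 0)
            word_counts[word][label] += 1
    return word_counts

def predict(model, text):
    """Score each label by how often its words appeared with that label in training."""
    scores = {}
    for word in text.lower().split():
        for label, count in model.get(word, {}).items():
            scores[label] = scores.get(label, 0) + count
    return max(scores, key=scores.get) if scores else None

model = train(labeled_data)

# Because the labels define the ideal behavior, evaluation is direct:
# compare predictions against held-out human labels.
test_set = [("I urgently need someone to call me", "urgent_contact")]
correct = sum(predict(model, text) == label for text, label in test_set)
print(f"accuracy: {correct / len(test_set):.2f}")
```

The key point is structural: the labeled examples both teach the model and provide the yardstick for judging it, which is why label quality matters so much in what follows.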
The challenges in labeling
In a study at Kela, we found that labeling constitutes an important challenge. First, labels can be subjective: people may disagree on which category a message belongs to. If that is the case, then several people's opinions are needed to understand whether we have phrased the classification questions well. Second, assigning labels can be cognitively straining, partly for the same reasons. Labeling is often voluntary work, so it is important to reduce the overall burden as much as possible by providing the right tool for the task. Labels are also invaluable when evaluating a machine learning model: did we achieve the task that we intended? The importance of these issues has been discussed in recent publications.
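Disagreement between labelers can be measured rather than guessed at. The sketch below computes Cohen's kappa, a standard chance-corrected agreement statistic, for two reviewers labeling the same messages; the reviewer labels are invented. Low agreement is a signal that the classification questions themselves may need rephrasing.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

# Two reviewers labeling the same six messages (invented data):
reviewer_1 = ["urgent", "routine", "urgent", "routine", "routine", "urgent"]
reviewer_2 = ["urgent", "routine", "routine", "routine", "routine", "urgent"]
print(f"kappa: {cohen_kappa(reviewer_1, reviewer_2):.2f}")
```

A kappa near 1 means the labelers see the categories the same way; a value near 0 means agreement is no better than chance, suggesting the categories or instructions are ambiguous.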
Another common issue in applying machine learning is that model behavior may be challenging to explain well. Overcoming this challenge might require us to look at machine learning in a new light. As identified earlier, the input data is crucial in the overall process. What if we could control and limit the data that the model sees? Should we try to classify messages sentence by sentence, to make transparent which sentence the model relies on? Should we mask the inputs we wish the model to ignore when determining the outcome?
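Both ideas, sentence-by-sentence classification and input masking, can be sketched in a few lines. The masked terms and the keyword-based placeholder classifier below are purely illustrative, not Kela's actual method:

```python
import re

def mask(text, terms_to_ignore):
    """Replace terms the model must not base its decision on."""
    for term in terms_to_ignore:
        text = re.sub(term, "[MASKED]", text, flags=re.IGNORECASE)
    return text

def classify_sentence(sentence):
    """Placeholder classifier: flags sentences that mention urgency."""
    return "urgent" if "urgent" in sentence.lower() else "routine"

message = "Here is my rent contract. My situation is urgent, please respond."
sentences = [s.strip() for s in message.split(".") if s.strip()]

# Per-sentence predictions show exactly which sentence drove the outcome.
for sentence in sentences:
    masked = mask(sentence, ["rent contract"])
    print(f"{classify_sentence(masked)!r} <- {masked!r}")
```

Because each sentence receives its own prediction, a reviewer can see that the second sentence, not the masked first one, triggered the "urgent" outcome: a small gain in transparency compared with a single opaque label for the whole message.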
Overall, methods and tools for evaluating machine learning models become increasingly important.
The human side of machine learning
In our case study at the workshop, we suggested that machine learning could, when applied well, act as a method of improving equity among clients, for example by identifying people who have difficulties communicating their needs in writing. As speech technologies and machine translation capabilities rapidly develop, we can also increase inclusion in our services. But there are also many risks, and most often those are introduced unintentionally by humans during the development process.
Potential risks and opportunities need to be discussed from both a government employee's and a citizen's point of view. To many people, it may come as a surprise how dependent reliable machine learning is on human effort. Humans are very much needed in every part of the machine learning pipeline for the foreseeable future.
Co-written with Janne Mattila.