How to Incorporate Ethics and Risk
into Your Machine Learning Development Process
Let’s be honest: building a machine learning system is hard. To make accurate predictions or classify data in useful ways, you’ve got to be deeply skilled at creating sophisticated algorithms and mathematical models. It gets even harder when you add ethics into the equation, and as machine learning moves from the research lab to the enterprise, ethics and risk questions pop up at every step of the process.
If those questions go unaddressed, they can pose serious challenges for the companies building the systems. In the worst-case scenarios the result is a complete loss of trust, which can have a cataclysmic impact on any business. Complicating matters is the fact that you can’t apply ethics and governance after a system is built. It’s got to be part of the process from day one. That’s because choices that you make at each stage of the process will impact what you’re building further downstream.
The good news is that even small choices that take ethics and governance into consideration can have a big impact on trust further down the road. And, they can also save you a lot of time. The key is to get the right stakeholders — ranging from risk management to line of business leaders to diverse groups that can represent minority viewpoints that might otherwise go unnoticed — involved throughout the process to ensure the right issues are being addressed at the right times.
To help you understand ethics and risk in machine learning, let’s look at the six steps involved in developing an ML system. Along the way, we’ll call out what happens in each step and which risk and ethics questions arise.
Ready? Let’s do this.
Like any initiative, machine learning projects start with ideation and project evaluation, including assessments of technical feasibility, scope, desired outcomes, and projected return on investment. Don’t underestimate the importance of this work. It’s wrong to think that being data-driven starts with finding insights in data.
On the contrary, it starts with the thinking that goes into defining a rigorous hypothesis that can be explored using mathematical models. Subject matter experts bring valuable information to the table and can import what they know into system and process design to get to results faster. Machine learning systems are tools to optimize against a set of defined outcomes. It’s up to humans to define which outcomes to optimize for. And that’s exactly where risk and ethical questions can arise.
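To make the point concrete, here is a minimal sketch (with invented models and labels) of how the outcome you choose to optimize for determines which model gets picked: the same two candidates ranked under accuracy versus recall produce different winners.

```python
# Hypothetical sketch: the "best" model depends on which outcome humans
# tell the system to optimize for. Two candidate models are scored on
# the same labeled data under two different objectives.

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def recall(preds, labels):
    """Fraction of true positives the model actually catches."""
    positives = [(p, y) for p, y in zip(preds, labels) if y == 1]
    return sum(p == 1 for p, _ in positives) / len(positives)

labels  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # cautious: flags few positives
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # aggressive: catches all positives

candidates = [("model_a", model_a), ("model_b", model_b)]
for name, metric in [("accuracy", accuracy), ("recall", recall)]:
    best = max(candidates, key=lambda m: metric(m[1], labels))
    print(f"optimizing for {name}: pick {best[0]}")
```

Optimizing for accuracy selects the cautious model; optimizing for recall selects the aggressive one. Neither choice is purely technical, which is exactly where the ethical questions enter.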
The next step is system design, and the focus here is on the system’s front end, specifically on creating the tangible interface consumers or internal users touch and interact with. Machine learning systems can be completely automated, where a model’s output automatically plugs into a website interface or app, or have a human in the loop. In the latter case, the system provides information to an internal user who then uses this information to help make a decision or sends feedback to help train the system and improve system performance.
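The two architectures can be sketched side by side. This is an illustrative skeleton (names, threshold, and the reviewer function are all assumptions), with low-confidence predictions routed to a human whose decision is also logged as training feedback:

```python
# Sketch of the two architectures described above (all names are
# illustrative): a fully automated path serves the model's output
# directly, while a human-in-the-loop path escalates low-confidence
# predictions to a reviewer and logs the outcome for retraining.

REVIEW_THRESHOLD = 0.8  # assumed confidence cutoff
feedback_log = []       # (model prediction, human decision) pairs

def fully_automated(prediction, confidence):
    """Model output goes straight to the app or website."""
    return {"decision": prediction, "source": "model"}

def human_in_the_loop(prediction, confidence, reviewer):
    """Low-confidence cases are escalated to an internal user."""
    if confidence >= REVIEW_THRESHOLD:
        return {"decision": prediction, "source": "model"}
    decision = reviewer(prediction)              # human makes the final call
    feedback_log.append((prediction, decision))  # later used for retraining
    return {"decision": decision, "source": "human"}

always_approve = lambda _: "approve"  # stand-in for a real reviewer

print(human_in_the_loop("deny", 0.55, always_approve))  # escalated
print(human_in_the_loop("deny", 0.95, always_approve))  # served as-is
```

Note that the human-in-the-loop path accumulates a feedback log; who reviews those cases, and how their corrections feed back into training, is where the privacy and ethics questions below come in.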
Different architectures raise nuanced privacy, security, and ethics issues, leading to an array of important questions.
As you move into data collection, one question you’ll face is whether you’ll use only first-party data or also include public or private third-party data in your system. Don’t fall into the trap of viewing this as a PII or no-PII question to satisfy compliance requirements. Privacy is contextual, and users are often shocked when data you consider public shows up in an unexpected context.
Many of the issues attributed to algorithmic bias start with data collection. If you’ve historically engaged with a certain demographic population, you will have more information about this group than other groups, skewing systems to perform better on well-represented populations. Solving this starts with the data, not the algorithms. The algorithms simply learn a mathematical function that does a good job mapping inputs to outputs.
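Here is a small demonstration of that skew, using invented numbers: a decision threshold tuned on a training set dominated by group A performs well on group A and poorly on group B, whose positive cases happen to sit at lower scores. The algorithm does its job; the data is the problem.

```python
# Sketch with invented numbers: a classifier threshold tuned on a
# training set dominated by group A. Group B's positive cases sit at
# lower scores, so the learned cutoff serves them poorly even though
# overall training accuracy looks fine.

train = [  # (score, label) pairs; all but one record come from group A
    (0.2, 0), (0.3, 0), (0.5, 0), (0.55, 0),
    (0.7, 1), (0.8, 1), (0.9, 1),   # group A positives cluster high
    (0.45, 1),                      # the lone group B positive
]

def accuracy(data, t):
    return sum((s >= t) == (y == 1) for s, y in data) / len(data)

def best_threshold(data):
    """Pick the cutoff that maximizes accuracy on the given data."""
    return max(sorted(s for s, _ in data), key=lambda t: accuracy(data, t))

t = best_threshold(train)  # lands up near group A's positives

group_a_test = [(0.25, 0), (0.75, 1), (0.85, 1)]
group_b_test = [(0.20, 0), (0.42, 1), (0.48, 1)]

print(f"learned threshold: {t}")
print(f"group A accuracy: {accuracy(group_a_test, t):.2f}")
print(f"group B accuracy: {accuracy(group_b_test, t):.2f}")
```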
Data processing is the step where you prepare data for use in algorithms. The core data privacy challenge relates to protecting privacy beyond PII. Focusing narrowly on PII (database fields like first and last names, social insurance numbers, or email addresses) is not sufficient to guarantee privacy. You have to expand your notion of risk to cover the possibility of re-identification, even when a data set has been scrubbed of PII.
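A classic illustration of why PII scrubbing isn’t enough is a linkage attack: a data set stripped of names can still be re-identified by joining its remaining quasi-identifiers against a public record. All records below are invented.

```python
# Illustrative linkage attack (all records invented): a data set scrubbed
# of names and emails is re-identified by joining its remaining
# quasi-identifiers -- ZIP code, birth date, and gender -- against a
# public record such as a voter roll.

scrubbed_health_data = [
    {"zip": "02139", "birth_date": "1965-07-21", "gender": "F",
     "diagnosis": "hypertension"},
    {"zip": "90210", "birth_date": "1980-01-02", "gender": "M",
     "diagnosis": "diabetes"},
]

public_voter_roll = [
    {"name": "J. Smith", "zip": "02139", "birth_date": "1965-07-21",
     "gender": "F"},
    {"name": "A. Jones", "zip": "10001", "birth_date": "1990-03-14",
     "gender": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")

def reidentify(scrubbed, public):
    """Join the two data sets on quasi-identifiers alone."""
    matches = []
    for record in scrubbed:
        key = tuple(record[q] for q in QUASI_IDENTIFIERS)
        for person in public:
            if tuple(person[q] for q in QUASI_IDENTIFIERS) == key:
                matches.append({"name": person["name"],
                                "diagnosis": record["diagnosis"]})
    return matches

# The "anonymized" diagnosis is linked back to a named individual.
print(reidentify(scrubbed_health_data, public_voter_roll))
```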
The core ethics issues relate to deciding what types of inferred features or profiles your organization feels are appropriate and identifying tightly correlated features in data sets that can hide discriminatory treatment.
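One practical check for the correlated-feature problem is to measure how strongly each candidate feature tracks a protected attribute. A minimal sketch, using invented numbers and a plain Pearson correlation:

```python
# Sketch (invented numbers): flag features tightly correlated with a
# protected attribute. Even if the protected attribute itself is dropped
# from the data set, a highly correlated feature such as ZIP-code median
# income can act as a proxy for it and hide discriminatory treatment.
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

protected = [0, 0, 0, 1, 1, 1]  # protected group membership per record
features = {
    "zip_median_income": [82, 78, 90, 31, 28, 35],  # tracks the attribute
    "account_age_years": [2, 9, 4, 3, 8, 5],        # does not
}

for name, values in features.items():
    r = pearson(values, protected)
    flag = "PROXY RISK" if abs(r) > 0.8 else "ok"
    print(f"{name}: r={r:+.2f} ({flag})")
```

The 0.8 cutoff here is an assumption for illustration; in practice the threshold, and the decision about what to do with a flagged proxy, is exactly the kind of question the stakeholders above need to settle.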
In the model prototyping and training step, machine learning engineers experiment with different algorithms to find the best one for the job, train the model, and verify that the chosen model satisfies performance requirements (e.g., how accurate the model needs to be). Choosing the best model for a particular problem is not only a technical question of identifying the algorithm that performs best for the job.
Data and machine learning scientists should also consider business, ethical, and regulatory requirements when selecting algorithms, which means they should be asking numerous questions.
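One way to picture this is as a selection step that filters candidates by non-technical requirements before ranking them by accuracy. The candidate names and scores below are invented for illustration:

```python
# Hypothetical selection step: candidates are filtered by a minimum
# accuracy requirement and, where regulation demands it, by
# explainability -- then the best remaining model is picked.
# All scores below are invented.

candidates = [
    {"name": "deep_net",       "accuracy": 0.94, "explainable": False},
    {"name": "gradient_boost", "accuracy": 0.92, "explainable": False},
    {"name": "decision_tree",  "accuracy": 0.89, "explainable": True},
    {"name": "logistic_reg",   "accuracy": 0.86, "explainable": True},
]

def select_model(candidates, min_accuracy, require_explainable):
    eligible = [c for c in candidates
                if c["accuracy"] >= min_accuracy
                and (c["explainable"] or not require_explainable)]
    if not eligible:
        raise ValueError("no candidate meets the stated requirements")
    return max(eligible, key=lambda c: c["accuracy"])

# A purely technical choice picks the highest-accuracy model...
print(select_model(candidates, 0.85, require_explainable=False)["name"])
# ...but a regulated use case that demands explainability does not.
print(select_model(candidates, 0.85, require_explainable=True)["name"])
```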
The final step in the machine learning pipeline is to put the model into production, where it is used to make predictions on new inputs. Many architectures include an API that calls the model for an output and an integration into the application or interface an end user or internal user engages with. Deployment entails choices about how frequently to retrain models — in sporadic batches (daily, weekly, monthly) or in real time. How often the model is updated raises different privacy and security considerations.
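As a rough sketch of the batch-retraining case, a serving wrapper can track the model’s age and flag responses once the model falls outside its retraining window. Everything here (names, the weekly interval, the stand-in model) is illustrative:

```python
# Minimal serving sketch (illustrative names): a wrapper answers
# prediction requests and tracks model staleness, so batch-retrained
# models are refreshed on schedule rather than served indefinitely.

RETRAIN_INTERVAL_DAYS = 7  # assumed weekly batch retraining

class ModelServer:
    def __init__(self, model, trained_on_day):
        self.model = model
        self.trained_on_day = trained_on_day

    def is_stale(self, today):
        return today - self.trained_on_day >= RETRAIN_INTERVAL_DAYS

    def predict(self, x, today):
        if self.is_stale(today):
            # In production this would trigger a retraining job; here we
            # simply flag the response so callers can see the model's age.
            return {"output": self.model(x), "stale": True}
        return {"output": self.model(x), "stale": False}

double = lambda x: 2 * x  # stand-in for a trained model
server = ModelServer(double, trained_on_day=0)

print(server.predict(21, today=3))   # inside the retraining window
print(server.predict(21, today=10))  # past the retraining window
```

A real-time pipeline replaces the staleness check with continuous updates, which in turn changes what data is retained, for how long, and who can access it.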
Building a machine learning system is a tall order. But getting it right doesn’t just come down to questions of data science. There are also risk and ethical considerations that you need to take into account. The key is to adopt that mindset before you start building your system. That way you’ll save time and ultimately set yourself up for greater success. On the other hand, if you fail to do so or try to go back and think about these issues retroactively, you might wind up paying a very high price.