Adopting Ethics in Artificial Intelligence and Machine Learning: 6 Best Practices Practitioners Should Follow
Machine learning and artificial intelligence are no longer cutting-edge technologies limited to academic research labs or government-funded projects. Anyone with access to the internet and the ability to call an API can leverage complex models trained on millions of data points to identify objects in photos, translate text between languages, or even find the right talent for an organization. It is important to remember, however, that being commonplace does not make these technologies harmless. Here I will dive into applications of these methods that can lead to dire, real-world consequences.
A large part of the commercialization of artificial intelligence and machine learning is attributed to the development of cloud technologies and advances in distributed computing and hardware. It is now relatively easy to operationalize models and incorporate them, at a reasonable cost, into commercial applications to automate or facilitate decision-making. These algorithms now impact several key business decisions across industries, including insurance risk, patient outcomes in healthcare, marketing in consumer products, and supply chain optimization in retail among others.
While the mathematics behind these analyses has existed for decades, it is only in the modern era that technology has made it possible to deploy and apply these principles rapidly. This burst of technology-driven commercial innovation, however, has not been accompanied by the ethical standards needed to mitigate the risks associated with unethical practices.
Why does it matter?
Rapid growth in the applied artificial intelligence and machine learning space has led to an almost ‘wild-west’ approach to the usage of these cutting-edge techniques in the real world. We have seen a number of incidents where operationalized models have had unintended consequences.
In 2015, Google’s image classification algorithm was called out as racist; the same algorithm underpins the Cloud Vision API, which is available to commercial users. Similarly, in 2019, an MIT study highlighted racial bias in Amazon’s Rekognition, suggesting that in the intervening four years the industry had paid minimal attention to ethics and that the underlying problems persisted.
These incidents highlight the inherent difficulty of obtaining unbiased training data for machine learning algorithms, since certain demographics are more likely to have access to the internet, to attend events and take photos, and to label data on applications. Geoffrey Hinton, the “godfather” of AI, said, “we need to ensure that AI is not only for the rich” — meaning we must recognize and account for the fact that datasets may not be representative of the whole.
Image classification and recognition algorithms are used in a variety of real-world applications, such as transportation systems, security/video systems, in-store check-out, loyalty/marketing programs, and insurance assessments. If we utilize such biased results without questioning them ethically and without mathematical and programmatic rigor, would we not be reinforcing potentially racist, sexist, or classist behaviors through technology? It is therefore important to understand the full context of the data before leveraging algorithms trained on it.
Another use case that comes to mind is based on an informal conversation I had with leading cancer researchers during a project at work. It had to do with the rising popularity of ancestral genetic testing leveraged by companies such as 23andMe. When consumers agree to do this kind of test, they give up a significant amount of data — their genetic blueprints — to organizations leveraging machine learning and artificial intelligence to parse out insights from the obtained genes and to eventually conduct predictions and simulations on this data.
A large set of genomic data can indicate any number of things of interest to different parties: Who is more likely to be diagnosed with cancer? Does this person have a variant which categorizes them as Jewish? What other traits could be derived from the genetic makeup of people? What other ailments are people who look like person X likely to succumb to? Did a person with this genetic makeup respond well to a specific treatment?
While there are many positive outcomes to answering these questions, there are also inherent risks: privacy concerns, the malicious use of this data (especially in the realm of ethnic identification), and the potential fallout of doing this analysis incorrectly.
These are just a couple of use cases where a lack of robust methodology and ethical thought in the application of machine learning and artificial intelligence can have unintended, negative impacts. As the field continues to grow and mature, it will become more critical for practitioners to keep ethics top of mind.
What can we do?
There is no clear-cut roadmap covering every use case that practitioners can follow to ensure full ethical compliance. This is a quickly growing field whose methods and technologies are constantly changing and advancing. However, there are some best practices we can follow to minimize the likelihood of falling into the trap of building a model that technically works but is full of bias or unintended gaps:
1. Always consider the actual impact of a given model. Work with subject matter experts.
At the onset of a project, always think through the objective and question you are trying to answer. While we may be the experts on the machine learning or artificial intelligence component, we are often not the industry experts. It is crucial to work iteratively with the domain experts to ensure we are thinking through all the nuances of the business and the impact we have on customers.
2. Understand the assumptions of a given model. Do not blindly use APIs or libraries.
Different models make different assumptions, whether it is the relationship between variables, the type of variables, or the distribution of residuals. Even if you are using a well-established library or API, it is important to check the underlying implementation. Not all random forests are the same.
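As a small illustration of checking an assumption rather than trusting a black box, the sketch below fits a simple least-squares line with the standard library and verifies that the residuals are centered on zero, a property ordinary least squares with an intercept guarantees. The data and helper names are hypothetical.

```python
import statistics

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical toy data with a roughly linear trend.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# OLS with an intercept guarantees residuals sum to (numerically) zero;
# if a library's "linear model" does not, it is making different assumptions.
assert abs(sum(residuals)) < 1e-9
```

The same habit applies to packaged models: read the documentation and, where possible, verify that the implementation's assumptions hold on your data.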
3. Conduct thorough data exploration and analysis. Identify biases up front.
Exploratory data analysis and business process understanding are key to informing an effective machine learning or artificial intelligence project. We must perform due diligence in understanding how the dataset we are using was collected, and apply a critical, data-driven approach to vetting the biases, class balance, and other issues implicit in the information used.
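One concrete, low-tech way to start this vetting is to profile the dataset's composition before any modeling. The sketch below flags groups that fall below a chosen representation threshold; the labels and the 10% cutoff are assumptions for illustration only.

```python
from collections import Counter

def underrepresented(groups, threshold=0.10):
    """Return groups whose share of the dataset falls below `threshold`."""
    counts = Counter(groups)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items() if n / total < threshold}

# Hypothetical demographic column from a training set.
samples = ["A"] * 70 + ["B"] * 25 + ["C"] * 5

flagged = underrepresented(samples)
print(flagged)  # group "C" makes up only 5% of the data
```

A flagged group does not automatically mean the data is unusable, but it does mean the model's behavior on that group deserves explicit evaluation.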
4. Question the basics. Always.
Break a complex problem down into a series of hypotheses that can be systematically tested and statistically proven out. Turn the ‘gut feeling’ or ‘intuition’ into a quantitatively answerable question. Is this feature really that informative in determining the optimal price? Why?
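Turning a gut feeling into a quantitatively answerable question can be as simple as a permutation test: shuffle the group labels many times and see how often the observed difference arises by chance. The numbers below are made up purely for illustration.

```python
import random
import statistics

def permutation_p_value(a, b, n_iter=10_000, seed=0):
    """Two-sample permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = a + b
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) -
                   statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_iter

# Hypothetical question: does this feature really move the optimal price?
with_feature = [10.2, 11.1, 10.8, 11.5, 10.9]
without_feature = [9.1, 9.4, 8.8, 9.6, 9.2]

p = permutation_p_value(with_feature, without_feature)
print(f"p = {p}")  # a small p-value supports the intuition
```

The point is not this particular test but the discipline: each "intuition" becomes a hypothesis with a measurable outcome.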
5. Use robust methodologies and approaches. Create reproducible results.
Similar to any scientific research or project, machine learning and artificial intelligence projects should be reproducible. Given the same dataset, environment, constraints, and other artifacts, the results should be replicable.
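At a minimum, reproducibility means pinning every source of randomness. A minimal sketch with the standard library, using a hypothetical train-split helper:

```python
import random

def sample_training_rows(n_rows, n_sample, seed=42):
    """Draw a reproducible training subsample by fixing the RNG seed."""
    rng = random.Random(seed)  # isolated, seeded generator
    return rng.sample(range(n_rows), n_sample)

# Two runs with the same seed yield the same split; an unseeded
# RNG would give a different split on every run.
run_1 = sample_training_rows(1_000, 5)
run_2 = sample_training_rows(1_000, 5)
assert run_1 == run_2
```

The same principle extends to library-level seeds, data versioning, and pinned dependencies in the project environment.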
6. Practice orthogonality. Know what each component does.
Orthogonality, in the computer science sense, means that one operation effects one change without rippling into other operations. In the context of machine learning, especially supervised techniques, it is important to guard against unintended consequences by ensuring that tuning one component does not alter multiple dimensions of the system at once. A method we believe affects two or three dimensions may in practice influence several others, and that gap is where risk accumulates.
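In code terms, orthogonality means each tunable knob drives exactly one behavior. The hypothetical preprocessing steps below keep outlier clipping and scaling as independent functions, so tuning the clip threshold cannot silently change how scaling behaves.

```python
def clip_outliers(xs, limit):
    """Step 1: clamp values to [-limit, limit]; touches nothing else."""
    return [max(-limit, min(limit, x)) for x in xs]

def scale(xs, factor):
    """Step 2: multiply by a constant; independent of clipping."""
    return [x * factor for x in xs]

data = [-5.0, 0.5, 2.0, 9.0]

# Tuning `limit` changes only the clipping step; the scaling step is
# unaffected, so each parameter can be adjusted in isolation.
loose = scale(clip_outliers(data, limit=10.0), factor=2.0)
tight = scale(clip_outliers(data, limit=3.0), factor=2.0)
```

Structuring a pipeline this way makes it possible to reason about, and test, the effect of each tuning decision on its own.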
This is certainly not an exhaustive list, but rather a few things we can consider any time we go into an artificial intelligence or machine learning project to promote ethical implementations.
It is an exciting time for the field of artificial intelligence and machine learning. Through commercial expansion, demand for this skillset continues to grow, and the disruptive potential of these cutting-edge techniques is only beginning to surface.
However, with rapid growth in technology often comes a lag with respect to governance and best practices. As practitioners, we must continue to evolve and keep some basic principles, such as those listed here, at the forefront of our projects to ensure an ethical application of these techniques.