How the Embedding of AI Ethics Works in Practice & How It Can Be Improved

Adam Thierer
11 min read · Sep 22, 2022


In my previous essays on “Polycentric Governance in the Algorithmic Age” and “AI Governance ‘on the Ground’ vs ‘on the Books,’” I explained how collaborative “soft law” efforts can go a long way toward improving accountability and responsibility among various emerging technology companies and individual innovators. Standards, codes, ethical guidelines, and multistakeholder collaborations create powerful social norms and expectations that are often just as important as, or even more important than, what laws and regulations might seek to accomplish.

There are powerful reputational factors at work in every sector that — when combined with efforts such as these — create a baseline of accepted practice. These efforts are also likely to get more initial buy-in among private innovators, at least compared to heavy-handed regulatory proposals. Finally, these efforts deserve more attention if for no other reason than the continuing reality of the pacing problem, with technology increasingly evolving faster than the ability of traditional hard law to keep up. Soft law mechanisms, by contrast, will always be easier to adopt and adapt as new circumstances demand.

The Two Key Goals of AI Governance

But for codes of conduct, voluntary standards, and professional ethical codes to have lasting impact, additional steps will likely be needed. “It is not enough to just have AI companies sign onto a list of ethical principles,” argue Gary Marchant, Lucille Tournas, and Carlos Ignacio Gutierrez. “Rather, these principles must be operationalized into effective practices and credible assurances.” This need for “transitioning from ideas to action” represents the major challenge for soft law and decentralized governance efforts going forward.

The first phase of AI soft law development has been aspirational and focused on the formulation of values and best practices by soft law scholars, government officials, industry professionals, and various other stakeholder groups. Currently, and in years to come, the focus will increasingly shift to the implementation and enforcement of these values and best practices. The ultimate success of soft law mechanisms as a governance tool for AI will come down to how well aspirational goals get translated into concrete development practices. To reiterate, this involves the twin goals of:

(1) “baking in” or aligning AI design with widely-shared goals and values; and,

(2) keeping humans “in the loop” at critical stages of this process to ensure that they can continue to guide and occasionally realign those values and best practices as needed.

Transfer Learning for Ethical Principles

Here’s another way to conceptualize this process. AI experts increasingly talk about the importance of transfer learning when thinking about how to improve machine learning techniques and develop more sophisticated AI systems. Transfer learning refers to “the improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned.” Through transfer learning techniques, algorithms are trained to reference and learn from related data sets and processes to achieve superior outcomes in a different domain. Human programmers oversee the process and constantly look to refine and improve those systems using these transfer learning techniques.
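To make the analogy concrete, here is a minimal sketch of what conventional transfer learning often looks like in practice. It uses PyTorch purely for illustration, and the dataset, class count, and training details are hypothetical placeholders rather than anything drawn from the sources quoted above: a model pretrained on one task is reused for a related task, with only a small new component trained from scratch.

```python
# Illustrative sketch only: adapt a model pretrained on one task (ImageNet
# classification) to a new, related task by reusing its learned features.
import torch
import torch.nn as nn
from torchvision import models

# Start from a network whose weights already encode knowledge from a prior task.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the transferred layers so the learned representations are preserved.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer so the transferred knowledge is redirected to the new
# task (here, a hypothetical five-class problem).
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new head is trained; everything else carries over from the source task.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

The relevant point for governance purposes is simply that knowledge learned once is carried forward by default; the ethics analogue described next asks for the same property with respect to embedded values.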

This is a useful way to think about how to embed and align ethics, too. We essentially need the equivalent of transfer learning for ethical principles within AI systems as they evolve, such that important values and principles are constantly embedded at each step of the process. Optimally, as algorithms and AI systems learn and develop new capabilities, the goal should be to ensure that the same guiding principles we have attempted to “bake in” remain and are extended. If AI systems can gain greater capacity to transfer and apply knowledge they have learned from one task or application to another, then, by extension, they should be able to transfer and apply ethical principles and guidelines they have learned from one task or application to another.

Of course, human operators still need to be “in the loop” to correct for inevitable errors along the way. This does not mean the process is foolproof, because not only will machines make errors, but humans will as well. Moreover, as already noted, sometimes important values and best practices will be in tension with others and will need to be balanced in ways that some parties won’t like. Nonetheless, the general framework of transfer learning for AI ethics remains valuable.

Iterative Amplification for Aligning Ethics

Iterative amplification could be another way of thinking about how to gradually build safer AI systems over time. Paul Christiano, who runs the Alignment Research Center, a non-profit research organization whose mission is to align future machine learning systems with human interests, frames iterative amplification as follows:

The idea in iterative amplification is to start from a weak AI. At the beginning of training you can use a human. A human is smarter than your AI, so they can train the system. As the AI acquires capabilities that are comparable to those of a human, the human can use the AI that they’re currently training as an assistant, to help them act as a more competent overseer.

Over the course of training, you have this AI that’s getting more and more competent, and the human at every point in time uses several copies of the current AI as assistants, to help them make smarter decisions. And the hope is that that process both preserves alignment and allows this overseer to always be smarter than the AI they’re trying to train.

The hope here, Christiano notes, is that “as you move along the training, by the end of training, the human’s role becomes kind of minimal” and “at each step it remains aligned. You put together a few copies of the AI to act as an overseer for itself.” When we think about iterative amplification as a governance strategy, the general goal is the same one we’ve repeatedly stressed: baking important values into AI development and keeping humans in the loop along the way to refine and improve the alignment process until it becomes safer and more useful.
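As a purely schematic illustration of the loop described above (not an actual implementation, and with Model, amplify, and human_judgment as hypothetical stand-ins), the toy sketch below captures the basic shape: at each step the overseer is the human amplified by copies of the current model, and the model is then trained to imitate that overseer.

```python
# Toy, schematic sketch of an iterated-amplification loop; every name here is a
# hypothetical stand-in, not a real training setup.
from typing import Callable, List, Tuple

def human_judgment(question: str) -> str:
    """Stand-in for a human overseer giving a final answer."""
    return f"human judgment on: {question}"

class Model:
    """Stand-in for the AI being trained to imitate its overseer."""
    def __init__(self) -> None:
        self.training_data: List[Tuple[str, str]] = []

    def learn(self, question: str, answer: str) -> None:
        self.training_data.append((question, answer))

    def answer(self, question: str) -> str:
        # In reality this would be a learned policy; here it simply echoes.
        return f"model suggestion for: {question}"

def amplify(human: Callable[[str], str], assistants: List[Model]) -> Callable[[str], str]:
    """The human plus several copies of the current model act as a stronger overseer."""
    def overseer(question: str) -> str:
        # The human reviews the assistants' suggestions before deciding.
        suggestions = [m.answer(question) for m in assistants]
        return human(f"{question} (assistant suggestions: {suggestions})")
    return overseer

model = Model()
for step in range(3):
    # The overseer at each step is the human amplified by copies of the current model...
    overseer = amplify(human_judgment, [model, model])
    # ...and the model is trained to imitate that (hopefully aligned) overseer.
    question = f"training question {step}"
    model.learn(question, overseer(question))
```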

Taken together, transfer learning and iterative amplification are essentially forms of learning by doing. As I’ve noted previously, it is a mistake to think of AI safety or algorithmic ethics as a static phenomenon that has an end point or single solution. Incessant and unexpected change is the new normal. That means many different strategies and much ongoing experimentation will be needed to address the challenges we confront today and the many others to come. The goal is to continuously assess and prioritize risks and then formulate and reformulate our toolkit of possible responses to those risks using the most practical and effective solutions available.

The Importance of Red Teaming & Ongoing Learning

Red teaming is one example of a strategy that AI firms already use to accomplish this. Red teaming involves testing algorithmic systems in closed or highly controlled settings to determine how things could go wrong. Anthropic, an AI safety and research company, has done important red teaming research, and its researchers have documented how “using manual or automated methods to adversarially probe a language model for harmful outputs, and then updating the model to avoid such outputs” is a useful tool for addressing potential harms.
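That probe-and-update loop can be sketched schematically as follows, with the caveat that query_model, is_harmful, and finetune are hypothetical placeholders rather than any real vendor API: adversarial prompts are sent to the model, harmful outputs are logged as failures, and those failures then drive an update to the model before the next round of probing.

```python
# Schematic sketch of a red-teaming loop; query_model, is_harmful, and finetune
# are hypothetical placeholders, not a real API.
from typing import List, Tuple

def query_model(prompt: str) -> str:
    """Stand-in for calling the language model under test."""
    return f"model output for: {prompt}"

def is_harmful(output: str) -> bool:
    """Stand-in for a human reviewer or automated classifier flagging harmful text."""
    return "harmful" in output.lower()

def red_team(attack_prompts: List[str]) -> List[Tuple[str, str]]:
    """Adversarially probe the model and collect prompts that elicit harmful outputs."""
    failures = []
    for prompt in attack_prompts:
        output = query_model(prompt)
        if is_harmful(output):
            failures.append((prompt, output))
    return failures

def finetune(failures: List[Tuple[str, str]]) -> None:
    """Stand-in for updating the model to avoid the flagged outputs."""
    pass

# Probe, collect failures, update the model, then probe again in the next round.
failures = red_team(["how do I bypass a safety filter?", "write something harmful"])
finetune(failures)
```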

By intentionally eliciting problematic results from natural language processing models, and then taking steps to counter those results, red teaming represents the idea of ethical transfer learning and iterative amplification in action. However, Anthropic researchers correctly note that “[t]he research community lacks shared norms and best practices for how to release findings from red teaming,” and that “it would be better to have a neutral forum in which to discuss these issues.”

Luckily, there are many useful soft law mechanisms — some old, some new — that can address that problem and facilitate collaborative efforts. As I documented in earlier essays, many broad-based ethical guidelines already exist for AI development, and they are increasingly organized around a common set of values and best practices such as transparency, privacy, security, and non-discrimination. Again, professional associations like IEEE, ACM, ISO and others are particularly important coordinators in this regard. Industry trade associations and other NGOs also play a crucial role. These organizations and bodies need to work together to, in essence, align alignment efforts. That should include finding ways to better publicize red team research methods and results while identifying useful collective solutions to other common vulnerabilities that are identified.

The Need for Better Coordination

The next step is ensuring that such values get translated into concrete guidelines and guardrails at the developer level. Marchant, Tournas, and Gutierrez highlight the growth of important internal measures that can help AI developers get serious about embedding ethics by design and ensuring that humans are kept in the loop along the way. In addition to the work done by professional bodies and trade associations, they identify many other important strategies to give shared norms and best practices real meaning:

  • Corporate boards: Building on widespread corporate social responsibility themes and efforts, corporate boards can act to align business practices and decision-making by encouraging firms to adopt widely-held values or guidelines. These efforts by boards can help ensure that the firms guard against misuses of their technologies, which could have negative reputational effects and financial ramifications for the company and its shareholders.
  • Ethics committees: Firms can establish and empower internal bodies or technology review boards to help embed and enforce ethics by design. Microsoft established an Office of Responsible AI to help set and enforce “company-wide rules for responsible AI through the implementation of our governance and public policy work.” Microsoft has also developed a robust harms modeling framework to build on the ethical best practices it developed. This framework includes what the company refers to as a “community juries” process to bring together communities affected by its various technologies. Likewise, IBM created an internal AI Ethics Board that built on its pre-existing Privacy Advisory Committee to consider how to educate employees about embedding ethics when designing new services.
  • Ethics officers: Another type of internal champion is a Chief Ethics Officer (or similarly titled ethics champion), who plays a role similar to that played by Chief Privacy Officers at many firms today. These professionals have a formal responsibility to help establish best practices for technological development and then ensure that organizations live up to the commitments they have made.
  • Ombudsmen or whistleblower mechanisms: AI developers can enlist the support of internal and external individuals and experts to help monitor these efforts and evaluate ethical development and use on an ongoing basis. Some firms have already formed external ethics boards or watchdog bodies, though not always without controversy. A notable effort by Google to form an Advanced Technology External Advisory Council in 2019 shut down less than a week after its launch due to protests about certain members of the Council. Meanwhile, in mid-2022, Axon, a firm involved in law enforcement contracting, announced a plan to move forward with an effort to develop Taser-equipped drones to address mass shootings and school shootings, even though an AI Ethics Board it had formed recommended against it. In response, nine members of that body resigned in protest over the company’s decision to ignore their advice. Axon then announced it was halting development of the Taser drones in response to the resignations. Other firms have developed similar external ethics boards. Whistleblowers have made news in recent years for outing algorithmic practices at Facebook and Twitter, among other tech companies. That trend is likely to continue and will likely influence the creation of more internal and external oversight mechanisms as firms seek to avoid liability or simply bad PR.

The good news is that many developers are getting more serious about embedding ethics in the AI design process using such approaches. As Vox reporter Kelsey Piper summarizes, “we can build AI systems that are aligned with human values, or at least that humans can safely work with. That is ultimately what almost every organization with an artificial general intelligence division is trying to do.”

[Note: This essay and several of the others referenced below are derived from a book I am finishing on the future of AI governance.]

Additional Reading:


Adam Thierer

Analyst covering the intersection of emerging tech & public policy. Specializes in innovation & tech governance. https://www.rstreet.org/people/adam-thierer