Caveat: In an effort to be open about the policy design process, I’m offering this personal perspective. There is no consensus that the approach described below is the way forward, and it may change 1000 times before it’s finalized. Policy making is sausage making after all.
Government does a lot. In fact, we do so much, that despite 11 years in the federal public service — four in a central agency — I still find myself continually surprised to discover how limited my perspective is of the breadth of government programming, even from my vantage in the ivory tower. You can see for yourself; the GC Infobase provides a visualization of this complexity broken down by spending, and one can get lost in the complexity very quickly.
As I wrote in my previous post, my most challenging task right now is to design the proposed Treasury Board Standard on Automated Decision Support Systems, which sets out rules for how federal institutions can automate some or all of their administrative decision processes. Government institutions make administrative decisions around a great many things, from providing the Canada Pension Plan Disability Benefit to licensing pilots to regulating horse-racing odds to issuing patents. Capturing all of that nuance in policy guidance is not going to be easy.
Enter the Algorithmic Impact Assessment
In the last month, two influential bodies — Nesta in the UK and the AI Now Institute in the US — have called for public agencies to use a tool called an algorithmic impact assessment (AIA, sorry for the acronym) during the design of an automated system to gauge its potential impact and design controls. There aren’t too many examples of how such an assessment should actually be designed, so I took a stab at it in the context of our proposed Treasury Board Standard on Automated Decision Support Systems.
When approaching the design of our AIA, I began with a principle that not all programs should be governed with the same degree of stringency. I want to provide federal institutions with enough leeway to innovate, while ensuring that the systems that make critical decisions about people are interpretable and able to be challenged. Ideally, such a ruleset could scale the governance of the automation to the potential impact said automation could have on society, especially if something went wrong. So the idea is to assign points to each potential impact an automated system could have.
We take a similar approach in our Standard on Identity and Credential Assurance, where we speak of four escalating categories of “identity risk” and “credential risk.” Collectively, these concepts speak to the potential impact to government of a loss of control of the identity or credential. It means that booking a campsite is not treated with the same security needs as renewing a passport, which seems very sensible, though it took my colleagues a long time to get the details exactly right.
Understanding Automation Impact
Both Nesta and AI Now approach the AIA with a lens that focuses primarily on protecting the rights of individuals, but as I look at the tool in the broader context of everything government does, I need to expand the range of interests.
The Government of Canada operates on a daily basis considering and balancing many concerns. First and foremost, usually, are individuals, but the concerns are far broader. Communities are very important: linguistic, ethnic, and geographic. The environment. The ability for individual businesses to succeed, but also the health and competitiveness of markets. Democratic institutions. Reconciliation with Indigenous peoples. Relations internationally or with other orders of government. A single policy rarely impacts just one of these constituent concerns; it may have rippling effects in many. The daily life of a policy analyst in government is to account for, and understand, this complexity.
As I originally conceived it, the test would request that the executive in charge of the program fill out the questionnaire below, drawing from expertise within their department as well as their legal counsel. Those of you who have seen earlier versions of our white paper Responsible AI in the Government of Canada will recognize it, as it was appended there until version 1.1.
The questionnaire is broken into two parts asking two separate questions, measuring the breadth and the depth of the system:
- Part A — What impact will my program have on various aspects of society or the planet?
- Part B — How much judgement will our automated system be delegated? Do humans choose the variables of the decision, or does the system? Is there a human in the loop?
To take the AIA, you score both parts and multiply Part A by Part B to get a final score. Each impact has an associated score, and the scores are cumulative. So automating a small community microgrant would score much lower than automating CPP Disability approvals, for example. Expert systems, where all the variables in the decision and their weightings are known, are inherently less risky than machine learning systems, where this may not be the case. This was a built-in way to prevent “black box” AI systems from being used for high-impact services.
The test sorts your system into one of four impact categories; as mentioned above, governance requirements can then be scaled to each of these categories:
- Low Concern: 0–9 pts
- Moderate Concern: 10–24 pts
- High Concern: 25–49 pts
- Very High Concern: 50 pts and over
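The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the actual scoring tool: the function names and the example point values are hypothetical, and it assumes the bands are read without overlap as 0–9, 10–24, 25–49, and 50 and over.

```python
# Lower bound of each category, highest first.
# Assumption: bands are 0-9, 10-24, 25-49 and 50+ (no overlap, no gap).
CATEGORY_FLOORS = [
    (50, "Very High Concern"),
    (25, "High Concern"),
    (10, "Moderate Concern"),
    (0, "Low Concern"),
]


def aia_score(part_a_points, part_b_score):
    """Add up the cumulative Part A impact points, then multiply by Part B."""
    if part_b_score < 0 or any(p < 0 for p in part_a_points):
        raise ValueError("points must be non-negative")
    return sum(part_a_points) * part_b_score


def impact_category(score):
    """Return the first category whose lower bound the score reaches."""
    for floor, label in CATEGORY_FLOORS:
        if score >= floor:
            return label
    raise ValueError("score must be non-negative")


# Hypothetical values, invented purely for illustration.
microgrant = aia_score([2], 1)                   # one small impact, little delegated judgement
disability_benefit = aia_score([5, 5, 3], 3)     # several impacts, more delegated judgement

print(microgrant, impact_category(microgrant))
print(disability_benefit, impact_category(disability_benefit))
```

Because Part B multiplies the Part A total, a system that is handed more judgement moves up the categories quickly even when its Part A answers stay the same.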
Despite its apparent simplicity, the assessment is difficult to complete by design. It requires broad expertise to answer the questions, almost demanding that institutions tackle the complex problems of 2018 collaboratively. It means that institutions work as a whole, or with portfolio partners, to map the potential effects of their system.
The scoring system has three ranks based roughly on material effects and perceptions. The first, which I call “the stop sign,” is #1, a substantial impact on an individual’s liberty. An automated system informing even reasonable restraints on an individual’s Charter rights should be immediately subject to a high degree of scrutiny.
The second rank, #2–11, covers predicted, substantial socioeconomic or environmental impacts. These are all weighted equally for two reasons:
- It’s impossible for non-partisan public servants to rank priorities over one another; this is more of a political exercise.
- Legislative realities make such a ranking impossible. For example, a program manager at Environment and Climate Change Canada is working under authorities that instruct them to worry about the environment as a higher priority than most others. Someone in Finance Canada might rank the test entirely differently.
Finally, there are the relation/perception modifiers. These add points if the system could seriously impact relationships with Indigenous peoples; provinces, territories and municipalities; and other countries.
I’ve been testing a variety of strictly hypothetical automated systems from the mundane (microgrants for Canada Day celebrations) to the dystopian (automated issuance of quarantine orders). You’re welcome to see my test score sheet here as it evolves, though I don’t have all of the explanations available. If you want to pitch in and help test out the tool, leave a comment below, DM me or message me on LinkedIn and I’ll grant you access.
The AIA as designed still needs a lot of work and testing, both with the questions and the scoring system. For example:
- The test biases heavily against systems that determine their own criteria to make a decision. Is this fair? Are humans necessarily better at determining criteria for making a decision?
- There is no reference to cultural or linguistic integrity of a community; should there be?
- Is the scoring system too strict? Not strict enough? Should we err on the side of caution for a couple of years and then evaluate?
Assigning a quantitative score to something unquantifiable is inherently subjective and muddied with bias, in this case my biases stemming from the fact that I am a highly privileged, white, urban male. This doesn’t make the method wrong; arbitrary scoring systems are used for procurement or hiring boards in a variety of sectors. They introduce some rough comparability where otherwise there is none, but in a democratic society, they have to be scored in a way that is reflective of a diverse set of priorities and worldviews. It just means that the method needs significant consultation and diverse perspectives so that this bias is hammered out as much as possible. As Hillary Hartley said: “Including diverse voices from a range of communities, geographies, and realities is critical to understanding the populations we serve.”
One assessment of many
“It takes many good deeds to build a good reputation, and only one bad one to lose it.” — Benjamin Franklin
If you’re sitting in government, I understand how all of this might seem like a bit of a pain. An AIA enters a complex architecture of documentation that surrounds deploying a system in government. A legal opinion, a Privacy Impact Assessment, a reference architecture, and a cybersecurity assessment are just some of the documentation that needs doing, and that certainly is difficult to do in a culture that is rapidly changing to just “build the thing.” Each of these has to be maintained as the program or system evolves, and that’s a lot of bureaucratic overhead for a project. Some of the questions asked by the AIA will be answered by these other documents; in those cases, feel free to copy and paste content.
But at the same time it’s important to remember that these systems have impact at scale and, as you can see in Ipsos Public Affairs’ 2017 report, aren’t highly trusted. Most implementations will not be innocuous. Striking a balance between helping institutions think through consequences and preventing unnecessary paper burden is never easy, so I’ll be sure to revisit the process based on feedback from the first several institutions that undertake it from start to finish.
So how much analysis is enough? No idea. Unless I’m mistaken, AIAs are largely uncharted territory. Barring classified information, the AIAs should be available to researchers and the general public on our Open Government Portal, so they would need to stand up to scrutiny and comment by experts. That the AIAs are drafted in the design stage provides the opportunity for civil society to weigh in early. Over time, as tools like these proliferate across jurisdictions, I’m confident that a best practice will emerge. In the meantime we’ll need some departments to be process guinea pigs.
Technology is quickly becoming the clockworks of statecraft; how we build our systems will determine how we build our society of the future. New tools and approaches to governance are required to ensure that public sector actors reflect on the impact of what they do. I foresee AIAs taking shape as important tools of governance in the near term, so it’s important that we have a collective discussion on what they should look like. This is just the start of that discussion.