AI systems have entered the workforce, and they have transformed the way businesses operate. They forecast production demands, optimize delivery routes, detect defects across operating systems, and even assess the technical capabilities of new employees in interviews. They write emails, provide on-demand customer support, and assist with data collection and analysis. By taking over these time-consuming, repetitive tasks, AI systems have freed workers from the mundane elements of their work, allowing them to focus on duties that require more creativity and complex problem-solving, and to deliver more value to their companies.
However, the complexity of current AI technologies makes it difficult, if not impossible, for businesses and consumers to assess the trustworthiness of these systems and understand how decisions are made. A number of real-world use cases have been waylaid by this lack of trust and understanding.
At this early stage in the development of AI, when trust is low, there is a high burden to explain why an AI system makes the decisions that it does and how these conclusions are reached, especially when discussing the automation of high-value job functions and the work of skilled professionals, like doctors and lawyers. However, there is no truly robust and widely accepted way to detail the decision-making behind many “black box” AI systems. As a result, we need standards bodies to establish clear, defined, and trusted standards for AI and to provide AI systems with credentials, much like a lawyer’s bar exam. These standards and credentials will help codify what AI systems may be used for and in what capacities.
When reliable AI systems adhere to clear standards verified by a third party (or, in regulated industries, by a regulator), workers and businesses can begin to trust the decisions these systems provide.
A Bar Exam for AI
Professional credentials for vital job functions are, of course, not a new idea. They go back to at least the Middle Ages, when most people understood that a "Journeyman" had achieved a specific level of skill as recognized by a Guild of Masters. This attestation of competency, conferred in that case by the Masters, lent a degree of confidence to the Journeyman's work.
Today, most skilled professions have credentials that are universally trusted. Crucially, the meaning of these certifications is fairly easy for the average individual to assess and understand: although Jane Doe may not understand the biochemistry required to become a medical doctor, she trusts that someone with the title "MD" after their name has the skill and training needed to diagnose maladies and recommend appropriate treatments.
Consequently, creating universally recognized standards and credentials for AI systems, something akin to the physicians' board examination or the lawyers' bar exam, presents a viable way forward.
What's more, these certifications may help AI systems meet legal requirements such as the "right to explanation" outlined in the GDPR, which grants people a right to information about (an explanation of) the algorithms they interact with. This would be a significant benefit to the individuals and organizations deploying AI technologies, as laws governing complex algorithmic decision-making are becoming more common. An emerging regulatory and policy landscape surrounds AI, countries are increasingly developing national AI strategies and standards, and comprehensive standards are likely to spread around the world. Such certifications would prepare AI systems to meet them.
Standards and Certification
Although this will be a large task, it is not without precedent. Consider the ISO 27001 and SOC 2 standards for information security management. These frameworks impose highly complex technical requirements with which organizations must comply, and they extend across a range of domains, encompassing data segregation, firewall protections, password policies, and mobile device security. They demonstrate that creating and enforcing similar certifications for AI systems would be possible.
In the United States, the International Society of Automation (ISA) is a preeminent standards organization for measurement, instrumentation, and automation. For AI automation, an ISA standard could be a natural fit.
Similarly, the FDA is the standards and regulatory body for drugs, diagnostics, and other treatments in the US, and it would be a fitting organization to establish standards and certifications for AI systems operating in this space. As a scientific organization, its work involves the same types of metrics that AI researchers use for quality control, to validate their models, and to gain approval from the scientific community.
In its machine learning guidelines, the FDA has already taken strides toward establishing standards. It adopts the Good Manufacturing Practices (GMP) terminology and coins the new term Good Machine Learning Practices (GMLP), which outlines data, training, and model-development standards. The specific mathematical metrics that researchers use to validate their own models include precision and recall, along with the F1 and Fβ scores, which combine precision and recall into a single number. Regulators could use many of these same metrics, as they are already the scientific standard. As an area for improvement, developing metrics for the internal workings of an AI system, for biases in the data it is trained on, or for derived models in which known AI systems are chained together could provide much-needed model transparency.
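As a minimal sketch of the metrics mentioned above, precision, recall, and the Fβ family can all be computed directly from the counts in a confusion matrix (the example counts below are illustrative, not drawn from any FDA submission):

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision: share of positive predictions that are correct.
    Recall: share of actual positives the model catches."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta folds precision and recall into one score.
    beta = 1 gives the familiar F1; beta > 1 weights recall more heavily,
    which suits screening tasks where missed positives are costly."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: 80 true positives, 20 false positives, 40 false negatives
p, r = precision_recall(tp=80, fp=20, fn=40)
print(round(p, 3), round(r, 3))        # precision 0.8, recall 0.667
print(round(f_beta(p, r), 3))          # F1 score
print(round(f_beta(p, r, beta=2), 3))  # F2 weights recall more
```

A regulator reviewing a diagnostic model might, for instance, require a minimum recall (few missed cases) alongside a reported Fβ score, since a single aggregate number can hide a precision/recall trade-off.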
The difficulties regulatory agencies will face in vetting AI technology include developing datasets for new disease areas and new patient populations. This burden often falls on the shoulders of pharmaceutical companies and other industry sponsors, who must submit their studies to the FDA. Where past submissions centered on Phase I–III trials, the next phase of AI's penetration into biomedicine will involve comparing AI performance against previous performance standards, much as in the FDA's 510(k) device-approval process.
Keeping Trust Alive and Well
The field of AI has been developing for decades, has evolved in fascinating ways, and will continue to do so. As a result, establishing trust in AI, and building the regulatory frameworks needed to underpin that trust, will require flexibility. Processes and standards will need to adapt as cultural values and technology continue to evolve. Yet we have finally arrived at a point where the implementation of AI across industries requires standards and oversight. Passing an AI bar exam should be seen as the next step in the field's development.
To be clear, once such certifications have been established, they should not confer the rights and privileges of certifications for skilled professionals. AI test-taking, in which an AI learns to reason logically about questions posed in both pictures and text, has advanced to the point where AIs can outperform skilled human test-takers on some benchmarks. Such tests should not be used to vet AIs as doctors or lawyers in delicate medical or legal scenarios that demand deep human experience. However, a certified AI may be able to support sensitive medical decisions by mining vast genetic or pathology databases to find the one drug that worked for a rare disease variant. In this respect, certified AI systems may provide skilled professionals with much-needed assistance with the mundane aspects of their jobs and allow individuals to trust the work these systems perform.
The development of such a holistic certification process, and the work needed to ensure that our AI systems can actually pass such tests, will require advances in conceptual learning, computer vision, and a whole host of domain-specific knowledge to guarantee consistency in output. But once this has been accomplished, certifications will enable a host of downstream developments, from automated production pipelines to automatic drug discovery.
Frankly, the burden of validating AI technologies will likely be shouldered by industry participants, as regulators are always one step behind. As researchers and technologists in academia and industry continue to invent and solve problems, the ways artificial intelligences are used will progress and evolve. As one established technique becomes trusted, a hundred more speculative techniques must be evaluated through the joint work of standards committees, practitioners in the field, and the consumers and clients who use them. At Augustus Intelligence, we leverage our vertical integration, our experience building Artificial Specific Intelligences, and state-of-the-art explainable AI to provide value to our customers and clients across the board. Interested in learning more?
Frey, C.B., and Osborne, M.A. (2017). The future of employment: How susceptible are jobs to computerisation? Technological Forecasting and Social Change 114, 254–280.
Garcez, A. d’Avila, Gori, M., Lamb, L.C., Serafini, L., Spranger, M., and Tran, S.N. (2019). Neural-Symbolic Computing: An Effective Methodology for Principled Integration of Machine Learning and Reasoning. ArXiv:1905.06088 [Cs].
P2418.6 — Standard for the Framework of Distributed Ledger Technology (DLT) Use in Healthcare and the Life and Social Sciences https://standards.ieee.org/project/2418_6.html.
Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) https://www.fda.gov/media/122535/download.