Navigating the Ethics of Ontology in AI Development

Kate Mayhew · Published in SEEK blog
Sep 21, 2023


In this post, we discuss the responsible use of ontologies in AI through the experience of Kate Mayhew, Principal Ontologist at SEEK. We explore the known challenges of creating and owning ever-changing ontology data assets and the considerations that we apply to our everyday work.

Joining meetups about Ethics in AI, participating in an Ethics in AI reading group, and eventually helping to create and implement our own internal Responsible AI practices at SEEK have highlighted exactly how valuable it is to continue these conversations and to hear from others in different domains and from different backgrounds. In this post we hope to open the dialogue further, focusing on our learnings about how using ontologies can and should be a part of the broader Responsible AI conversation.

SEEK's Responsible AI principles help to guide our decision-making when building solutions for our users and provide a framework for our assessment of risk.

Using AI responsibly is a major conversation, but one often focused on algorithms rather than the other parts of the technology

AI is the topic of the season. With the release of easily accessible new tools like ChatGPT and Midjourney, many are talking about novel uses and the ways these tools can make life easier. Simultaneously, many express fear of AI, its potential to take their jobs, or the ways it might change how people interact.

As a result of this conversation, using AI responsibly is a growing global focus, spanning social expectations, law, and the practicality of building new and constantly changing expectations into technical solutions. While that focus is primarily on the impacts of algorithms and the lack of explainability they bring, there is less public discourse about the underlying data assets that drive the outcomes of some automated approaches, and the part they can play.

At SEEK, the domain we work in is people's careers. We strive to help individuals find meaningful work, and businesses succeed through employing the best candidate for their role. Because work is such a critical part of people's access to income, and is often a major part of people's sense of identity and self-worth, we consider that working in this field has a profound impact and that any use of AI must be responsible.

Our purpose is to help people live more fulfilling and productive working lives and help organisations succeed. It is central to everything we do and guides the way we work together and what we deliver.

Structuring natural language can reduce bias found in your raw data sources

If you read our last blog post, you will have seen some scenarios where you would likely employ ontologies. These use cases are often about improving the harmonisation of name variance or providing context about what a thing is so that you can give better results than if you only had the raw data.

There are many benefits from this structure that can support the responsible use of data in AI systems and underpin responsible AI operationalisation:

  • Improving name harmonisation in your raw text can also reduce variation often associated with protected attributes. This can include resolving common misspellings linked with people with learning disabilities such as dyslexia; resolving grammatical issues linked with those speaking a secondary language; or defining equivalency between community-specific spellings such as localised spelling of a name.
  • Identifying the type of data can help to guide consumers in how to consume it and its potential risk. For example, knowing that a label represents a protected attribute such as “parent” can change what use cases you are willing to use that data in, or what types of algorithms you are willing to use on that data.
  • Enrichment with the same ID enables recall of raw text that uses different language to describe the same thing. Once we understand that two strings represent the same thing, we can deliver equal performance for both, or improve the visibility of less commonly used strings in searches where they are still relevant. This can be particularly important in communities with more than one primary language.
  • Relationships between concepts enable you to handle various levels of granularity in your data. If a job is looking for someone with AI experience, and a job seeker applies stating that they have experience with NLP and ML, we want to highlight that they meet this requirement just as well as another job seeker who expressed their experience at the abstracted level used in the job text.
  • Pairing ontologies with statistical approaches such as LLMs can create scale while minimising quality issues. LLMs are well known to hallucinate, and retraining them to keep up with changes in the world is costly. A curated relationship or attribute in an ontology can be used to assign a higher level of trust and provide a more confident output, while outputs that lack a curated relationship can be shown with messaging that conveys their lower confidence (see the sketch after this list).
Knowing that a “chippie” is a common name for a carpenter in Australia can help you resolve these two things in the same way and ensure that they get the same treatment.
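To make these benefits concrete, here is a minimal Python sketch of the patterns above. Everything in it, the concept IDs, labels, synonyms and relationships, is a hypothetical illustration rather than SEEK's actual implementation: synonyms resolve variant strings such as "chippie" to a canonical ID, broader-than relationships let a specific skill like NLP satisfy a requirement expressed as AI, and a curated flag distinguishes relationships an ontologist has reviewed from statistically inferred ones.

```python
# A toy ontology illustrating the patterns above. Every ID, label,
# synonym and relationship here is hypothetical, invented for this post.

ONTOLOGY = {
    "occ:carpenter": {
        "label": "Carpenter",
        "synonyms": {"carpenter", "chippie", "chippy"},
        "broader": [],
    },
    "skill:ai": {
        "label": "Artificial Intelligence",
        "synonyms": {"ai", "artificial intelligence"},
        "broader": [],
    },
    "skill:nlp": {
        "label": "Natural Language Processing",
        "synonyms": {"nlp", "natural language processing"},
        # A relationship reviewed by an ontologist: high trust.
        "broader": [("skill:ai", {"curated": True})],
    },
    "skill:ml": {
        "label": "Machine Learning",
        "synonyms": {"ml", "machine learning"},
        # A statistically inferred (e.g. LLM-suggested) link: lower trust.
        "broader": [("skill:ai", {"curated": False})],
    },
}


def normalise(raw_text):
    """Resolve a raw string to a canonical concept ID, if we know it."""
    cleaned = raw_text.strip().lower()
    for concept_id, concept in ONTOLOGY.items():
        if cleaned in concept["synonyms"]:
            return concept_id
    return None  # unknown long-tail string: leave it raw rather than guess


def satisfies(candidate_skill, required_skill):
    """Check whether a candidate's skill meets a requirement that may be
    expressed at a broader level, and report how much to trust the match."""
    cand, req = normalise(candidate_skill), normalise(required_skill)
    if cand is None or req is None:
        return False, "unknown"
    if cand == req:
        return True, "exact"
    for broader_id, meta in ONTOLOGY[cand]["broader"]:
        if broader_id == req:
            return True, "curated" if meta["curated"] else "inferred"
    return False, "no-match"


print(normalise("Chippie"))                 # occ:carpenter
print(satisfies("NLP", "AI"))               # (True, 'curated')
print(satisfies("machine learning", "AI"))  # (True, 'inferred')
```

In a real system the broader-than traversal would be transitive and the trust signal would feed into ranking or UX messaging, but the shape of the idea is the same.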

But creating knowledge representations can have many fuzzy edges, and your decisions can create bias

Despite our best intentions, we know everyone has internal biases. We also know that sometimes the language used to describe entities and the perception of what an entity is can vary based on who you are and what your experiences have been. Some potential sources of harm to consider include:

  • How the data you produce will be used can change scope and design decisions. The intent of the asset, and the context used in decision-making while building it, are important factors in the impact it will have when used in production.
  • Handling protected attributes can require thorough consideration. In some cases it can seem simple enough to exclude claims regarding protected attributes, but even exclusion needs care: does it mean that those who have used this language will be completely excluded from the product you are supporting? (A sketch of one way to tag and gate sensitive concepts follows this list.)
  • The structure and design of your ontology can misrepresent the original intent of the raw data. Depending on the method used to enrich raw data with the ontology, and the UX provided to your end users, the outcome for some labels can vary greatly from their original intent. Imagine making a claim about Italian in a field called “languages”, compared to a dot point in a CV talking about chef experience with various cuisines.
  • Scope choices for your asset can have unintended impacts. If we were to define organisations as those registered on the Australian Business Register, and the asset was used on our job seeker profiles, we would likely face issues of low coverage, as job seekers can have worked anywhere in the world. Depending on how the enrichment is used in the final product, this could mean we unfairly favour people with historical AU experience, or block people from making accurate claims about their work experience.
  • We all have internal bias which can influence our decisions on how to represent the intent of language. Our perception is greatly coloured by our own experiences. For example, many people who work in the tech sector are surprised that AI can mean something very different to Artificial Intelligence, especially in the context of the agricultural industry. The risk of this happening increases with the number of countries your data is used in, as concept boundaries might be different from one market to another.
  • Most data sources will have a long tail problem of infrequent expressions mixed in amongst noise, and prioritising can mean creating an outgroup. The trade-off here is often about quality vs scale, and being conscious of these decisions is a key enabler for your consumers to be aware of risks in their use case.
  • How people want to represent themselves is expressed in the language they choose to use. Finally, and possibly most importantly, any normalisation can take away from intentional choices made by end users. This can be a benefit, in that it can remove some potential bias; however, the way your asset is enriched against raw data can make or break an experience.
ANZSCO is a standard taxonomy designed by the Australian Bureau of Statistics that groups role titles into abstracted classifications in order to provide a comparable representation of the entire market for statistical purposes. If we were to use the taxonomy as-is for other use cases, it might give some unexpected results, such as grouping Geographers with Linguists and Parole Board Officers for job recommendations.
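To illustrate the protected-attribute points above (the "parent" example earlier and the exclusion question in this list), here is a hedged Python sketch. The concept types, use cases and policy are all invented for this post; the point is simply that typing concepts in the ontology lets each downstream use case make an explicit, auditable decision instead of silently dropping data.

```python
# Illustrative only: type each concept, then gate use cases on that type.
# The concept types, use cases and policy below are invented examples.

from dataclasses import dataclass


@dataclass
class Concept:
    concept_id: str
    label: str
    concept_type: str  # e.g. "skill", "occupation", "protected_attribute"


# Which concept types each (hypothetical) use case may consume.
USE_CASE_POLICY = {
    "search_recall": {"skill", "occupation", "protected_attribute"},
    "candidate_ranking": {"skill", "occupation"},  # no protected attributes
}


def allowed(concept, use_case):
    """Decide whether a concept may feed a given use case."""
    return concept.concept_type in USE_CASE_POLICY[use_case]


parent = Concept("attr:parent", "Parent", "protected_attribute")
welding = Concept("skill:welding", "Welding", "skill")

print(allowed(parent, "search_recall"))       # True: still findable
print(allowed(parent, "candidate_ranking"))   # False: kept out of ranking
print(allowed(welding, "candidate_ranking"))  # True
```

This answers the exclusion question directly: people who used this language are not excluded from the whole product, only from the use cases where the attribute should not influence outcomes.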

The way to reduce the likelihood of negative impact through the use of an ontology is to be explicit in your intentions and thoughtful in your approach to its creation

It might feel at this point that implementing an ontology responsibly is a pretty difficult thing to achieve, but there are some straightforward, practical steps you can take to ensure your assets minimise as much risk as possible. Some suggestions you might consider are:

  • Build awareness and empathy for the users you are supporting. Understanding the experience end users went through when providing the data, the challenges or limitations they face, and what their intent was is a key factor in being able to represent them fairly. Common practices to help build this understanding are user research and surveys. Awareness can also come through other channels, like joining meet-ups with the community, or your own lived experience.
  • Ensure that your schema and intent are explicitly documented. This will help consumers understand the decisions made within the asset and the potential limitations it might have for their use case. Examples of this approach have been discussed widely by researchers such as Margaret Mitchell and Timnit Gebru, and by companies such as Google (a simplified sketch follows this list).
  • Use data and evidence to support your decisions. Every point in your model is a design decision. Spending time to understand what is evidenced in your raw data will ensure you can design for any quirks or considerations specific to your needs. There is no one-size-fits-all approach.
  • Prioritise representation of folks from varied backgrounds in the development of your assets. Having a variety of experiences can help to challenge our own internal paradigms, and capture more in testing. This might mean having a diverse ontology team, or it might be more focused on consultation with external parties if team scale is not an option.
  • Create guidance for team members and consumers about sensitive decisions. If multiple people are working on building your ontology, it’s important to have consistency in their approach to decision-making. It’s also important that you are able to communicate what approach you have taken to your consumers.
  • It’s all in how you use the data asset: work with those who develop the user experience and enrichment methodologies. Ontologies can be used for enrichment without overriding raw input, and understanding how your asset may or may not be shown to end consumers can determine whether a use is responsible or not.
Google Data Cards are a great example of making the intent and considerations of your assets visible to consumers.
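As a sketch of what explicit documentation can look like, here is a simplified, data-card-style record for a hypothetical ontology asset, written as a plain Python structure. The field names and values are illustrative assumptions, not Google's Data Cards format or SEEK's internal template.

```python
# A deliberately simplified, hypothetical "data card" for an ontology asset.
# Real frameworks (Google's Data Cards, datasheets for datasets, model cards)
# cover far more ground; this sketch only shows the shape of the idea.

ONTOLOGY_DATA_CARD = {
    "asset": "example-skills-ontology",  # hypothetical name
    "version": "2023.09",
    "intent": "Normalise skill mentions in job ads and profiles for search "
              "recall. Not designed for automated candidate ranking.",
    "scope": {
        "markets": ["AU", "NZ"],
        "known_gaps": "Long-tail and non-English skill names under-covered.",
    },
    "sensitive_decisions": [
        "Concepts typed as protected attributes are flagged, not removed.",
        "Curated and inferred relationships carry different trust levels.",
    ],
    "contact": "ontology-team@example.com",  # hypothetical address
}


def print_card(card):
    """Render the card so consumers can review it before adopting the asset."""
    for key, value in card.items():
        print(f"{key}: {value}")


print_card(ONTOLOGY_DATA_CARD)
```

Even this much gives a consumer something concrete to assess: what the asset was built for, where it is known to be weak, and which decisions deserve scrutiny before reuse.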

Here at SEEK this is an ongoing conversation, and I'm keen to hear about your own experiences. It's evident that representing human behaviour and language is inherently complex, and we must be careful with our decisions. Proactively thinking about the impact of what we create, by being empathetic and inclusive, and by building knowledge of the domains we aim to represent, is the responsibility of all knowledge representation creators.


Kate Mayhew · SEEK blog
Thinking deeply about representation and how it relates to data and AI solutions in the HR domain.