What is a Taxonomist, and Why Do Digital Orgs Need Them?
Building the right taxonomy is foundational to unlocking so much potential
By Andrea Mateka, Semantics and Metadata Coordinator & Taxonomist at CBC
“I’m not a taxidermist”…
…is the reply that usually follows when I introduce myself.
This is a great example of how language paints a picture. Based on your personal experience up until this point, when I say “taxonomist” — a profession you may have never heard of — your mind may go to the closest thing you can think of, such as the art of stuffing dead animals. Language is fun that way.
A taxonomy is a knowledge organization system. Traditionally, taxonomy has been associated with biology: the scientific practice of naming, arranging, and classifying organisms based on shared characteristics. What we do as taxonomists on the Semantics and Metadata team at the Canadian Broadcasting Corporation (CBC) is not all that different. Instead of flora and fauna, we classify content (stories, images, videos, etc.) and build contextual relationships among those items.
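To make "classify content and build contextual relationships" a little more concrete, here is a minimal sketch in Python. It is not the CBC taxonomy or its data model: the `Concept` class, the `broader` and `related` fields, and the three sample concepts are all assumptions invented for illustration. It shows how tagging a story with one concept can imply its broader concepts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One taxonomy concept. Every field here is illustrative."""
    label: str
    broader: Optional[str] = None               # label of the parent concept, if any
    related: set = field(default_factory=set)   # labels of associated concepts


# A tiny, made-up taxonomy keyed by concept label.
TAXONOMY = {
    "Sports": Concept("Sports"),
    "Hockey": Concept("Hockey", broader="Sports", related={"Winter Olympics"}),
    "Winter Olympics": Concept("Winter Olympics", broader="Sports"),
}


def ancestors(label):
    """Walk the broader-concept chain from a concept up to the root."""
    chain = []
    current = TAXONOMY[label].broader
    while current is not None:
        chain.append(current)
        current = TAXONOMY[current].broader
    return chain


def expand_tags(tags):
    """Return the given tags plus every broader concept they imply."""
    expanded = set(tags)
    for tag in tags:
        expanded.update(ancestors(tag))
    return expanded


# A story tagged only with "Hockey" also becomes findable under "Sports".
story_tags = expand_tags({"Hockey"})
```

The point of the hierarchy is that a creator applies one specific tag and the broader context comes along with it.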
In our infancy, we started with the intention of supporting our digital text Content Management System (CMS) team after they had identified three key problem areas: the need for better content descriptions to improve workflows; better association of content to audience segments; and improved performance metrics for online content. And so our CBC taxonomy was born. We built the foundation of our taxonomy by leveraging news-centric vocabularies from external, well-respected sources and other linked data before iterating on the product based on our own business needs. As the CBC mandate is to deliver Canadian content to Canadians, this meant constructing solid Canadian language and representation in our taxonomy concepts.
But our work doesn’t end there. Taxonomy and metadata management as a whole has become an emerging source of opportunity for organizations like ours. The world of digital content faces a challenge: with so much being created, at speed, how can you manage it all in a way that keeps it discoverable, distributable and high-performing? Taxonomy-based strategies are materializing all across the tech sector. Whether they are powering viewing recommendations on Netflix or guiding your grocery shopping experience with Instacart, taxonomies (and their best friend, ontologies) are becoming the driving force behind our digital experience.
Nowhere is this more true than at the CBC. We produce audio, video, photography and text-based content. And within those categories you can find news, entertainment, live, pre-recorded, podcasts and sports. That’s not even mentioning all the geographic regions we cover, or the fact that we do so in English, French and nine different Indigenous languages. What started as a project to meet the needs of our CMS team has evolved. As we look to the future, we’ve set our sights on being the foundation that the CBC needs to magnify the value that we already create as part of our programming, delivery and product experiences. The more our concepts are applied, the better equipped we are as a corporation to automate content line-ups, build audience segments, and augment a personalized, single sign-on CBC experience.
But the taxonomy team is not audience-facing itself. Similar to a software platform team, we provide the building blocks that content and product teams use to ship business-facing functionalities. In order for our team and others to be effective, tagging compliance is required. But how do you encourage busy news reporters to add a few tags?
Currently, we are experimenting with machine learning and pattern recognition to make the lives of our content creators a little bit easier. For example, we have a process we call “Click & Review” that automatically applies tags to text content through the use of an extraction tool. After a user initiates the extraction tool — or “clicks” — they are then required to “review” the auto-suggested concepts.
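As a rough illustration of the click-then-review flow, here is a small Python sketch. It assumes the simplest possible extraction (matching concept labels as whole words or phrases in the text), which is almost certainly cruder than the real tool. The label list and the function names `suggest_concepts` and `review` are hypothetical.

```python
import re

# Hypothetical concept labels; a real extraction tool would be far richer.
CONCEPT_LABELS = ["Coronavirus", "British Columbia", "Hockey"]


def suggest_concepts(text):
    """The "click": return labels that appear as whole words or phrases."""
    found = []
    for label in CONCEPT_LABELS:
        pattern = r"\b" + re.escape(label) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(label)
    return found


def review(suggestions, rejected):
    """The "review": keep only the suggestions the user did not reject."""
    return [label for label in suggestions if label not in rejected]


text = "A hockey tournament in British Columbia was postponed."
suggested = suggest_concepts(text)                 # auto-suggested concepts
applied = review(suggested, rejected={"Hockey"})   # what the user keeps
```

Keeping the two steps separate means the tool can over-suggest without harm, since nothing is applied until a person has looked at it.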
As part of this review process, we are able to capture information on what was or wasn’t successful about our taxonomy. Our team monitors user actions (or inactions) to find areas for improvement, such as adjusting the extraction of a concept that is commonly misapplied (“hope” the feeling vs. “Hope” the city in British Columbia), reacting to new subjects entering the lexicon (“Coronavirus” or “Monkeypox”), or making more inclusive choices as to how we use language (decolonizing locations by restoring Indigenous place names).
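One way to picture that monitoring, purely as a sketch: log whether each suggested concept was accepted or rejected, then flag the concepts that reviewers reject most often. The log format, the sample entries, and the 0.5 threshold below are assumptions made for the example, not a description of our actual tooling.

```python
from collections import Counter

# Hypothetical review log: (concept label, whether the reviewer accepted it).
review_log = [
    ("Hope, B.C.", False),
    ("Hope, B.C.", False),
    ("Hope, B.C.", True),
    ("Hockey", True),
    ("Hockey", True),
]


def rejection_rates(log):
    """Share of suggestions that reviewers rejected, per concept."""
    suggested = Counter(label for label, _ in log)
    rejected = Counter(label for label, accepted in log if not accepted)
    return {label: rejected[label] / total for label, total in suggested.items()}


def flag_for_attention(log, threshold=0.5):
    """Concepts rejected more often than the threshold, in alphabetical order."""
    rates = rejection_rates(log)
    return sorted(label for label, rate in rates.items() if rate > threshold)


# Concepts a taxonomist may want to look at first.
needs_attention = flag_for_attention(review_log)
```

A concept that keeps getting rejected, like a place name that collides with an everyday word, is a candidate for a tighter extraction rule.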
As (past) librarians, we’ve always utilized systems of classification, thesauri and controlled vocabularies. Because of this, we’ve become acutely aware of the power these language tools have over content discovery and interpretation through the application of a few simple words. As custodians of the CBC taxonomy, we engage in a continuous process that is organic and matures with every new metadata-driven opportunity, because language is contextual. So next time you are at a party and someone tells you that they are a taxonomist, you’ll be picturing this article. And not a dead bobcat with weird-looking eyes.