Mapping LCC to Wikipedia via LCSH

Matt Miller
Nov 29, 2017

The intersection of LCC and LCSH is a messy space: two insular classification systems, both equally complex. LCSH has its Form / General / Chronological / Geographic subdivisions. LCC has a complex hierarchy of inherited context. But both are ubiquitous in library records and are often the primary tools for resource discovery. I had two thoughts about these systems:

  1. Given a specific LCC range, are there dominant (topical) LCSH headings in its resources?
  2. If there are, can we use the dominant LCSH headings to map to the Wikimedia ecosystem?

The goal would be a mapping which, given an LCC range, could return likely relevant articles/entities in the Wikimedia platform.

I’m using the LC Book, Serial, Music, Map and Visual Materials MARC records here again. I really like using them because for me they have become a standard dataset. Results will change based on what data you use, but all of these experiments share the same underlying data source, which gives them a cohesiveness in my mind.

Process

  • Take each resource in an LCC range and extract its topical subject headings from 650$a. I’m not using subdivisions here.
  • Take the top 5 subject headings found in the resources in that LCC range and try to map them to a Wikipedia article/entity.
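
The two steps above can be sketched roughly like this. It is a minimal illustration, not the code behind the interface: it assumes the 650$a values have already been pulled out of the MARC records into (call number, headings) pairs, and the names `parse_lcc`, `in_range` and `top_headings`, the simplified call number pattern, and the (letters, low, high) range format are all hypothetical.

```python
import re
from collections import Counter

# Simplified pattern: class letters followed by a class number, e.g. "HD5115.5".
# Real LCC call numbers have more parts (cutters, dates) that are ignored here.
LCC_PATTERN = re.compile(r"^([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def parse_lcc(call_number):
    """Return (class letters, class number), or None if it does not parse."""
    match = LCC_PATTERN.match(call_number.strip().upper())
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def in_range(call_number, lcc_range):
    """Check a call number against a (letters, low, high) range."""
    parsed = parse_lcc(call_number)
    if parsed is None:
        return False
    letters, number = parsed
    range_letters, low, high = lcc_range
    return letters == range_letters and low <= number <= high


def top_headings(records, lcc_range, n=5):
    """Count 650$a headings for resources in the range and return the top n."""
    counts = Counter()
    for call_number, headings in records:
        if in_range(call_number, lcc_range):
            # Drop trailing periods and count each heading once per resource.
            counts.update({h.strip().rstrip(".").strip() for h in headings})
    return counts.most_common(n)


# Made-up records for illustration: (call number, list of 650$a values).
records = [
    ("HD5115 .A2", ["Absenteeism (Labor)."]),
    ("HD5115.5", ["Absenteeism (Labor)", "Labor"]),
    ("QA76", ["Computers"]),
]
print(top_headings(records, ("HD", 4801, 8943)))
```

Counting each heading once per resource is a design choice in this sketch; counting every occurrence would also be reasonable and might change which headings come out on top.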

I made an interface to explore the results:

https://thisismattmiller.github.io/lcc-lcsh-wiki-mapping/

you: whoa, what a dense obtuse interface!!

me, an intellectual: 😎

Interface explanation — you can also hover over the LCC range to get more info and click the LCC ID to search Library of Congress in a new window

I was thinking about doing a visualization of some sort, but sometimes a list is the best thing you can use.

I’ve tried mapping LCC to Wikipedia before, using manual curation. This approach is automatic, and it does break down; for example, here are all the topical subject headings that it could not match. The more complex the heading (remember these are not using subdivisions), the less likely it was to map. For example, “Absenteeism (Labor)” or “Authors, Latin (Medieval and modern)” are simple ideas, but they are described too cryptically to make the jump to a Wikipedia page.

For this work I used DBpedia. While Wikidata is deservedly the source for structured Wikimedia data, DBpedia is still invaluable for a project like this. One of the major problems is how to connect two topics that are described differently but have the same semantic meaning. For example, the concept behind LCSH “Liquor laws” is represented in Wikipedia, but under the title “Alcohol law.” You need a mapping that says “Liquor laws” == “Alcohol law.” This is exactly what Wikipedia page redirects do: when you visit https://en.wikipedia.org/wiki/Liquor_laws it redirects you to the Alcohol law page. DBpedia exposes 7.7 million such mappings. By leveraging these redirects we can dramatically increase the number of successful mappings.
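
As a rough sketch of the redirect idea, assume the redirect pairs and the set of real article titles have already been loaded into memory (in DBpedia the redirects are expressed with the `dbo:wikiPageRedirects` property). The function names and the tiny sample data below are hypothetical; only the “Liquor laws” example comes from the text above.

```python
def to_title(heading):
    """Turn a heading into a Wikipedia-style title (spaces become underscores)."""
    return heading.strip().rstrip(".").strip().replace(" ", "_")


def resolve(heading, titles, redirects, max_hops=5):
    """Follow redirects from a heading; return an article title or None."""
    candidate = to_title(heading)
    hops = 0
    # Redirects can chain, so follow them, but cap the hops to avoid loops.
    while candidate in redirects and hops < max_hops:
        candidate = redirects[candidate]
        hops += 1
    return candidate if candidate in titles else None


# Tiny made-up sample: one real article and one redirect pointing to it.
titles = {"Alcohol_law"}
redirects = {"Liquor_laws": "Alcohol_law"}

print(resolve("Liquor laws", titles, redirects))
print(resolve("Absenteeism (Labor)", titles, redirects))
```

In this sample the first heading resolves through the redirect, while the second has neither an article nor a redirect and comes back as `None`, which is the kind of heading that ends up in the unmatched list.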

The results are interesting but not obviously actionable. I think that as more and more links are drawn between traditional cataloging systems and the Wiki ecosystem, use cases will emerge.
