LCC is made of people

Matt Miller
Dec 4, 2017


The LCC is a hierarchy for organizing resources, but individuals are responsible for creating those resources. Just as we were able to find prominent subject headings in each range of the LCC, we should be able to find prominent people. What if we could see who the major voices were for an entire swath of an LCC classification? For example, a “who’s who” in the various branches of Philosophy. We can attempt this if we assume:

  1. Someone who has authored many resources in the same LCC range is an authority
  2. OR someone who has been written about a lot in an LCC range is someone important.

To do this we can aggregate the records in an LCC range and pull out the names found in their MARC 100 (author) and 600 (subject) fields. If a name is found across multiple resources we can infer that they are someone “important” to this LCC range.

I did this with the same LC Book, Serial, Music, Map and Visual Materials MARC records. This type of analysis depends heavily on the collection you are using; the results reflect the Library of Congress holdings.

I also wanted something more than just their names, so I ran the names found through VIAF/Wikidata/DBpedia to enrich the data.

Process:

  • Aggregate all resources in an LCC range and take the top 10 names found in either 100 or 600 fields that occurred at least 3 times.
  • Run each name through VIAF to get their Wikidata ID
  • Download Gender (P21), Image (P18), Occupations (P106) and Ethnic Group (P172) from Wikidata.
  • Download “Subjects” from DBpedia
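
The first step above can be sketched roughly as follows. This is a minimal illustration, not the code used for the project: the record shape (a dict with a list of tag/value pairs) is a hypothetical simplification of a parsed MARC record, the records are assumed to be already grouped by LCC range, and counting a name once per resource is my own assumption.

```python
from collections import Counter

# Hypothetical simplified record shape (not a real MARC parser):
# {"fields": [("100", "Kant, Immanuel, 1724-1804"), ("600", "Plato")]}
NAME_TAGS = {"100", "600"}


def count_names(records):
    """Count, per name, how many records mention it in a 100 or 600 field."""
    counts = Counter()
    for record in records:
        # Use a set so a name counts once per resource, even if it is both
        # the author (100) and a subject (600) of the same record.
        names = {value.strip() for tag, value in record["fields"] if tag in NAME_TAGS}
        counts.update(names)
    return counts


def top_names(records, top_n=10, min_count=3):
    """Return up to top_n (name, count) pairs occurring at least min_count times."""
    counts = count_names(records)
    frequent = [(name, n) for name, n in counts.most_common() if n >= min_count]
    return frequent[:top_n]
```

Calling `top_names(records_in_range)` on the records of one LCC range would give the candidate list for that range, most frequent name first.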

Name reconciliation is hard. If there was a Wikidata ID in VIAF I used it; otherwise I did not spend any time trying to reconcile names.

31,000 Names total
4,600 Names were not found in VIAF
16,300 Names had Wikidata IDs
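
The "use the Wikidata ID if VIAF has one, otherwise move on" policy, and the tallies above, could be sketched like this. The `viaf_lookup` callable and the shape of what it returns are hypothetical stand-ins for whatever VIAF query is actually used; no real VIAF API call is shown here.

```python
from collections import Counter


def reconcile(names, viaf_lookup):
    """Map names to Wikidata IDs using only what the VIAF lookup returns.

    viaf_lookup is a hypothetical callable: it takes a name and returns
    None when VIAF has no match, or a dict of linked IDs such as
    {"viaf": "...", "wikidata": "Q..."} (the "wikidata" key may be absent).
    """
    wikidata_ids = {}
    stats = Counter()
    for name in names:
        stats["total"] += 1
        cluster = viaf_lookup(name)
        if cluster is None:
            stats["not_in_viaf"] += 1
            continue
        qid = cluster.get("wikidata")
        if qid:
            wikidata_ids[name] = qid
            stats["with_wikidata_id"] += 1
        # No fallback matching: names without a linked ID stay unreconciled.
    return wikidata_ids, stats
```

Keeping the lookup as a parameter means the same loop can be run against a cached dump or a live service without changing the counting logic.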

I wanted to gather as much information as possible because I had a hunch the results were going to look very male and very white. This of course reflects the materials/collection, but I thought it would be more interesting if it was possible to highlight, for example, LCC ranges with women in the top 10.

Take a look: https://thisismattmiller.github.io/lcc-people/

The interface: https://thisismattmiller.github.io/lcc-people/

You can hover over the headings and individuals to see stats. You can click “View Classes With Women” to render only the classes that contain women. If an individual has a Wikipedia page you can click through and read about them.

I was really only able to leverage the gender data from Wikidata. Of the roughly 16,000 names that were on Wikidata, it was the most complete property after “instance of”:

16357 - P31-instance of
16282 - P21-sex or gender
16110 - P569-date of birth
15779 - P214-VIAF ID
14457 - P106-occupation
14320 - P244-Library of Congress authority ID
14275 - P27-country of citizenship
14148 - P735-given name
14003 - P213-ISNI
13459 - P269-SUDOC authorities
12749 - P19-place of birth
12504 - P227-GND ID
12408 - P570-date of death
12005 - P268-BnF ID
10333 - P2163-FAST ID
10014 - P1412-languages spoken, written or signed
9982 - P1006-National Thesaurus for Author Names ID
9231 - P20-place of death
8816 - P646-Freebase ID
8776 - P18-image
7221 - P3430-SNAC Ark ID
7197 - P373-Commons category
7076 - P69-educated at
6208 - P648-Open Library ID
5146 - P166-award received

But even though it was populated for most of those records, about half of the names were not on Wikidata at all, so no gender data was available for them.
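
A property tally like the one above can be produced from downloaded Wikidata entity JSON, where each entity has a `claims` dict keyed by property ID. The sketch below assumes the entities have already been fetched and loaded as Python dicts; the function names are mine, and `Q6581072` is the Wikidata item for "female".

```python
from collections import Counter

FEMALE = "Q6581072"  # Wikidata item for "female"


def property_coverage(entities):
    """Count how many entities carry at least one claim for each property."""
    coverage = Counter()
    for entity in entities:
        coverage.update(entity.get("claims", {}).keys())
    return coverage


def gender_ids(entity):
    """Return the item IDs of an entity's P21 (sex or gender) claims."""
    ids = []
    for claim in entity.get("claims", {}).get("P21", []):
        snak = claim.get("mainsnak", {})
        # "somevalue"/"novalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids
```

`property_coverage(entities).most_common()` gives the counts in descending order, and `FEMALE in gender_ids(entity)` is one way to flag the individuals behind a "classes with women" filter.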

I wanted to try to use Ethnic Group from Wikidata or Subjects from DBpedia to create more ways to slice into the listing, but there was not a lot of data to work with.

With more data this can get really interesting, but for now it is a nice tool to discover individuals. And it is always nice to put a face to a name.
