Library of Congress Holdings by LCC
A couple of years ago I converted the Library of Congress Classification (LCC) outlines from PDF to JSON. The goal was to have a way to coarsely place a resource into an LCC category. With the Library of Congress data release, I wanted to try it on their holdings to see the shape of the collection.
The LCC consists of 21 classes that narrow topics into increasingly specific categories. Because the system is hierarchical, I wanted to show that hierarchy and which parts of it hold the most resources. I used the Book, Serial, Music, Map and Visual Materials MARC records, but not all records had an LCC:
Total MARC records: 12,438,797
Total records w/ LCC: 11,870,343
Total records w/ no 050 field: 568,454
Total records w/ LCC that fit LCC hierarchy: 10,528,234
(represented in this viz)
The records with a populated 050 but no valid LCC were things like minimal-level cataloging records or shelf mark identifiers like “Microfilm”.
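To make the "fits the LCC hierarchy" test concrete, here is a minimal sketch in Python of how an 050 value could be placed into the outline. It is not the code I actually ran: the `OUTLINE` structure, the field names and the `parse_lcc` and `place` helpers are all hypothetical, and the three ranges are only an illustrative slice of class Q.

```python
import re

# Hypothetical slice of the outline JSON; the real file's structure may differ.
OUTLINE = [
    {"prefix": "QA", "start": 1, "stop": 939, "subject": "Mathematics"},
    {"prefix": "QA", "start": 75.5, "stop": 76.95,
     "subject": "Electronic computers. Computer science"},
    {"prefix": "QA", "start": 76.75, "stop": 76.765, "subject": "Computer software"},
]

# One to three capital letters followed by a (possibly decimal) number.
LCC_PATTERN = re.compile(r"^([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def parse_lcc(value):
    """Return (class letters, number) or None if the value is not LCC-shaped."""
    match = LCC_PATTERN.match(value.strip())
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def place(value):
    """Return matching outline subjects, ordered from broadest to narrowest."""
    parsed = parse_lcc(value)
    if parsed is None:
        return []
    prefix, number = parsed
    hits = [
        entry for entry in OUTLINE
        if entry["prefix"] == prefix and entry["start"] <= number <= entry["stop"]
    ]
    # Wider ranges sit higher in the hierarchy, so sort by range width, widest first.
    hits.sort(key=lambda entry: entry["stop"] - entry["start"], reverse=True)
    return [entry["subject"] for entry in hits]


if __name__ == "__main__":
    print(place("QA76.76 .C65"))
    print(place("Microfilm"))
```

A value like “Microfilm” never matches the pattern, so it returns an empty list, which is the kind of record that was left out of the visualization. A value that parses but falls outside every range in the outline is dropped the same way.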
The visualization uses a force network to lay out the hierarchy, but that is not meant to imply any network analysis; it is more like a cluster map or flowchart. It’s good at highlighting heavily collected areas and is kind of a fun way to explore LCC.
2020 Update: I also made a visualization where you can browse by LCC in an interactive tree map: https://thisismattmiller.github.io/lcc-tree/
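For anyone curious how a hierarchy ends up in a force layout, here is a rough sketch, again in Python and again not the code behind this viz. It flattens a nested tree into the flat node and link lists that force-layout tools commonly take as input. The `flatten` function, the dictionary keys and the counts in the sample tree are all made up for illustration.

```python
def flatten(tree, parent=None, nodes=None, links=None):
    """Walk a nested tree depth-first and return (nodes, links) lists."""
    nodes = [] if nodes is None else nodes
    links = [] if links is None else links

    # Use the node's position in the list as its id.
    node_id = len(nodes)
    nodes.append({"id": node_id, "label": tree["label"], "count": tree["count"]})

    # Every node except the root gets one link back to its parent.
    if parent is not None:
        links.append({"source": parent, "target": node_id})

    for child in tree.get("children", []):
        flatten(child, node_id, nodes, links)
    return nodes, links


if __name__ == "__main__":
    # Made-up counts, only to show the shape of the data.
    sample = {
        "label": "Q", "count": 10,
        "children": [
            {"label": "QA", "count": 7,
             "children": [{"label": "QA75.5-76.95", "count": 3}]},
            {"label": "QB", "count": 3},
        ],
    }
    nodes, links = flatten(sample)
    print(len(nodes), "nodes,", len(links), "links")
```

Because every link is parent to child, the result is a tree drawn with network tooling rather than a real network, and the `count` on each node is what would drive its size.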