This week I joined a cohort of six summer Fellows at the Harvard Library Innovation Lab, based out of the Harvard Law Library. This is the second year of the fellowship program, and we will be spending the next ten weeks working on a diverse set of projects in the library and law domains. My work will be piggybacking on an ongoing LIL project, the Caselaw Access Project. Harvard Law Library has one of the oldest and most complete collections of U.S. case law in the country and has been digitizing it as part of a large collaborative project in recent years.
The digital output from this process is OCRed text that is then formatted and marked up by a vendor into a structured XML document. These files delineate all the key elements of the court opinion, even connecting footnotes to related sentences. They are ready to be ingested, indexed and made accessible digitally.
Because they define the structural components of each opinion, these documents are now far more organized than free text. A further step would be organizing the content of the documents: applying authority control for names and places, introducing persistent identifiers, applying classification, etc. These knowledge organization activities are familiar to librarians and are what I have personally been working on in the cultural heritage and library domain. They become especially powerful in the context of linked data: applying community authorities such as Wikidata to resources, linking to other institutions, treating the web as a metadata system and interweaving everything together to facilitate access and discovery.
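To make "authority control" a little more concrete, here is a minimal sketch in Python of matching a name as it might appear in a document against a shared list of identifiers. Everything in it is an assumption for illustration: the `AUTHORITY` table, the `ID-0001`-style identifiers (placeholders, not real Wikidata IDs) and the `normalize` and `reconcile` helpers are hypothetical and not part of the Caselaw Access Project.

```python
import re
import unicodedata

# Hypothetical authority table: normalized label -> placeholder identifier.
# These identifiers are made up for illustration; they are NOT Wikidata QIDs.
AUTHORITY = {
    "supreme judicial court of massachusetts": "ID-0001",
    "united states supreme court": "ID-0002",
}


def normalize(name: str) -> str:
    """Lowercase a name, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def reconcile(name: str):
    """Return the placeholder identifier for a name, or None if unmatched."""
    return AUTHORITY.get(normalize(name))


if __name__ == "__main__":
    for raw in ["  Supreme Judicial Court of Massachusetts. ", "Unknown Court"]:
        print(repr(raw), "->", reconcile(raw))
```

Exact matching after normalization is the simplest possible approach. Real reconciliation against an authority like Wikidata would likely need fuzzy matching and human review of ambiguous names.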
These activities seem particularly compelling for this dataset. Facilitating access to the materials is a priority for the project (and there are groups who have been doing this). But for the legal layperson (such as myself), if we made this case law easy to access and cite by weaving it into large public knowledge repositories like Wikipedia/Wikidata, could it become a potential tool of civic engagement? To help make that idea possible in the future, my work this summer will look at how to interlink the case law dataset with external data sources.