by Richard Light
This project aims to expose an existing gazetteer database — the one underpinning the Vision of Britain site — as a Linked Data resource conforming to the Pelagios Gazetteer Interchange Format (PGIF). Specifically, we want to publish information about historical administrative units in Great Britain: the Administrative Units Ontology (AUO).
Accessing the database
Portsmouth University IT services set me up with a virtual Windows machine to act as the host for this service. The subdomain data.pastplace.org was associated with this machine, which in turn was given read access to the machine and database holding the AUO. This database uses Postgres, with some geo-specific extensions. While the core data is stored in a ‘unit’ table, a number of subsidiary tables hold associated ancillary and repeating data (as one would expect).
Some of the data is held in project-specific formats, which required custom PostgreSQL statements and/or post-query processing to generate the data formats mandated by PGIF. Specific examples are coordinates (which are stored as WGS84 strings) and date information (stored in a heavily-analysed custom field type).
PHP code design
In order to make the Linked Data delivery as simple and portable as possible, I have written code from scratch in PHP, with minimal reliance on external libraries. Since the RDF to be delivered may change over time (e.g. if/when PGIF is replaced by a new Linked Pasts place interchange format), I have adopted an approach where the RDF to be created, and its relationship to the source data, are expressed as a multi-level PHP array. The code traverses this array, fetches the data required for each RDF statement to be output, and where necessary converts it to the required format. The result is output to an abstract ‘graph array’, which is then serialized into the requested delivery format and returned as the response.
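To make the mapping-driven idea concrete, here is a minimal sketch in Python rather than the project's PHP. It is simplified to a single-level mapping, and every name in it (the base URI, the column names, the helper functions) is a hypothetical stand-in, not taken from the actual code or database schema.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Assumed base URI for unit identifiers; purely illustrative.
BASE = "http://example.org/unit/"


def literal(value: Any) -> str:
    """Render a value as a quoted RDF literal, escaping backslashes and quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + text + '"'


def unit_uri(value: Any) -> str:
    """Render a unit identifier as a URI reference."""
    return f"<{BASE}{value}>"


# Declarative mapping: one entry per RDF statement to emit.
# The column names are assumptions, not the real 'unit' table fields.
MAPPING: List[Dict[str, Any]] = [
    {"predicate": "rdfs:label", "column": "name"},
    {"predicate": "dcterms:isPartOf", "column": "parent_id", "convert": unit_uri},
]


def build_graph(row: Dict[str, Any], mapping: List[Dict[str, Any]] = MAPPING) -> List[Triple]:
    """Walk the mapping and emit one triple per rule that has source data."""
    subject = unit_uri(row["id"])
    graph: List[Triple] = []
    for rule in mapping:
        value = row.get(rule["column"])
        if value is None:
            continue  # no source data, so no statement
        convert = rule.get("convert", literal)
        graph.append((subject, rule["predicate"], convert(value)))
    return graph


row = {"id": 101, "name": "Leeds", "parent_id": 7}
graph = build_graph(row)
```

Changing the RDF that is delivered then means editing the mapping entries, not the traversal code.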
This approach will make it easier to change the RDF mapping and to add new delivery formats. (At present only RDF/XML and Turtle are supported.) I am also hopeful that this design might be useful for other projects where there is a need to deliver existing relational databases as Linked Data. The code needs considerable tidying up before it is fit to be seen by others, but my intention is ultimately to put it up on GitHub as a shared resource.
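One way to keep delivery formats pluggable is a registry of serializers keyed by media type. The following Python sketch illustrates the pattern under the assumption that the graph is a list of triples whose terms are already written in N-Triples syntax. The function names and the choice of N-Triples as a second format are illustrative, not a description of the project's code.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # terms already in N-Triples/Turtle syntax

SERIALIZERS: Dict[str, Callable[[List[Triple]], str]] = {}


def serializer(media_type: str):
    """Decorator that registers a serializer function for a media type."""
    def register(func):
        SERIALIZERS[media_type] = func
        return func
    return register


@serializer("application/n-triples")
def to_ntriples(graph: List[Triple]) -> str:
    """One statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in graph)


@serializer("text/turtle")
def to_turtle(graph: List[Triple]) -> str:
    """Group statements by subject, separating predicate-object pairs with ';'."""
    by_subject: Dict[str, List[str]] = {}
    for s, p, o in graph:
        by_subject.setdefault(s, []).append(f"{p} {o}")
    blocks = [
        f"{s}\n    " + " ;\n    ".join(pairs) + " .\n"
        for s, pairs in by_subject.items()
    ]
    return "\n".join(blocks)


def serialize(graph: List[Triple], media_type: str) -> str:
    """Dispatch to the registered serializer, or fail for unknown formats."""
    if media_type not in SERIALIZERS:
        raise ValueError(f"unsupported format: {media_type}")
    return SERIALIZERS[media_type](graph)


graph = [
    ("<http://example.org/unit/101>",
     "<http://www.w3.org/2000/01/rdf-schema#label>", '"Leeds"'),
    ("<http://example.org/unit/101>",
     "<http://purl.org/dc/terms/isPartOf>", "<http://example.org/unit/7>"),
]
turtle = serialize(graph, "text/turtle")
```

Adding a further format, such as JSON-LD, would then be a matter of registering one more function against its media type.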
Linked Data delivery
In order to support the Linked Data conventions for content negotiation, I have implemented a number of URL rewrite rules in the web server (IIS). Specifically, they take account of the Accept header in the HTTP request, and return a ‘303 See Other’ URL redirect where appropriate.
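The negotiation logic can be sketched in Python as follows. This is only an illustration of the general convention (choose a representation from the Accept header, then redirect with 303 See Other); the real service does this with IIS rewrite rules, and the file suffixes, the supported-type table and the HTML default used here are assumptions.

```python
from typing import List, Tuple

# Assumed mapping from media type to a document suffix; not the real URL scheme.
SUPPORTED = {
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
    "text/html": ".html",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, highest q first."""
    prefs: List[Tuple[str, float]] = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # The sort is stable, so equal weights keep their order in the header.
    prefs.sort(key=lambda item: item[1], reverse=True)
    return prefs


def negotiate(resource_uri: str, accept_header: str,
              default: str = "text/html") -> Tuple[int, str]:
    """Return the redirect status and the URL of the chosen representation."""
    chosen = default
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in SUPPORTED:
            chosen = media_type
            break
    return 303, resource_uri + SUPPORTED[chosen]


status, location = negotiate(
    "http://example.org/unit/101",
    "text/turtle;q=0.9, application/rdf+xml",
)
```

In this sketch wildcards such as `*/*` are not expanded, so they simply fall through to the default representation.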
The RDF that is delivered begins with my best attempt to implement all relevant PGIF properties which are actually present in the source data. I have also created an ontology for expressing AUO-specific assertions, and use this to make additional assertions which I think will be useful for consumers of this data. For example, PGIF has a simple dcterms:isPartOf property which points to a containing unit, while the AUO expresses a number of types of relationships between the current unit and others (including ‘Contains’, as well as ‘Succeeded by’, etc.). The complete AUO ontology is here:
Searching the data
It is clearly beyond the scope of this project to develop a SPARQL end-point, so I have contented myself with providing a simple search facility. This takes URLs in the general form:
and returns an RDF response. This ‘API’ requires a knowledge of the fields in the units table and the format in which data is stored.
In addition, I would like to develop searches which return admin units that overlap a given unit; this will be of direct value to Free UK Genealogy’s ambition to search across the differing systems of administrative units in our three projects (FreeBMD, FreeReg and FreeCen).
Full dump of the data
One external user who has expressed an interest in using this Linked Data said that what they wanted was … all of it! This request reminded me that part of my proposal was to develop a facility to dump out the data, and was a timely encouragement to get on and do this. As a result, this format of URL:
https://data.pastplace.org/auo/auo_dump.php?offset=[start offset]&limit=[number to return]
will return the specified number of units (‘limit’) starting from the Nth one (‘offset’).
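A client wanting the whole dataset could walk through it page by page. The Python sketch below only builds the sequence of request URLs; it assumes that offsets are zero-based and that the client already knows (or can estimate) the total number of units, neither of which is confirmed here. The function name and parameters are hypothetical.

```python
from typing import Iterator
from urllib.parse import urlencode

DUMP_URL = "https://data.pastplace.org/auo/auo_dump.php"


def dump_page_urls(total_units: int, page_size: int) -> Iterator[str]:
    """Yield one dump URL per page until total_units are covered."""
    for offset in range(0, total_units, page_size):
        # The final page may be shorter than page_size.
        limit = min(page_size, total_units - offset)
        yield DUMP_URL + "?" + urlencode({"offset": offset, "limit": limit})


urls = list(dump_page_urls(total_units=250, page_size=100))
```

Each URL can then be fetched in turn, which keeps individual responses small and lets an interrupted download resume from the last offset.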
Mission creep (!)
In addition to the AUO units data, this project is home to the data generated by the GB1900 crowdsourcing transcription project. GB1900 provides a database of all the text found on the 1:25000 Ordnance Survey maps published around 1900. While much of this text consists of place names, it also includes generic human-made features such as wells and footpaths. Every text string is geo-located to the point on the map where the label starts. The Portsmouth team see this data as a valuable complement to the administrative unit data, and asked for it to be included in the Linked Data result. Here is a typical GB1900 entry (for ‘Leeds’):
Note that there are links in this entry to the AUO units within which this place [name] falls.
As mentioned above, I would like to implement a search for units overlapping a specified unit, along with any other types of search which Free UK Genealogy or other users would find useful.
A quick review of the AUO ontology suggests that I need to tidy it up here and there, and align the RDF that is delivered more closely to it.
On the serialization front, I have concentrated on getting the RDF/XML to work properly, and I am aware that the Turtle serialization isn’t yet correct. Also, I imagine that there will be some demand for a JSON-LD serialization of this resource, and will aim to produce one when requested to do so.