(Re)building Charles Booth’s London
Late last year, LSE Library launched Charles Booth’s London, a new website that provides access to digitised content from Charles Booth’s nineteenth-century study of poverty in London. The site replaced two legacy services, the Charles Booth Online Archive and PhoneBooth, dating from 2001 and 2012 respectively, and its launch marked 100 years since Booth’s death. This post provides an overview of the technical research, decision-making and development that went into the creation of the new site.
Taking apart the existing applications
Our goal for Charles Booth’s London was to present the content from the two existing Booth sites in a new application, featuring a contemporary design and backed by modern components. Our first step was to extract that content.
The Charles Booth Online Archive provided access to digitised versions of notebooks from Booth’s poverty survey, a searchable collection of notebook entries (originally extracted from a page-by-page archival catalogue of the notebooks), and an early digitised version of Booth’s poverty map, presented alongside a modern street map. A database contained coordinates that associated notebook entries with locations on the map. The original scanned images of the notebook pages had been stored as TIFFs on optical media, and we were pleased (and relieved) to discover that the files had all survived and were of sufficient quality to be reused 15 years later. The notebook entries were stored as structured text files (indexed using Zebra), and the database was Postgres. Both were easy enough to extract and work with.
For the map, we turned to PhoneBooth, an experimental mobile web application funded by Jisc and developed in collaboration with EDINA. As part of the work to develop PhoneBooth, the digitised sheets that make up Booth’s poverty map had been georeferenced, georectified and stitched together by EDINA. The end product was a complete version of the map that could be overlaid on a modern map and presented in an interactive Google Maps-like interface. We wanted to build on this innovative work in our new application, so we would need to get to grips with the technology stack behind it.
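Georeferencing ties pixel positions in a scanned sheet to real-world coordinates. In the simplest, affine case that relationship is just six numbers, which many GIS tools (MapServer included) can read from a "world file" stored alongside the image. The Python sketch below only illustrates that arithmetic. The class name and the example coefficients are hypothetical, and the warping EDINA needed to georectify and stitch Booth's sheets goes well beyond a single affine transform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WorldFile:
    """Affine georeference read from a world file (coefficients in file order A, D, B, E, C, F)."""

    a: float  # x change per pixel column (pixel width in map units)
    d: float  # y change per pixel column (rotation term, usually 0)
    b: float  # x change per pixel row (rotation term, usually 0)
    e: float  # y change per pixel row (pixel height, negative for north-up images)
    c: float  # x of the centre of the top-left pixel
    f: float  # y of the centre of the top-left pixel

    @classmethod
    def parse(cls, text: str) -> WorldFile:
        # A world file is six numbers, one per line, in the order A, D, B, E, C, F.
        a, d, b, e, c, f = (float(token) for token in text.split())
        return cls(a, d, b, e, c, f)

    def pixel_to_map(self, col: float, row: float) -> tuple[float, float]:
        """Map coordinates of a pixel centre, given its column and row index."""
        return (self.a * col + self.b * row + self.c,
                self.d * col + self.e * row + self.f)

    def map_to_pixel(self, x: float, y: float) -> tuple[float, float]:
        """Inverse transform: which (fractional) pixel a map coordinate falls on."""
        det = self.a * self.e - self.b * self.d
        if det == 0:
            raise ValueError("degenerate transform")
        dx, dy = x - self.c, y - self.f
        return ((self.e * dx - self.b * dy) / det,
                (-self.d * dx + self.a * dy) / det)
```

With a made-up world file containing `2.0, 0.0, 0.0, -2.0, 530000.0, 182000.0` (2-metre pixels on the British National Grid), `pixel_to_map(100, 50)` returns `(530200.0, 181900.0)`, and `map_to_pixel` takes that point back to `(100.0, 50.0)`.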
Finding our way with the Booth map
Lacking any GIS experience of our own, we set about familiarising ourselves with the mapping technology behind PhoneBooth (a grateful shout-out to Oliver O’Brien at UCL here, who gave us a crash course in the basics of web mapping). At the bottom of the stack was MapServer, an open-source application that generates a dynamic tiled map from map images and data. Above this was TileCache, used, obviously enough, to cache map tiles to improve performance. The map itself was presented using OpenLayers, a JavaScript library for rendering dynamic maps on the web.
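The division of labour in that stack is a read-through cache: a tile request is answered from the cache if the tile already exists, and only otherwise passed down to the renderer. The Python sketch below illustrates the pattern under our own assumptions. `ReadThroughTileCache` and the `render` callable are hypothetical stand-ins (for TileCache and a request to MapServer respectively), not the API of either product, and the `{z}/{x}/{y}.png` directory layout is just one common convention.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable

# Signature of the hypothetical renderer: (zoom, x, y) -> PNG bytes.
Renderer = Callable[[int, int, int], bytes]


class ReadThroughTileCache:
    """Serve tiles from disk, rendering and storing any tile not yet cached."""

    def __init__(self, root: Path, render: Renderer) -> None:
        self.root = root
        self.render = render
        self.renders = 0  # how many times we had to fall back to the renderer

    def path_for(self, z: int, x: int, y: int) -> Path:
        # One file per tile, laid out as {z}/{x}/{y}.png under the cache root.
        return self.root / str(z) / str(x) / f"{y}.png"

    def get(self, z: int, x: int, y: int) -> bytes:
        path = self.path_for(z, x, y)
        if path.exists():
            return path.read_bytes()  # cache hit: no rendering needed
        data = self.render(z, x, y)  # cache miss: ask the renderer
        self.renders += 1
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = ReadThroughTileCache(Path(tmp), lambda z, x, y: f"{z}/{x}/{y}".encode())
        cache.get(10, 511, 340)
        cache.get(10, 511, 340)
        print(cache.renders)  # 1: the second request was served from disk
```

The point of the cache layer is that rendering from map images and data is expensive, while reading a stored PNG is cheap, so each tile should be rendered at most once.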
As a first step, we decided to replicate the PhoneBooth map stack using up-to-date components. MapServer is still actively developed, and was easily installed from the Ubuntu repositories. We replaced TileCache, which appears to no longer be supported, with MapCache, a sister project of MapServer. OpenLayers had been completely rewritten in the time since it had been used for PhoneBooth, so we rewrote the front-end map component from scratch using the latest version of the library (we also experimented with Leaflet, but concluded that OpenLayers was more flexible).
Once we had the map working, we were able to consider possible enhancements for our new application. Whilst rebuilding the full map stack was an essential first step, we decided to simplify its implementation in production: rather than rendering tiles on demand, we used MapCache to seed a complete set of map tiles, which we serve as static assets. We’ll maintain the full stack in house, in case we need to make any changes to the map itself.
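Seeding means rendering, ahead of time, every tile the map could ever request. MapCache ships its own seeding tool (`mapcache_seed`), so the sketch below is not how we did it; it only illustrates the arithmetic involved, assuming the standard Web Mercator XYZ tile grid that OpenLayers uses by default. The function names, path template, bounding box and zoom levels are placeholders rather than the values used for the Booth map.

```python
from __future__ import annotations

import math
from typing import Iterable, Iterator


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the Web Mercator XYZ tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp edge cases (e.g. lon == 180) into the valid range 0..n-1.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west: float, south: float, east: float, north: float,
                   zoom: int) -> Iterator[tuple[int, int, int]]:
    """Yield every (z, x, y) tile intersecting a lon/lat bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southwards
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            yield zoom, x, y


def static_tile_paths(bbox: tuple[float, float, float, float], zooms: Iterable[int],
                      template: str = "{z}/{x}/{y}.png") -> Iterator[str]:
    """Relative file paths a static web server would need for these zoom levels."""
    for zoom in zooms:
        for z, x, y in tiles_for_bbox(*bbox, zoom):
            yield template.format(z=z, x=x, y=y)
```

For example, `list(static_tile_paths((-0.25, 51.45, 0.05, 51.57), [0]))` gives `['0/0/0.png']`. Once an area spans more than a few tiles, each extra zoom level roughly quadruples the count, which is why the deepest zoom levels dominate both seeding time and storage.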
We made visible changes too. PhoneBooth had overlaid a variably transparent Booth map on a modern map of London, with the option to switch the base map between Google Maps and OpenStreetMap. Google Maps layers are not supported by the current version of OpenLayers, so we quickly discounted this option for our base map. A number of other commercial providers offer map tile APIs, but most impose restrictions on the use of their data, and we were committed to using open services where possible. That left us with OpenStreetMap, but we weren’t sure that its standard map tiles were a good match visually for the Booth map. In the end, we created our own set of map tiles, based on OSM Bright, using OpenStreetMap data, CartoCSS (CSS for maps, basically) and TileMill.
PhoneBooth had also enabled users to view notebook entries as points on the Booth map. To replicate this feature, we turned our attention to the data we had extracted from the Charles Booth Online Archive, which had itself been reused in PhoneBooth.
Data matures like wine, applications like fish
Or not, in our case. Whilst we planned to decommission the two existing Booth applications, we hoped that the data underpinning them would provide the cornerstone for our new site. We soon discovered, however, that there was something fishy about that data.
Specifically, we found that the map coordinates attached to notebook entries were not reliably accurate. A set of coordinates attached to a notebook entry for Church Street in Stoke Newington, for example, might actually resolve to Church Street in Newham. We haven’t been able to find out how the coordinates data was originally created, but we suspect that street names were extracted from the notebook entries and matched indiscriminately with a modern gazetteer. In fairness to our predecessors, the Charles Booth Online Archive did only present these coordinates as ‘possible’ locations for notebook entries.
This was a problem. We had hoped to use this data as the glue between the two main components of the application: the poverty map and the survey notebooks. Accurate coordinates would enable us to create links between notebook pages and locations on the map, and allow the user to move seamlessly between them. Our metadata team set to work checking the coordinates and correcting them where necessary, using a simple web application we built to facilitate look-up and capture of coordinates from the Booth map. Their work provided the foundation for the new application, and its importance cannot be overstated.
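A simple automated pass could help decide which entries to check first. The sketch below is one hypothetical way to do that, not the method our team used: it assumes each entry records the district named in the notebook and that there is a reference point for each district, and it flags any entry whose coordinates lie more than a chosen distance (2 km here, an arbitrary threshold) from that point, measuring distance with the haversine formula. `Entry`, `flag_suspect` and the district coordinates are illustrative assumptions; the coordinates are approximate.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


@dataclass(frozen=True)
class Entry:
    entry_id: str
    street: str
    district: str  # district named in the notebook entry (assumed available)
    lat: float
    lon: float


def flag_suspect(entries: list[Entry],
                 district_centres: dict[str, tuple[float, float]],
                 max_km: float = 2.0) -> list[tuple[Entry, float | None]]:
    """Return entries that sit too far from their district, with the distance.

    Entries whose district has no reference point are flagged with None,
    since they cannot be checked automatically.
    """
    suspects: list[tuple[Entry, float | None]] = []
    for entry in entries:
        centre = district_centres.get(entry.district)
        if centre is None:
            suspects.append((entry, None))
            continue
        distance = haversine_km(entry.lat, entry.lon, *centre)
        if distance > max_km:
            suspects.append((entry, distance))
    return suspects
```

With an approximate reference point for Stoke Newington, a "Church Street, Stoke Newington" entry whose coordinates actually resolve to the Newham area would come out roughly 9 km away and be flagged, while one placed a few hundred metres from the reference point would pass.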
Fortunately, other data sources from the Charles Booth Online Archive were easier to recycle. We reused the descriptions of notebook entries, as well as georeferenced lists of 1898 landmarks and parishes.
Reading the notebooks
The descriptions of notebook entries had, of course, originally been sourced from the notebooks themselves. In the Charles Booth Online Archive, these notebooks were displayed in a simple viewer, which allowed users to page back and forth through the volumes. The technology used to display digitised texts on the web has moved on since then, developing particularly rapidly in the last five years, and we wanted to take advantage of this.
We decided to use IIIF to present the notebooks. This would not only enable us to display them in a modern image viewer, but would also allow others to reuse our images. We used the osullivan Ruby gem to generate our IIIF manifests, following the IIIF Presentation API specification as closely as possible. We chose IIPImage as our image server, based on its performance and the ease of integration with Apache, but there are a number of other options. The images we had extracted from the existing site were converted to tiled pyramidal TIFFs, to support the dynamic image zooming and panning made possible by IIIF (JPEG2000 is another option, but we were put off by the need to use a proprietary decoding library). We opted to display the images using the Universal Viewer, which seemed to us to be the most full-featured of the viewers on offer, and the best-suited to displaying Booth’s handwritten survey notebooks (but again, other options exist). We were also able to display the summarised notebook entries alongside the handwritten notebook pages, as a reading aid.
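For readers unfamiliar with IIIF, a manifest is a JSON-LD document listing the pages (canvases) of an object and, for each one, the IIIF Image API service that can deliver the image at any region or size. We generated ours with the osullivan gem; the Python sketch below only illustrates the shape of a minimal IIIF Presentation API 2.x manifest. The function names, URLs, identifiers and page dimensions are hypothetical, and a production manifest would carry more metadata (attribution, licence and so on).

```python
from __future__ import annotations

import json
from urllib.parse import quote

PRESENTATION_CONTEXT = "http://iiif.io/api/presentation/2/context.json"
IMAGE_CONTEXT = "http://iiif.io/api/image/2/context.json"
IMAGE_PROFILE = "http://iiif.io/api/image/2/level1.json"


def image_service(image_base: str, image_id: str) -> dict:
    # IIIF identifiers must be URI-encoded, so a "/" in a file path becomes %2F.
    return {"@context": IMAGE_CONTEXT,
            "@id": f"{image_base}/{quote(image_id, safe='')}",
            "profile": IMAGE_PROFILE}


def build_manifest(manifest_base: str, image_base: str, label: str,
                   pages: list[tuple[str, int, int, str]]) -> dict:
    """pages: (image identifier, width, height, page label) for each notebook page."""
    canvases = []
    for i, (image_id, width, height, page_label) in enumerate(pages, start=1):
        canvas_id = f"{manifest_base}/canvas/p{i}"
        service = image_service(image_base, image_id)
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": page_label,
            "width": width,
            "height": height,
            # The image is "painted" onto the canvas by an annotation.
            "images": [{
                "@id": f"{manifest_base}/annotation/p{i}",
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "on": canvas_id,
                "resource": {
                    "@id": f"{service['@id']}/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",
                    "width": width,
                    "height": height,
                    "service": service,  # lets viewers request tiles for deep zoom
                },
            }],
        })
    return {
        "@context": PRESENTATION_CONTEXT,
        "@id": f"{manifest_base}/manifest",
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


if __name__ == "__main__":
    manifest = build_manifest("https://example.org/iiif/notebook-001",
                              "https://example.org/iiif/image",
                              "Notebook 001 (hypothetical)",
                              [("notebook-001/p1.tif", 3000, 4000, "p. 1")])
    print(json.dumps(manifest, indent=2))
```

A viewer such as the Universal Viewer reads the canvases in sequence order and uses each image `service` to fetch only the tiles needed for the current zoom level, which is what the tiled pyramidal TIFFs on the image server make efficient.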
Searching for a new way in
Whilst the ability to browse the map and notebooks would be an important feature of our new application, we also wanted to enable users to search this content. The Charles Booth Online Archive had provided a simple search facility for the notebook entries, powered by Zebra. For our application, however, we needed more advanced search features, including dynamic filtering of search results, and the ability to tune relevance scores. We chose Elasticsearch as our search engine, based on its ease of set-up, straightforward query DSL and REST API, and support for integration with Ruby on Rails (which we had already settled on as our web framework — more on that below).
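To illustrate the kind of query the DSL makes straightforward, the sketch below builds a search request body in Python. It combines a full-text `multi_match` whose per-field boosts (`^3`, `^2`) tune relevance, `term` filters placed in the `bool` query's filter context (which narrow the results without affecting their scores), and `terms` aggregations that supply counts for dynamic filter options. The function name and the field names (`street`, `notebook_title`, `description`, `district`, `year`) are hypothetical rather than our actual mapping, and the aggregated fields are assumed to be mapped as `keyword` or numeric fields.

```python
from __future__ import annotations

import json
from typing import Any, Mapping, Optional


def build_notebook_query(text: str, filters: Optional[Mapping[str, Any]] = None,
                         page: int = 1, per_page: int = 20) -> dict:
    """Build an Elasticsearch search body for notebook entries (hypothetical fields)."""
    filters = filters or {}
    return {
        "from": (page - 1) * per_page,  # offset-based pagination
        "size": per_page,
        "query": {
            "bool": {
                # Scored clause: a match in the street name counts three times
                # as much as a match in the free-text description.
                "must": [{
                    "multi_match": {
                        "query": text,
                        "fields": ["street^3", "notebook_title^2", "description"],
                    }
                }],
                # Filter context: restricts hits but does not change scores.
                "filter": [{"term": {field: value}}
                           for field, value in sorted(filters.items())],
            }
        },
        # Bucket counts used to render filter options next to the results.
        "aggs": {
            "districts": {"terms": {"field": "district"}},
            "years": {"terms": {"field": "year"}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_notebook_query("Church Street", {"district": "Hackney"}), indent=2))
```

The resulting dictionary is sent as the body of a `_search` request, either directly over the REST API or through a client library; selecting a facet in the interface simply adds another `term` clause to the filter list.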
We used Elasticsearch to power searches of both the notebooks and the map, the latter making use of the landmarks and parishes datasets mentioned above. We also wanted users to be able to search the map for present-day locations, such as street names and postcodes, for which we would need to use an external geocoding API. Several such services exist, most of them commercial and with restrictions on data usage. We chose the Ordnance Survey Names API, a free service that provides searching and geocoding of UK place names. Combining the data from this API with our own datasets, we were able to provide searching of both present-day and historical locations directly from the map.
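Combining the two sources mostly means normalising them into one result shape and deciding how to rank them. The sketch below assumes the JSON layout of the OS Names API `find` response as we understand it (a `results` list whose items each hold a `GAZETTEER_ENTRY` with `NAME1`, `LOCAL_TYPE`, `GEOMETRY_X` and `GEOMETRY_Y`, the coordinates being British National Grid eastings and northings); check this against the API documentation before relying on it. `Place`, `parse_os_names`, `merge_results` and the sample payload are hypothetical, and the ranking rule (exact name matches first, historical before present-day among ties) is only one reasonable choice. Converting eastings and northings to longitude and latitude needs a projection library and is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    kind: str    # e.g. "landmark", "parish", or an OS local type such as "Named Road"
    source: str  # "booth" for our own datasets, "os-names" for the external API
    x: float
    y: float
    crs: str     # coordinate reference system of x/y


def parse_os_names(payload: dict) -> list[Place]:
    """Normalise an OS Names `find` response (assumed layout) into Place objects."""
    places = []
    for result in payload.get("results", []):  # may be absent when nothing matched
        entry = result["GAZETTEER_ENTRY"]
        places.append(Place(
            name=entry["NAME1"],
            kind=entry.get("LOCAL_TYPE", "unknown"),
            source="os-names",
            x=float(entry["GEOMETRY_X"]),  # easting
            y=float(entry["GEOMETRY_Y"]),  # northing
            crs="EPSG:27700",              # British National Grid
        ))
    return places


def merge_results(query: str, historical: list[Place], present_day: list[Place],
                  limit: int = 10) -> list[Place]:
    """Exact name matches first; among ties, historical before present-day."""
    q = query.strip().casefold()

    def rank(place: Place) -> tuple[int, int]:
        exact = place.name.casefold() == q
        return (0 if exact else 1, 0 if place.source == "booth" else 1)

    # sorted() is stable, so each source's own relevance order is preserved.
    return sorted([*historical, *present_day], key=rank)[:limit]


if __name__ == "__main__":
    # Hand-written response fragment with made-up coordinates, for illustration only.
    sample = {"results": [{"GAZETTEER_ENTRY": {
        "NAME1": "Church Street", "LOCAL_TYPE": "Named Road",
        "GEOMETRY_X": 533000, "GEOMETRY_Y": 186000}}]}
    booth = [Place("Church Street", "street", "booth", 533100.0, 186100.0, "EPSG:27700")]
    for place in merge_results("church street", booth, parse_os_names(sample)):
        print(place.source, place.name, place.kind)
```

Note that two places with the same name are deliberately kept as separate results: as the Church Street example earlier shows, identical street names in different parts of London are exactly where careless matching goes wrong.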
Pulling it all together
Throughout the development process, we worked with design and development agency Mickey & Mallory on designs for the new site. They produced a contemporary design, inspired by the Booth map, which was realised in responsive templates and stylesheets. Here at LSE, we developed the Rails back-end for the site and the interactive front-end components. We already maintain a Rails stack for the LSE Digital Library, so Rails was an obvious choice. We used Bootstrap to create templates for prototyping, which were eventually replaced with the final templates produced by Mickey & Mallory.
Final thoughts and future plans
We hope that Charles Booth’s London is a worthy successor to the Charles Booth Online Archive and PhoneBooth, and that it will make this historic poverty study more accessible to audiences old and new. Developing the site was an opportunity to experiment with new technologies, and as such was a valuable learning experience. We also think that it’s a great example of how open technologies and services can be combined to create a new product, and we want to make a point of registering our gratitude to those who work on these projects and have enabled us to build upon them.
This isn’t the end of the story, though. There’s more material from Booth’s survey that could potentially be digitised and added to the site, and we’re also looking at how we might further enhance the data behind the application and make it more openly available. A hundred years after Booth’s death, we’re confident that we can continue to find new ways to make his research more accessible.
Visit Charles Booth’s London to view the map and notebooks from Booth’s poverty study, and to read more about his work.