Warranty.

Martin Howitt
Published in The Data Place
Jul 10, 2017

One of the tenets of the philosophy behind The Data Place is the belief that open data stores should be more of a commodity: that is, we want to lower the barriers to people having one (or more than one). The Data Place is built using open-source software, but more things than code are required to make a functioning service.

In old-fashioned enterprise IT these things (support, security, availability, resilience, and performance) are known as warranty: the things that enable you to rely on the service. More modern IT systems, which may use cloud technologies, DevOps deployment styles, and agile development, still need warranty, but it tends to be implemented differently. This post is about how we are approaching these things. It is a tiny bit technical, because these are the things likely to interest the IT departments of organisations that are implementing open data portals.

Our technical stack

At The Data Place we use the open source CKAN software for our core datastore/catalogue product. This in turn uses a stack based on Nginx, Apache (with WSGI), Solr, Postfix and PostgreSQL running on Ubuntu servers.

Our front-end products are being developed using Wagtail, partly because it runs on a very similar stack to CKAN (and this will keep our support costs down as we scale out) and partly because, by all accounts, it’s pretty good.

All of these are mature, open source components with lots of deployment history elsewhere on the internet and with active support communities. This is important because it keeps our support costs down. As a responsible community member, we will also contribute back to all these products where we can.

Availability

We get our servers from the cloud (we are currently using Digital Ocean) because, well, it’s 2017, right? We want to be able to provision data infrastructure to customers instantly, and it doesn’t make sense to do an on-premises installation. This also means we can use cloud tools such as snapshots and mirroring, so should the primary data centre fail we can restore the service in a different one; we can also restore previous versions of a service as needed. When thinking about availability we are, of course, only as strong as our weakest link: this means we have to assess cloud providers’ reliability track records.
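As a rough illustration of how the availability of a CKAN-based store might be checked from outside, here is a minimal sketch in Python. It assumes the portal exposes CKAN's standard `status_show` API action; the function names `parse_status` and `is_up` and the example URL are made up for illustration and are not a description of our actual monitoring.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def parse_status(body: str) -> bool:
    """Return True if a CKAN API response body reports success."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    # CKAN action responses are JSON objects with a boolean "success" field.
    return isinstance(payload, dict) and payload.get("success") is True


def is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Probe a CKAN site (hypothetical helper); any failure counts as down."""
    url = base_url.rstrip("/") + "/api/3/action/status_show"
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8")
            return response.status == 200 and parse_status(body)
    except (URLError, OSError, ValueError):
        # Network errors, HTTP errors, malformed URLs and undecodable bodies.
        return False
```

Something like `is_up("https://data.example.org")` (a placeholder address) could be run on a schedule from a machine outside the primary data centre, so that the check does not share the failure it is trying to detect.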

Performance

Another benefit of cloud services is the ability to scale them up (and down again) as needed. We’re still getting to grips with this as we scale out our infrastructure, but adding more capacity to a server is nice and easy, as it is with most cloud providers. We’re going to talk more about our technical architecture in the future.

Security

Security is important: even though our primary product is about open data, we’re still a target. For example, people are using the data we hold to provide applications or other services; we will be taking payments online; we hold usernames and email addresses; and we’re running CRM and mailing lists.

To begin to tackle security we’re taking some short and long-term steps:

  • we always run the most up-to-date versions of the open source software we use
  • we are gradually moving the various components of our “stack” to different servers so they can be updated or changed with the minimum of disruption
  • we are looking into modern DevOps methods such as containerisation
  • we will be running some basic security tests ourselves using open source toolkits such as Metasploit
  • when we are satisfied the basic infrastructure design is stable we will be engaging a third-party CHECK provider to test our infrastructure, help close any remaining loopholes, and provide information back to the open source communities we are part of
  • we are working towards the ISO 27001 information security standard: this is a fully-featured discipline that will ensure we manage information security risk in every aspect of our operations

Privacy

Aligned to the security agenda is the issue of data protection. This means that when we process people’s personal information (for example when we take a payment online, log details of meetings in our CRM, or manage someone’s sign-up to one of our mailing lists) we are able to trace what we have done, and we don’t use that data inappropriately.

We want to be as transparent as possible about how we do this and the purposes for which we use personal data. We also need to be on the right side of the law in this respect, so we deliberately choose hosting that is based in the UK, and we will maintain records of data flows through our services.

Opening up our warranty information

This short blog post is the beginning of the process of us being as open as possible about our operations. Ultimately we are going to publish things like our uptime statistics, some data flows, ISO 27001 status and performance statistics as open data sets because we believe in being open and we think there’s public value in it: so watch this space.
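To give a flavour of what publishing uptime statistics as an open data set could look like, here is a small Python sketch. It assumes availability checks are recorded as (timestamp, up/down) pairs; the function names, the column names and the sample figures are invented for illustration and are not our real format or data.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def daily_uptime(checks):
    """Turn (datetime, bool) check results into {date: uptime percentage}."""
    totals = defaultdict(int)
    ups = defaultdict(int)
    for timestamp, ok in checks:
        day = timestamp.date()
        totals[day] += 1
        if ok:
            ups[day] += 1
    # Percentage of successful checks per day, in date order.
    return {day: round(100.0 * ups[day] / totals[day], 2) for day in sorted(totals)}


def to_csv(uptime):
    """Serialise the daily figures as CSV text, ready to upload as a resource."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "uptime_percent"])
    for day, percent in uptime.items():
        writer.writerow([day.isoformat(), percent])
    return buffer.getvalue()


# Made-up sample: four checks on one day (one failed), one check the next day.
sample = [
    (datetime(2017, 7, 10, 0, 0), True),
    (datetime(2017, 7, 10, 0, 5), False),
    (datetime(2017, 7, 10, 0, 10), True),
    (datetime(2017, 7, 10, 0, 15), True),
    (datetime(2017, 7, 11, 0, 0), True),
]
print(to_csv(daily_uptime(sample)))
```

A plain CSV with one row per day is a deliberately simple choice here: it can be loaded straight into a datastore such as CKAN and read by anyone with a spreadsheet.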


Martin Howitt
The Data Place

Municipal. Co-founder/technology lead at @thedataplace @odidevon @thingscamp.