You can’t get there from here (part 2)

Sally Kerr
Jul 28, 2017 · 4 min read

In my last post, I talked about the need to take a holistic view of data. If we manage our data more effectively, open data delivery will be more efficient. This means seeing the data that we create and manage as part of a wider ecosystem that incorporates open data.

In the most basic sense this can mean understanding the data life cycle — creation, management, review, updating and, eventually, deletion. However, this needs to be expanded by adding a process that starts with either the creation of a data set, or if it exists already, use of a data set. This process can be quite straightforward, but it involves re-thinking the purpose of the data, which is not.

Currently data sets are created to answer questions, for the purposes of reporting, monitoring, informing and research. There is an end goal for each purpose, and often a single use. Whilst data sets may be combined and analysed, and reports produced to provide insight, there is still, quite commonly, a single reason or purpose for which the data set was created.

What if that were to change? Standing back from the identified purposes, we would ask one question:

What is the use scope of this data set?

The answer to this would impact on the entire lifecycle of that data set.

Here is an example: Edinburgh has sensors around the city to count the number of cyclists travelling through it. This data is used, most commonly, to illustrate the growth in cycling in the city, but also to track patterns of use and inform thinking about cycle routes. The data is collected by another organisation and shared with the Council.

Let’s go back to the beginning, when the sensors were set up and the parameters for collecting data were agreed. At that point the question would have been asked — what is the use scope for this data? At the time the answers would be as outlined above, possibly with one or two additions.

What has happened since then? The data has continued to be collated and shared with the Council. University students have requested use of the data for their projects, the University of Edinburgh has carried out a study of the data to provide more insight, and the data has been made available as open data for use in hackathons and other data events.

Crucially, the data set fields and titles have not been changed to reflect these new uses. This is because the parameters will have been set up at the start, and changing them is not always easy. This means that unless you understand the sensor set-up, the method of collection, and the variables around this, the data could be difficult to navigate. If multiple uses, and crucially open data, had been considered at the start, it’s likely this would have changed the parameters and made it easier for the Council team to manage the data usage. (I should say, the team are very keen to share the data and interested in use cases.)

This will seem a very basic point to illustrate. However, this dataset will not just be used by the relevant team, but by other teams who may be collating strategic reports on active travel, transport in the city, and the environment. The data set may be used to help identify people flow in the city, and help Edinburgh better manage its overall infrastructure. As the Council is investing in expanding cycle routes and promoting cycling as an activity, the data will continue to attract a wider audience. Over time, it may become a baseline marking Edinburgh’s development as a Smart City.

This data set has an evolutionary life cycle. Its use is likely to change.

The questions to ask before creating the data set have become more complex:

· What other data can be gathered from the cycle sensors?
· Are there other sensors in operation that could be used to enrich the existing data set?
· Who is likely to work with this data set and should be in the use scope?
· What other data sets should be considered with this data set to form a cluster?

But the opportunities to benefit from this dataset have just expanded. By creating a process that addresses future use as well as identified use, the data ecosystem has become more valuable. The data set has become an asset that must be managed within this ecosystem.

The ecosystem now looks different from the simpler model above, and may apply to a cluster, not just one data set:

1. Develop the use scope

2. Create (All field elements, description, and other metadata)

3. Publish (More than one individual, publish as shared or open data)

4. Manage (the use scope; the data will have all requirements in place)

5. Review (Collate use cases, data requests or queries, new data sets identified)

6. Update (Revisit the use scope and review findings, amend, and track changes)
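As a rough illustration, the six steps above could be modelled as a small record that carries its use scope with it. This is only a sketch under stated assumptions: the class names, field names and the example uses are hypothetical, and the choice to return to the Manage step after an Update is my assumption rather than something the process prescribes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stage(Enum):
    """The six steps of the process, in order."""
    DEVELOP_USE_SCOPE = 1
    CREATE = 2
    PUBLISH = 3
    MANAGE = 4
    REVIEW = 5
    UPDATE = 6


@dataclass
class UseScope:
    """Hypothetical record of who uses a data set and what for."""
    identified_uses: List[str]
    potential_users: List[str] = field(default_factory=list)
    related_datasets: List[str] = field(default_factory=list)


@dataclass
class DataSet:
    name: str
    use_scope: UseScope
    stage: Stage = Stage.DEVELOP_USE_SCOPE
    change_log: List[str] = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next step. After Update, go back to Manage
        (an assumption made for this sketch)."""
        if self.stage is Stage.UPDATE:
            self.stage = Stage.MANAGE
        else:
            self.stage = Stage(self.stage.value + 1)
        return self.stage

    def record_new_use(self, use: str) -> None:
        """Add a use found at review and track the change."""
        if use not in self.use_scope.identified_uses:
            self.use_scope.identified_uses.append(use)
            self.change_log.append(f"use scope extended: {use}")


# Example loosely based on the cycle counter data set described earlier.
cycle_counts = DataSet(
    name="cycle counter data",
    use_scope=UseScope(
        identified_uses=["illustrate growth in cycling", "inform cycle routes"],
        potential_users=["Council team"],
    ),
)
cycle_counts.record_new_use("hackathons and other data events")
```

The point of keeping the use scope on the record itself is that every review has somewhere to write down a new use, and the change log shows how the scope has evolved over time.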

Imagine applying this to the datasets across an organisation — to the core datasets that provide insight and inform decision making, to the geo data sets and asset datasets. That organisation would be establishing a data ecosystem extending beyond its own data ownership to include the city and its different neighbourhoods and different sectors such as health and tourism. This infrastructure is a data-lattice enabling visualisations, predictive analytics, community engagement and much more.

It is a challenge to start thinking of data in this way and, in the public sector, to find the time to apply this process. It should be approached as a long-term investment. Begin with business areas that are already generating data requests, using FOI, MI and known data clusters, as well as the core data sets used across the business. Creating this process for data-driven projects at the outset will also ensure the richest return on the data.

Sally Kerr

Written by

Digital, data & innovation, arts, communicator, founder of @EdinburghApps, co-founder @EdiLivingLab. French Horn playing for fun. Posts my own.
