What’s wrong with open infrastructure for Remote Sensing geodata?

Arjen Vrielink
5 min read · Sep 10, 2018


The world has discovered geodata. Finally. More satellite data is becoming publicly available from government programmes like NASA’s Landsat or the EU’s Copernicus, implemented by ESA and others. At the same time, more and more commercial companies manage their own constellations of earth observation satellites. The NASA and ESA programmes are especially promising because they provide open data: their data is free for anyone to access and use.

Together with readily available open source software tools and open standards this allows for unprecedented possibilities.

And this open data, software and knowledge is ‘just there’, in the ‘cloud’. Everybody seems to assume that once you make data or software ‘open’, some magic happens and it becomes available for people to use. This is not true. The cloud doesn’t exist; there are just other people’s computers. Someone owns those computers, owns this infrastructure and spends money to keep it online.

RemotePixel, a satellite metadata search portal, gives us a peek behind the scenes, and even more than a peek:

Dear friends, I’m sorry but I have shut down remotepixel for a few days.
Recent change in Sentinel-2 AWS bucket resulted in a huge increase in my AWS bill (up to 3500 $).

This blog focuses on open (satellite) geodata.

Open

First, the addition of ‘open’ to a concept is generally received with a favourable connotation: open future, open city, open culture, open software, open data, open access, open door policy.

So that would be a starting point: open is good. On the other hand, there is a group of people who confuse ‘open’ with ‘free’, especially in the open source software context. There are also people who have negative associations with open source software, seeing it as bug-ridden programs written by well-meaning amateurs with an inferiority complex looking for their fifteen minutes of fame.

For the sake of discussion, let’s go for the positive interpretation, summarised well by the Open Knowledge Foundation as:

A world where knowledge creates power for the many, not the few (okfn)

The idea is: if all knowledge (data, software, information) would be open to all, there would be no privileged minority who could exploit this power.

That’s great, but to get there we need:

  • Open standards (for easy and structured exchange of information).
  • Open source software (tools that implement those standards).
  • Open data (the ‘fuel’ of knowledge).
Photo: open window by Paweł Czerwiński on Unsplash

Open standards

(Technical) standards exist to make sure things can work together, even if they are made independently. Standards can be open or closed (e.g. the way the CIA encrypts messages). And standards can originate in the meeting rooms of sluggish standards bodies or emerge as de facto standards.

Examples of open meeting room standards in the geodata world are the OGC standards WMS, WFS and CSW, or the EU-based INSPIRE metadata standard. Examples of open emerging industry standards are e.g. Cloud Optimised GeoTIFF and STAC, both driven by Radiant Earth.
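To make concrete what such a standard buys you: a WMS GetMap request is just a URL with parameters whose names and meanings the OGC spec fixes, so any client can talk to any compliant server. A minimal sketch, where the endpoint and layer name are hypothetical but the parameter names come from the WMS 1.3.0 standard:

```python
from urllib.parse import urlencode

# Hypothetical WMS endpoint and layer; the parameter names are standardised.
base_url = "https://example.org/geoserver/wms"
params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": "landsat:true_color",   # hypothetical layer name
    "CRS": "EPSG:4326",
    "BBOX": "51.9,4.3,52.1,4.5",      # lat/lon axis order for EPSG:4326 in WMS 1.3.0
    "WIDTH": "512",
    "HEIGHT": "512",
    "FORMAT": "image/png",
}
getmap_url = f"{base_url}?{urlencode(params)}"
print(getmap_url)
```

Because the interface is an open document rather than a product, implementing or consuming it requires hardly any infrastructure at all.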

(Open) standards are usually available on the internet as a document and require hardly any infrastructure investment.

Open source software

There is a lot of confusion and misunderstanding surrounding open source software. This is mainly caused by the number of different open licenses available. Most commonly, people associate open source software with ‘free as in beer’ software, whereas the true spirit behind open source software is ‘free as in speech’.

Open source software is usually available on online software collaboration platforms like GitHub or GitLab. Those platforms provide infrastructure for hosting and collaborating on source code, which requires considerable investment in infrastructure. They usually run a freemium business model: the costs of the platform are covered by a small number of paying customers.

Open data

Open standards make sure there is one way to read, write and exchange the data. With open source tools you can use or consume the data: extract, load, transform, query and add value. Without data however, standards and tools are meaningless. Open data can be seen as the fuel of knowledge (or information, insights, answers, pick one you like).
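The “add value” step the paragraph above mentions can be as small as deriving an index from raw bands. A minimal sketch, assuming the red and near-infrared bands have already been read into arrays (for example from a Cloud Optimised GeoTIFF); the reflectance values here are made up for illustration:

```python
import numpy as np

# Made-up reflectance values standing in for two satellite bands.
red = np.array([[0.10, 0.20], [0.30, 0.05]])
nir = np.array([[0.50, 0.60], [0.40, 0.45]])

# NDVI = (NIR - red) / (NIR + red), a standard vegetation index in [-1, 1].
ndvi = (nir - red) / (nir + red)
print(ndvi.round(2))
```

The point is not the index itself but that without the underlying open data, this kind of downstream value creation has nothing to work on.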

As with open source software, the license associated with the data, which determines its terms of use, is key. The best-known open licenses are the Creative Commons licenses. But as with open source software, there are many kinds of licenses for open data, each with its own restrictions.

Open data platforms are usually hosted and funded by foundations or large institutions. Wikipedia is backed by a foundation that depends on donations. The World Bank finances and hosts its own open data platform, much like space agencies such as ESA and NASA do.

Thoughts on open infrastructure

We now have an idea about how ‘open’ works: we need tools to work with available data and standards so we can exchange information efficiently. And we learned that a robust (open?) infrastructure is needed to guarantee data availability in the long term.

This gives us a framework to think about openness as a system: there is no such thing as ‘open infrastructure’ in the same sense as open standards, open software or open data. Infrastructure is nonetheless a crucial part of an open system. A tentative conclusion might be that open infrastructure is a robust infrastructure, where robust means: accessible and guaranteed for the long term.

For open standards and open software we saw that financing open infrastructure is either a non-issue (open standards, where hosting costs are negligible) or commercially viable (open software, where a proven business model covers the hosting costs). The problem with open data, however, is that as the amount of data keeps growing, so does the cost of hosting it.

There are three reasons for that:

  1. Nobody wants to pay for open data directly.
  2. It’s much harder to think of a meaningful set of tools or services around data for which people want to pay, as there is for open source software.
  3. Open data can only have impact if it’s accessible long term, not only today or tomorrow. That means considerable funds have to be secured for 5–10 year horizons.

That means that there should be non-commercial financing mechanisms to cover the infrastructure costs for open data. And so there are, like the ones discussed above (Wikipedia, World Bank and ESA and NASA).

Conclusion: there is a lot of confusion and misunderstanding about openness, while in fact open data, open standards and open software have no meaning in isolation. There are only open information systems. If one or more parts of an information system are open, that doesn’t mean the whole system is open. And even if all the parts of the system are open, it can still fail if there is no viable business model to keep its infrastructure running.

Remember RemotePixel: a nice guy sets up a nice metadata service based on open standards, open software and open data for easy discovery of satellite images. It gets popular, costs increase exponentially, but there is no corresponding revenue stream. The service goes down.
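The cost dynamic behind such a shutdown is easy to sketch with back-of-envelope arithmetic. All numbers below are assumptions for illustration, not actual AWS prices or RemotePixel figures:

```python
# Assumed traffic and pricing; none of these are real RemotePixel numbers.
requests_per_day = 50_000       # assumed traffic once the service gets popular
avg_response_mb = 5.0           # assumed average payload (imagery is heavy)
egress_price_per_gb = 0.09      # assumed cloud egress price in USD

# Monthly data volume and the resulting bandwidth bill.
gb_per_month = requests_per_day * 30 * avg_response_mb / 1024
monthly_bill = gb_per_month * egress_price_per_gb
print(f"~{gb_per_month:,.0f} GB/month -> ~${monthly_bill:,.0f}/month")
```

The bill scales linearly with traffic, while a free service’s revenue stays at zero; popularity alone is enough to sink it.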

My intuition says that funding big open data platforms should not be left to the private sector but rather to government agencies, NGOs or foundations.

The EU’s Copernicus programme does this really well, though I think it overreaches a bit with the Data and Information Access Services (DIAS), where it tries to be another Google Earth Engine. What’s wrong with the DIAS programme and why Google Earth Engine is just another consultant’s laptop in the cloud is a topic for another blog.

Note: remotepixel.ca was back up by the time this post was published.
