The global epic of data distribution

Nicolas Terpolilli
Sep 7, 2016


“Aeneas marvels at such things on Vulcan’s shield, his mother’s gift,

and delights in the images, not recognising the future events,

lifting to his shoulder the glory and the destiny of his heirs” Aeneid, VIII 729–731, Virgil

The extract from the Aeneid quoted above comes just after a description of Aeneas's shield, which he receives before going to war. The shield depicts all the glorious and amazing events that will take place in Rome after he wins the war. That’s his destiny, and he knows something great is going to happen. He just has to do it. I think Open Data is at this kind of crucial moment.

When I first came into Open Data a few years ago, there weren’t a lot of use cases. There weren’t a lot of portals. There weren’t those stories about dropping crime rates or community-made smart transportation apps. There was a dynamic community; I’d even say it was possibly larger than today’s. But I didn’t know that community. I came into Open Data because of two personal beliefs.

  • I do not believe that data have an inherent “value” by themselves. I think that data have become a commodity. You can learn data science for free online, you can use tons of open source tools on your laptop, and you can rent computing power online for half a cent a minute. On top of that “democratized data infrastructure” there are data to use everywhere. Open Data is part of that movement. But make no mistake, your data truly are a commodity. What is valuable is your community, your ecosystem, and the things built around the data that help you provide amazing services and products to your customers.
  • I think that the world would be a better place if data moved easily between organizations, people and governments. This one is more or less a personal conviction. I like the idea of a more open world, with more exchanges. On the theoretical side, the more access there is to decentralized and distributable data, the more robust the global system becomes, and the more likely amazing stuff is to happen.

I decided to work to make Open Data happen not because it was giving early results (it’s getting better, but it is still not there), but because I truly believe in Open Data’s destiny and its ability to change the world for the better.

Cargo Cult Open Data

A cargo cult is a religious movement usually emerging in tribal or isolated societies after they have had an encounter with an external and technologically advanced society. Usually cargo cults focus on magical thinking and a variety of intricate rituals designed to obtain the material wealth of the advanced culture they encountered.

There has been a lot of disillusionment in the Open Data world in the last few months. There have been a lot of articles, including that great piece by Giuseppe Sollazzo. A lot of people have started to understand that destiny may not be automatic and that the road to the future that Open Data promises still has to be built. The early narrative was usually simplified as: “You’ll give people some data files online and they will come back with huge, totally positive innovations, changing the lives of your citizens forever”. Just take a look at the “European Data Portal” narrative about Open Data benefits. That’s crazy. Who can believe that “in 2016, there will be 75,000 Open Data jobs within the EU 28+ private sector”??! Who can believe in a “16% reduction in energy consumption” at the snap of an Open Data finger??! And where are the data backing that?

“So I call these things Cargo Cult Science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.” Richard P. Feynman

From cargo cult to robustness

What Giuseppe Sollazzo clearly points out is that anybody in charge of an Open Data project has to (re-)focus on problems. When you release data, who are you trying to help? If a lot of Open Data projects look like Cargo Cult (in the sense that they copy the look and feel, the wording, and the narrative of successful Open Data portals, but not necessarily the work on the data and the features that make the data really usable), it may be due to a lack of understanding of the real problems.

That very basic, mimetic way of creating is, I guess, mostly due to a top-down approach to data opening. Because Open Data and modern data distribution still require technical skills, the task is often given to tech people. That may be fine in a lot of cases. But I truly think the whole movement would benefit from tools allowing anyone to publish and use data. The more diversity there is among the people releasing the data, the closer to the problems we will be, and the more Open Data will deliver on its promises.

Open Data has always been about distribution and decentralization. It should also be that way in terms of skills and people.

Framework for innovation dynamics

The Carlota Perez framework is, generally, a great way to understand the dynamics of innovation. I highly recommend her classic Technological Revolutions and Financial Capital: The Dynamics of Bubbles and Golden Ages. Basically, you can analyze the dynamics of innovation in 3 phases: Installation, Bubble/Turning Point and Deployment. Think Silicon Valley since 1971. The Installation phase includes the development of the personal computer and the foundation of publicly and militarily funded information infrastructures (Internet, GPS). Then comes the period of bubble excitement, around 1997–2000. Nobody really knew what was going to happen, but everybody had understood there was money to be made. Hence the bubble. The turning point happens when the bubble bursts. That’s a difficult time, but the positive side is that great and real companies emerge from the burst (think Amazon and Google). At that point we know the overall recipe of that innovation cycle. Then comes the Deployment phase, where companies try to apply that recipe to every other sector (Airbnb, Uber, Tesla). It’s a framework, and Carlota Perez is sufficiently clever not to make it too constraining. But it helps to understand the dynamics.

On a smaller scale, Open Data and, more generally, data management are following the same dynamics. I think we are living through the Bubble/Turning Point phase. We’ve seen good and viable projects, but we’ve also seen tons of failures. It’s time to learn from that. It’s time to theorize. It’s time to write about and share experiences. It’s time for data distribution and Open Data to build the path towards their promises. It’s time for them to fulfil their destiny!

Destiny must be incarnated

The extract from the Aeneid that I quoted in the introduction comes just after a description of Aeneas's shield, which he receives before going to war. The shield depicts all the glorious and amazing events that will take place in Rome after he wins the war. Destiny is important, especially in Greco-Roman philosophy and literature. But Destiny is nothing if there is nobody to make things happen.

Peter Thiel CS183

Picture a quadrant (that I took from Peter Thiel’s class) with a determinate-indeterminate axis and a pessimistic-optimistic axis.

Most of the Open Data projects we can observe are indeterminate-optimists. These are projects led by true Open Data believers who have nevertheless made poor design choices and don’t have a real plan for what data to release, when, and how.

Peter Thiel CS183

There are also a lot of indeterminate-pessimists, mostly those who have been forced by law to open some datasets, and decided that it was a waste of money and time.

What’s really great is that there are truly amazing Open Data projects, created and led by amazing people who have a plan and execute it well. They are the determinate-optimists.

A project like the Archives of the Planet, made from the photo collection of Albert Kahn and released as Open Data by the French department of Hauts-de-Seine (a department is roughly equivalent to a state or province), is a perfect example. They are putting in a huge amount of work to open that collection, aiming at 70,000 pictures released by the end of the year. And it’s an example of Open Data that is clear and accessible to the public: anyone can interact with an open dataset in the form of a photo archive.

This open dataset drives most of their traffic and fulfils all of their objectives. They are both optimists about Open Data and its promises and determinate in the ways to achieve them.

There are obviously some Open Data determinate-pessimists. Usually they don’t even want to talk with us. That’s fine, that’s normal: they don’t share our views. I see them more as an opportunity than as a threat to Open Data.

But everybody who cares a little about the future of Open Data, and data distribution in general, must really focus on indeterminate projects and indeterminate people. Those bad examples hurt everybody.

I’m perfectly fine with OpenDataSoft competitors, and in general with everybody in the Open Data community who does not agree with our vision and our product. A lot of people actually don’t agree with our vision. Some have different views on the API-first approach, some really prefer Linked Data and RDF, and some just don’t like the idea that we are not fully open source. But that’s fine, that’s exactly why Open Data is improving: we are making determinate choices and people choose to follow or not.

What infuriates me is all those projects with no ideas and no real technical choices. I just cannot understand how “let’s pay a bunch of consultants to implement a basic website full of links to indistinct resources (yeah, PDFs and HTML are still not Open Data) with no real consistent way of accessing the data, and let’s organize a hackathon the week of the portal opening and never plan anything again after that” can pop up in someone’s head and become an “Open Data strategy”. That feels like an old way to do things, like Big Blue all over again. That is just new and nice ideas (transparency, openness) implemented the old way. That is like making a windmill turn thanks to a hidden coal-fired power plant. It looks new and clean, but it’s not!

“It begins by rejecting the unjust tyranny of Chance. You are not a lottery ticket.” Zero To One, VI, Peter Thiel

I’m proud to work at OpenDataSoft, and I’m proud that we have a determinate-optimist view on Open Data:

  • The modern way to manage data is to give tools to the average person. And they don’t want to scrape HTML tables or look at a CSV file, nor do they want to learn SPARQL.
  • It implies an API-first approach, and it implies ready-to-use online tools to analyze, visualize and map the data.
  • Openness is not a boolean, it is a spectrum. When a transportation company starts sharing data with a city administration, most of the time there is an opening dynamic and it ends up in Open Data. Hence we build tools to let anybody distribute data to the people they want, in the easiest way possible and with the most precise granularity possible.
  • Open Data is still new; there is no way it has found its definitive form. That is why we develop new ways to distribute data, e.g. Open Data as Terraces.
  • To be viable in the long run (think of a more open and prosperous society in the long run), we need to build a network. More precisely, two networks: a data network and a people network. And they cannot be fragile; they must be robust.
  • People responsible for Open Data portals need indicators and tools to clearly define their plan, become determinate and smartly execute it.

Aeneas ends up only preparing the ground for the Roman millennium. From Romulus to Augustus via Caesar, Cicero or Trajan, the destiny is accomplished by a lot of different people. We are at a crucial moment in Open Data and, more generally, in every facet of modern data distribution. Let’s prepare an amazing ground so that generations of data users can solve more and more problems!
