How free open government data can coexist with fee-based access

[I know this is going to irritate a number of my friends and colleagues. I look forward to the spirited debates. Sorry in advance.]

Former Washington D.C. Mayor Anthony Williams recently revisited a common topic of conversation in the open data movement: “Maybe Government Data Shouldn’t Always Be Free.” (The subsequent Fenty administration, which in late 2007 arguably jump-started the modern open data movement in the USA, built upon work done in 2006 under Williams.)

Cities could design access in a variety of ways. You could charge based on volume, so only high-volume users pay anything. Or you might give everyone a month of free access, or waive fees for NGOs, press, or individual citizens. Or maybe there could be a quid pro quo where companies like Uber do a data exchange with governments.

Many state and local governments, and many more around the globe, do charge for bulk data access, but Mayor Williams suggests some paths for making at least some of that data open. Let’s dig a little deeper into those hybrid free/fee approaches, and discuss a number of additional ones.

  • Metered consumption. In this model, a certain amount of data is free, and when you need more you must pay for it. Within this model there are a couple of options: one where rates are measured on a per-dataset basis, and one where rates are measured site-wide, regardless of the dataset(s) being accessed. Although they don’t use it for payments, one of the major open data platforms already has this capability built in, and it’s likely others do as well. Need an example of a private sector company doing this? Look no further than Google Maps.
  • Consumer classification. Charging different rates for various types of customers is a well-known business model. For government data, business consumers are most likely to be charged, since they are probably using the data for revenue-generating purposes. Similar to other freemium models, the business consumers are paying customers that effectively subsidize free services for non-business customers.
  • Data exchange. Described by Mayor Williams as a “quid pro quo”, this is where the government supplies data to a third party in exchange for the ability to consume some other data from them. The Waze Connected Citizens program is a better example of this than Uber, with the added benefit that some (but certainly not all) governments make publicly available the same data which they share with Waze.
  • Access methods. Purely from a technology perspective (computer memory, processing, and connectivity), downloading an entire set of data is generally less costly than asking for small specific pieces of it. Bulk data could be made available for free while application programming interfaces (APIs) could be fee-based. Charging to push data to a subscriber as it becomes available (rather than pulling it using bulk downloads or APIs) is another option.
  • Premium datasets. Similar to consumer classification, except instead of charging based upon who is accessing the data, costs are determined by what data they access. For example, data about crimes might be free, but data about licensed business owners might not be.
  • Premium data columns. A more refined version of premium datasets, this approach allows a set of records to be freely available to anyone, but provides more record details to those who are willing to pay. For example, a free version of real property sales data might contain an address, property type, boundaries, assessed value, and transfer date, but the premium version might contain buyer/seller details, sale amount, taxes paid, lien information, and so on.
  • Raw or processed data. This isn’t about summaries, statistics, or visuals. Instead, it is the distinction between a deluge of data from large numbers of sensors and the observations that may result. For example, electric meters continuously measure the rate of power consumption, but that data might be processed to create new records reflecting daily usage, or simply events like an unusual change in consumption. This raw data generally requires more investment to process, so it may be reasonable to expect consumers to pay for access to it as part of that investment. (On the flip side, processing the raw data and charging for the results might also be an option.)
  • Realtime or delayed access. This scenario is useful for transactional data where extremely recent data has greater value, but value decreases over time until it’s free. A great example from the finance sector is stock market data, where companies pay a premium for realtime information (and invest millions of dollars to gain a millisecond lead over the competition), but within 15–20 minutes the data is free and publicly available. Government data such as service requests, permits, violations, and real property transfers are likely good candidates for this approach, though perhaps the timeframes are in days or hours, not minutes.
  • Minimal fees. Following the model of public transit, charge all data consumers a small fee for use. Public transit fares don’t usually cover the full cost of operating the service; the rest is subsidized.
  • Voluntary contributions. There are a number of online platforms which invite visitors to donate data, but very few publish data and invite donations to support its continued availability. Donations to government aren’t unusual. Many open source software projects and associated organizations invite donations.
  • Publicly-operated data markets. This is an extremely interesting approach, because it provides a few other benefits beyond making government data accessible. With this approach, a government offers a public data market as a platform on which it and third parties make a variety of data available, some for free and some at a premium. Because it operates the market, the government gains the ability to apply taxes or fees to data-access transactions (and this could be through any or all of the models suggested above), but it also gets an opportunity to regulate the market itself by establishing ground rules to protect privacy, public interest, and so on. Smart Copenhagen appears to be moving in this direction, and Smart Dubai may evolve towards this as well. (These platforms also present the opportunity for revenue generation through advertising, even if it’s just advertising other datasets to their repeat customers.)
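
To make a couple of these models concrete, here is a minimal Python sketch of metered consumption combined with consumer classification. Everything in it is hypothetical: the free allowance, the per-record rate, the consumer classes, and the function name are invented for illustration and don’t reflect any real platform’s pricing or API.

```python
from dataclasses import dataclass

# Hypothetical numbers, chosen only for illustration.
FREE_RECORDS_PER_MONTH = 10_000      # site-wide free allowance per consumer
RATE_PER_1000_RECORDS_CENTS = 50     # price per 1,000 records beyond the allowance
EXEMPT_CLASSES = {"citizen", "press", "ngo", "government"}  # never charged


@dataclass
class Consumer:
    name: str
    consumer_class: str  # e.g. "business", "press", "ngo"


def monthly_charge_cents(consumer: Consumer, records_used: int) -> int:
    """Return one month's charge, in cents, for site-wide metered access."""
    if records_used < 0:
        raise ValueError("records_used must not be negative")
    # Consumer classification: exempt classes pay nothing at any volume.
    if consumer.consumer_class in EXEMPT_CLASSES:
        return 0
    # Metered consumption: only usage above the free allowance is billable.
    billable = max(0, records_used - FREE_RECORDS_PER_MONTH)
    # Bill in whole blocks of 1,000 records, rounding up.
    blocks = -(-billable // 1000)
    return blocks * RATE_PER_1000_RECORDS_CENTS


if __name__ == "__main__":
    firm = Consumer("Example Analytics LLC", "business")
    reporter = Consumer("Local Newsroom", "press")
    print(monthly_charge_cents(firm, 12_500))
    print(monthly_charge_cents(reporter, 1_000_000))
```

Under these assumed numbers, only a business that goes past the free allowance pays anything, which is the “high-volume users pay” idea from Mayor Williams combined with a waiver for press, NGOs, and individual citizens. A per-dataset version would keep a separate allowance and usage counter for each dataset instead of one site-wide total.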

The list above isn’t exhaustive, and some models can be combined to create a vast array of options. Each approach also has a variety of pros and cons, which are better explored in future discussion. However, one big issue common to many of these approaches has to be considered: the cost of implementing a metering and payment mechanism, which must handle not just a consumer’s financial transaction but also the accounting, distribution, and auditing of funds.
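
As one illustration of how models might be combined, the sketch below pairs premium data columns with delayed access, using the real property sales example from the list. The column names, the seven-day delay, and the function itself are assumptions made up for this example, not a description of any existing system.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical policy values, for illustration only.
FREE_COLUMNS = {"address", "property_type", "assessed_value", "transfer_date"}
FREE_ACCESS_DELAY = timedelta(days=7)  # non-paying consumers see records after a week


def visible_record(record: dict, recorded_at: datetime,
                   now: datetime, is_paying: bool) -> Optional[dict]:
    """Return the view of a record this consumer may see, or None if not yet available."""
    # Paying consumers get every column as soon as the record exists.
    if is_paying:
        return dict(record)
    # Delayed access: recent records are withheld from free consumers.
    if now - recorded_at < FREE_ACCESS_DELAY:
        return None
    # Premium columns: free consumers only get the basic columns.
    return {key: value for key, value in record.items() if key in FREE_COLUMNS}


if __name__ == "__main__":
    sale = {
        "address": "123 Example St",
        "property_type": "residential",
        "assessed_value": 400_000,
        "transfer_date": "2024-01-02",
        "sale_amount": 455_000,      # premium column in this sketch
        "buyer": "Example Buyer",    # premium column in this sketch
    }
    recorded = datetime(2024, 1, 2)
    print(visible_record(sale, recorded, datetime(2024, 1, 3), is_paying=False))
    print(visible_record(sale, recorded, datetime(2024, 1, 20), is_paying=False))
```

Even a small rule like this needs to know who is asking and whether they have paid, which is where the metering, payment, and auditing costs come in.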

Perhaps at the core of this conversation is this: open data programs can be operated at very low costs compared to other government services, but the cost is never zero. Setting up public data programs with payment components may be just one of several ways to ensure sustainability, especially in times of austerity. The case can be made that charging for the data will also force quality and documentation to improve significantly. The models above, if applied with careful thought and open debate, can support those possibilities without shutting out hobbyists, reporters, academics, advocates, non-profits, communities, or other consumers within government.