NOARCHIVE: http://sfbay.craigslist.org/sfc/apa/4139480380.html

A Short Proposal for Robots.json

Machine-readable terms of service for APIs

Tyler Singletary
Oct 17, 2013 · 5 min read

A Version of the Story So Far

Disclaimer: I am not a lawyer. I’m not even an armchair lawyer. I’m not connected with 3taps, Craigslist, or PadMapper. The facts here are greatly simplified and, at times, probably incorrect. It’s an illustration, not a legal quote.


Why scrape?

Proponents of scraping cite a number of reasons for the practice, chiefly that, when no API is available, it is the only way to gather valuable data. Another argument calls back to web standards and open data initiatives: essentially, if a web browser can read this information, and you’ve made it available to be downloaded by a human using a browser, how is it any different when the download is automated?


Why not?

The arguments get more complicated when you consider licensing rights to the content on the sites. Is it user generated? Does the user own that content, or was it assigned to the site? Is the information public, governmental data, or is it proprietary? Did the site license it from some other content holder? What’s the difference between ‘facts’ and ‘creative content’? These are all philosophical and legal issues to consider, but are well outside what I can cover here.

Robots.txt

The architects of the modern internet foresaw some of these problems, especially with regard to search engines. There had to be a way to tell programmatic collectors, hey, this part isn’t for you. Ignore it. Don’t index it, don’t cache it, don’t analyze it. This is what’s known as Robots.txt, a file in your web root listing the URIs to be ignored. Interestingly, the concept itself was much broader than that, as it should be, but Robots.txt is really a gentleperson’s agreement: I tell you what I don’t want you to do, and you respect it. Really, that is what almost any terms of service is. Robots.txt just has some teeth, in that it can be obeyed programmatically.
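To make "programmatically obeyed" concrete, here is a minimal sketch of a well-behaved collector checking Robots.txt before fetching a page. It uses Python's standard `urllib.robotparser`. The rules, the `ExampleDataBroker` and `SomeSearchBot` agent names, and the example.com URLs are all made up for illustration, and the file is parsed from a string so nothing touches the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleDataBroker
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Ask the rules whether this user agent is allowed to fetch this URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A generic crawler may read public listings but not /private/.
print(may_fetch(parser, "SomeSearchBot", "https://example.com/listings/1"))
print(may_fetch(parser, "SomeSearchBot", "https://example.com/private/1"))
# The named agent is shut out of the whole site.
print(may_fetch(parser, "ExampleDataBroker", "https://example.com/listings/1"))
```

Nothing here enforces anything: a crawler that never runs the check is not stopped by it, which is exactly the gentleperson’s agreement described above.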


Robots.json

In an ideal world, though, Robots.txt would be expanded to include, really, all of the terms of service for a product that could (and should) be respected programmatically. Right now we do this with cache headers, nofollow links, robots.txt, and other such methods. But what if we had one place where both web applications and APIs could express their data license in a form that is easy to consume and act on?
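As a rough sketch of what such a file might contain, and how a client might consult it, consider the example below. The proposal here does not define a schema, so every field name (`version`, `license`, `default`, `agents`), every agent class, and every action name is an assumption made purely for illustration.

```python
import json

# Hypothetical robots.json document. No schema is defined in this proposal;
# all keys and values below are illustrative assumptions.
ROBOTS_JSON = """
{
  "version": "0.1",
  "license": "proprietary",
  "default": {"index": false, "cache": false, "redistribute": false},
  "agents": {
    "search-engine": {"index": true, "cache": true},
    "data-broker": {"index": false}
  }
}
"""


def load_terms(text: str) -> dict:
    """Parse the machine-readable terms from JSON text."""
    return json.loads(text)


def is_permitted(terms: dict, agent_class: str, action: str) -> bool:
    """Return True only if the terms explicitly allow the action.

    Agent-specific rules override the defaults; anything unlisted is denied.
    """
    defaults = terms.get("default", {})
    agent_rules = terms.get("agents", {}).get(agent_class, {})
    return bool(agent_rules.get(action, defaults.get(action, False)))


terms = load_terms(ROBOTS_JSON)

print(is_permitted(terms, "search-engine", "index"))         # allowed explicitly
print(is_permitted(terms, "search-engine", "redistribute"))  # falls back to default
print(is_permitted(terms, "data-broker", "index"))           # denied explicitly
```

Denying anything the file does not mention is one possible design choice. A real format would have to decide whether silence means yes or no.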

Back to Craigslist

If Craigslist had such a file, they could have easily expressed that, while their data is available to search engines for indexing, it is not available to be redistributed by a data broker. The file would clearly lay out the chain of custody for the content, perhaps sparing PadMapper the need to go through 3taps at all. Or perhaps 3taps could have easily ‘passed through’ Craigslist’s terms of service by requiring their customers to respect the same terms.
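The ‘pass-through’ idea can be sketched as a small merge: a reseller’s published terms are the original site’s terms combined with its own, where an action is allowed downstream only if every party in the chain allows it. The structure below (`custody`, `permissions`) and the party names are hypothetical placeholders, not the actual terms of any company mentioned here.

```python
def pass_through(upstream: dict, own: dict) -> dict:
    """Combine upstream terms with a reseller's own terms.

    An action is permitted downstream only if both parties permit it,
    and the chain of custody records everyone the data passed through.
    """
    actions = set(upstream["permissions"]) | set(own["permissions"])
    return {
        "custody": upstream["custody"] + own["custody"],
        "permissions": {
            action: upstream["permissions"].get(action, False)
            and own["permissions"].get(action, False)
            for action in sorted(actions)
        },
    }


# Hypothetical terms for an originating site and a broker reselling its data.
origin_terms = {
    "custody": ["origin-site"],
    "permissions": {"index": True, "redistribute": False},
}
broker_terms = {
    "custody": ["broker"],
    "permissions": {"index": True, "redistribute": True, "cache": True},
}

downstream = pass_through(origin_terms, broker_terms)
print(downstream["custody"])
print(downstream["permissions"])
```

Under these assumed terms the broker cannot grant its customers redistribution rights, because the origin never granted them, and caching is denied because the origin is silent on it.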

Politics of APIs

Discussions of Platform and the surrounding universe of API Strategy

Thanks to Sean McCracken

Tyler Singletary

Written by

COO at Tagboard, formerly at Lithium & Klout. I’m on the Big Boulder Initiative board. Social data this and social data that. APIs and stuff.
